Linux kernel 2.6.24-12 lockup

Bug #204996 reported by Elod VALKAI
This bug affects 6 people
Affects          Status      Importance   Assigned to   Milestone
linux (Ubuntu)   Won't Fix   High         Unassigned
  Nominated for Hardy by Julian Alarcon
Intrepid         Invalid     High         Unassigned

Bug Description

Binary package hint: linux-image-2.6.24-12-generic

I upgraded from gutsy (with Linux 2.6.22-14) to the latest alpha last Sunday (16.03.2008), and I've got some problems with the kernel.

The 2.6.24-12-generic (I think, may be -386) causes my machine to lock up (hard) after about 5 minutes. Generally I was under X, there is no specific program I was using at the time it locked up.

The hardware is completely stable and has been for the last 3 years with Ubuntu, and still is with hardy and the gutsy kernel.

Another thing I noticed is that something in the initrd keeps the machine from booting for at least 2-3 minutes. It's definitely before the scripts in the /etc/rcS.d folder run; I have not traced it in the initrd. After booting all is well until it locks up.

The hardware is a Dell C400, Intel chipset, Intel graphics (i830), 384MB of RAM and an Atheros wireless card.

lsb_release:
Description: Ubuntu hardy (development branch)
Release: 8.04

It's completely reproducible; the freeze takes out the whole kernel (no reply to ping from the network). Any suggestions on how to trace it? I have a serial port, no parallel.

Revision history for this message
Elod VALKAI (elod) wrote :

Freezes do not happen with the latest beta (as of 24.03.2008), but the initrd still keeps me waiting at startup.

Revision history for this message
BobPendleton (bob-pendleton) wrote :

I have the same hardware but with 512MB. I have the same problem. The problem occurred with alpha 4, alpha 5, and beta 1. After alpha 5 I did a complete reinstall of 7.10, which works perfectly, and then did a new install of beta 1 using update-manager -d.

It still locks up hard after 5 to 10 minutes. I'm usually in Firefox when it locks up but I have had it lock up in other applications including synaptic and xemacs. Last time it locked up was this afternoon after doing a complete apt-get update/apt-get dist-upgrade, so the software was at the latest patch level as of this afternoon (US central standard time).

Bob Pendleton

Revision history for this message
Elod VALKAI (elod) wrote :

I've not tested it for very long. I'll get back to it when I've got time. Still running 2.6.22.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi Guys,

Per the kernel team's bug policy, can you please attach the following information. Please be sure to attach each file as a separate attachment.

* cat /proc/version_signature > version.log
* dmesg > dmesg.log
* sudo lspci -vvnn > lspci-vvnn.log

For more information regarding the kernel team bug policy, please refer to https://wiki.ubuntu.com/KernelTeamBugPolicies . Also, the following wiki may help to gather additional useful logs to help debug: https://help.ubuntu.com/community/DebuggingSystemCrash . Thanks again and we appreciate your help and feedback.
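For anyone who has to gather these logs more than once, the three commands above can be wrapped in a small helper. This is only a sketch: the function name `collect_logs` and the fallback notes are my own additions, and `sudo -n` is used so the script fails over to plain `lspci` instead of stopping at a password prompt.

```shell
# Gather the three requested logs into one directory.
# Each step falls back to a short note so a missing tool does not abort the run.
collect_logs() {
    outdir="${1:-.}"
    mkdir -p "$outdir" || return 1

    # Ubuntu-specific kernel version string
    cat /proc/version_signature > "$outdir/version.log" 2>/dev/null \
        || echo "version_signature not available" > "$outdir/version.log"

    # Kernel ring buffer since boot
    dmesg > "$outdir/dmesg.log" 2>/dev/null \
        || echo "dmesg not available" > "$outdir/dmesg.log"

    # PCI devices: try root first (-n makes sudo fail rather than prompt),
    # then fall back to an unprivileged listing, which shows less detail
    sudo -n lspci -vvnn > "$outdir/lspci-vvnn.log" 2>/dev/null \
        || lspci -vvnn > "$outdir/lspci-vvnn.log" 2>/dev/null \
        || echo "lspci not available" > "$outdir/lspci-vvnn.log"
}
```

Calling `collect_logs ~/bug-204996` (the directory name is just an example) leaves three files that can then be attached one by one, as the policy asks.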

Changed in linux:
status: New → Incomplete
Revision history for this message
Elod VALKAI (elod) wrote :

I'm submitting the requested information about my system. It surely does lock up with the current kernel (just did, not after minutes, but rather 2-3 hours).

I've also noted the lines output by initrd.img at boot:

Begin: /scripts/local-premount
/script/local-premount/resume: /script/local-premount/resume: 57: log_begin_msg: not found
--> at this point booting hangs for approx. 2-3 minutes, and resumes normally after
/script/local-premount/resume: /script/local-premount/resume: 57: log_end_msg: not found

Haven't looked inside the initrd; is this something related to suspend? I don't have a swap partition, only a swap file that's hosted on /dev/sda3 (FAT32). Swap is not related to the lockups; they always happened with one application loaded or in gdm.
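If the boot delay comes from the resume script waiting for a hibernation device (an assumption, nothing in this thread confirms it), one thing worth checking is which device the initramfs has been told to resume from. On Ubuntu that setting normally lives in /etc/initramfs-tools/conf.d/resume. A minimal sketch, where the function name `resume_target` is my own:

```shell
# Print the device the initramfs is configured to resume from,
# or "none" if the config file is missing or unreadable.
resume_target() {
    conf="${1:-/etc/initramfs-tools/conf.d/resume}"
    [ -r "$conf" ] || { echo "none"; return 0; }
    # Keep only the value after RESUME=; if set twice, the last one is shown
    sed -n 's/^RESUME=//p' "$conf" | tail -n 1
}
```

Comparing the output of `resume_target` with `cat /proc/swaps` shows whether the configured device still exists as swap. If it points at something that is not there, correcting the file and rebuilding the initramfs with `sudo update-initramfs -u` might remove the wait, but that is a guess to be tested, not a confirmed fix.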

I'm attaching the requested files.

Revision history for this message
Elod VALKAI (elod) wrote :
Revision history for this message
Elod VALKAI (elod) wrote :
Revision history for this message
Elod VALKAI (elod) wrote :

I'd also like to note that the machine has the latest BIOS update from Dell (A12), and is working perfectly with linux-image 2.6.22-14.52 & the rest of hardy.

Revision history for this message
Elod VALKAI (elod) wrote :

It just produced a nice lockup right after my last post, so my previous post about hangs after 2-3 hours is not entirely accurate.

I do know this is rather hard to trace, but I'm willing to take the time.

Revision history for this message
Wolfgang Glas (wglas) wrote :

Same problem here, I'm using hardy beta-1 x86_64, Hardware is a Dell Latitude D830 with 2GB of RAM. Lockup occurred while using firefox3.

Revision history for this message
Elod VALKAI (elod) wrote :

Wolfgang, please post the information requested above by Leann:

* cat /proc/version_signature > version.log
* dmesg > dmesg.log
* sudo lspci -vvnn > lspci-vvnn.log

Danke schön! :)

Revision history for this message
Wolfgang Glas (wglas) wrote :
Revision history for this message
Wolfgang Glas (wglas) wrote :
Revision history for this message
Wolfgang Glas (wglas) wrote :
Revision history for this message
Wolfgang Glas (wglas) wrote :

Setting to confirmed, because I attached all the required information and multiple users observe this bug.

Generally, we've been observing this kind of lockup in each ubuntu kernel starting with feisty's 2.6.20. In fact, this kind of kernel lockup is the reason why we are operating debian etch on our servers in favor of an ubuntu distribution.

Changed in linux:
status: Incomplete → Confirmed
Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → High
status: Confirmed → Triaged
Revision history for this message
Jim March (1-jim-march) wrote :

Yeah, same thing here, hard lockups every two to four hours. Thing goes totally dead.

Hardware: Acer 3680 laptop, Celeron single-core 1.6GHz, 533 memory bus, Atheros WiFi, 80gig SATA hard disk, 1.5gigs memory, Intel945 video, Intel sound, Marvell Ethernet chipset. This thing ran great under Gutsy and Feisty, no signs of overheating. I briefly switched back to Gutsy and all lockups vanished. My usual Internet connection is a Verizon EVDO cellmodem, Kyocera KPC650 PCMCIA set up as a PPP device, but I'm also crashing with it pulled so that's not it.

Software: pretty basic Hardy setup, also VirtualBox 1.5.6. Most crashes have happened without running a guest OS (XP).

Things I've tried: disabling Compiz, disabling Powernowd, pulling the Atheros miniPCI card completely, switching from Network Manager to Wicd, disabling Tracker completely, various others. Nothing has helped.

Version.log file (as described above) contains:

Ubuntu 2.6.24-16.30-generic

dmesg.log file is attached to this post, the last log will go in next.

Revision history for this message
Jim March (1-jim-march) wrote :

Final log...

Revision history for this message
Anthony Chianese (achianese) wrote :

I'm having a problem that may be the same as this, also using the latest Hardy beta. I'm using network-manager, which automatically detected my Linksys WMP54G wireless PCI card as "RaLink RT2561/RT61 802.11g PCI". It works very well until the crash, which causes the system to become completely unresponsive (no keyboard or mouse response, Capslock and Scroll lock blink, nothing is added to syslog at the time of crash).

I can reproduce the crash easily by asking rsync to sync a directory across my local network. After 2-3 minutes, it crashes. I'm willing to provide any additional information someone wants, but I'm not sure how to get more information about the cause. The system has also crashed while I was surfing the web using Firefox or downloading a large number of emails in Thunderbird, always while the network is transferring data.

Attachments coming..

Revision history for this message
Anthony Chianese (achianese) wrote :
Revision history for this message
Anthony Chianese (achianese) wrote :
Revision history for this message
martyg (snowbird) wrote :

I think I am seeing the same issue on my little server.

Screen goes blank. No mouse response.
Can't switch VCs, CTL-ALT-BS, or CTL-ALT-DEL.
Not accessible from network either.

Next time it happens, I'll try doing a ALT-SysRq-E and dig around.
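Before relying on Alt-SysRq it may be worth confirming that the key is enabled at all, since the kernel can restrict it through /proc/sys/kernel/sysrq (0 means off, 1 means everything, other values are a bitmask of allowed functions). A small sketch; the function name `sysrq_state` is made up for illustration:

```shell
# Report whether the magic SysRq key is enabled on this kernel.
sysrq_state() {
    f="${1:-/proc/sys/kernel/sysrq}"
    val=$(cat "$f" 2>/dev/null) || { echo "unknown"; return 0; }
    case "$val" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "partially enabled (bitmask $val)" ;;
    esac
}
```

Even with SysRq fully enabled, a kernel that is wedged hard enough will not respond to it, so a silent keyboard after the freeze does not by itself say much.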

No idea how to reproduce - just happens randomly.
No obvious artifacts in logs I can send the maintainers.

This machine has been 100% stable on Debian for past 3 years! (Running 24/7)
Hardware is AMD64 barebone desktop with cheapo ATI graphics.

I am enclosing my system description as requested in an earlier post by the kernel maintainers.

Revision history for this message
martyg (snowbird) wrote :

Following Anthony's guidance, I have been able to reproduce once in just a few minutes by loading up the network as much as I can.
Regrettably, haven't been able to do it again after trying for about 30 minutes.
But, at this point, I am convinced this issue is network traffic related. This is consistent with my other failures earlier today.

No response to ALT-SysRq-E after the failure. Need to power cycle to continue.
After rebooting, nothing in the logs.

Note I have observed this issue on both NICs I have installed on this system.
(After getting a couple of these, I rolled my connection over to the other NIC and have been seeing it there too)
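Since nothing reaches the logs, one low-tech way to at least pin down when the machine died is a heartbeat that writes a timestamp to disk every second and forces it out with `sync`. This is a sketch under my own assumptions (the function name `heartbeat`, its arguments and the log path in the usage note are all invented), not something anyone in this thread has run:

```shell
# Append a timestamp to a log file at a fixed interval and flush it to disk.
# count=0 means run until interrupted (or until the machine freezes).
heartbeat() {
    logfile="$1"
    count="${2:-0}"
    interval="${3:-1}"
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        date '+%Y-%m-%d %H:%M:%S' >> "$logfile"
        sync    # push the line to disk so it survives a hard lockup
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Started as `heartbeat /var/tmp/heartbeat.log &` before loading the network, the last line in the file after the power cycle gives the time of the freeze to within about a second, which can be matched against whatever else was running.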

Revision history for this message
reh4c (gene-hoffler) wrote :

I have experienced this issue (or one with the same traits) when using synaptic to perform upgrades. My laptop typically freezes when installing packages...screen doesn't go blank, but the whole system is frozen. Already filed another bug and posted the log files there.

Revision history for this message
Jim March (1-jim-march) wrote :

The common denominator is heavy CPU activity. I can't see ANY other pattern here, guys, unless it's some very obscure chipset component. Even then, some are running ATI CPUs which likely use damn few similar "guts" parts with my Intel-based lappy fr'instance...

Video chips, WiFi chips, we're all over the friggin' map here. Which makes me think this one will be a royal bear to sort out.

Revision history for this message
Anthony Chianese (achianese) wrote :

I've been playing for a day now, and I think my lockup is directly related to wireless. Switching from network-manager to wicd didn't fix it, but switching from wireless to wired (either network-manager or wicd) seems to fix it (stable for a day, constantly maxing out my local bandwidth using rsync). If I go back to wireless, I can lock it up in a few mins as before.

Skimming your logs Marty, it looks like you're only using wired ethernet, correct?

I'll see if ndiswrapper does anything differently.

Revision history for this message
Wolfgang Glas (wglas) wrote :

AFAIK the lockup occurs on server hardware, too. In my case, I used a PCMCIA UMTS card (Sierra wireless AirCard 875).

My overall impression is, that it might be related to the payload wireless devices put on the TCP/IP-stack. Typically, wireless devices stress TCP congestion handling far more than wired connections do. And yes, in my case the lockup may be provoked by a combination of CPU load plus a fair amount of open TCP/IP connections.

Revision history for this message
martyg (snowbird) wrote : Re: Linux kernel 2.6.24-16 lockup

There are no WiFi adaptors in my system. I have observed this bug on both my hardwired 100/1000 NICs.

Note all my failures have been on 2.6.24-16. (Changed title) I did not run the -14 kernel.

I tried reproducing yesterday by pounding on the CPU with infinite make -j4 kernel compiles.
Both cores in my CPU were saturated for 3+ hours. No failures during this test.
Crashed once during the 24 hours, but I still can't figure out a good way to reproduce.

My best guess is this is triggered by an unusual condition in the networking stack, e.g. segmentation, checksum, or MTU misalignment.

Has anyone tried seeing if a serial console still responds after failure?
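For anyone who wants to try the serial console, kernel messages are sent to a serial port by adding `console=` parameters to the kernel line in /boot/grub/menu.lst. The sketch below only builds the modified line; the port `ttyS0`, the speed of 115200 baud, the function name and the kernel line in the usage note are assumptions, so adjust them to the machine.

```shell
# Print a kernel command line with serial console parameters appended.
# The last console= entry becomes the primary console, so tty0 is listed first
# to keep messages on the screen as well.
with_serial_console() {
    line="$1"
    port="${2:-ttyS0}"
    baud="${3:-115200}"
    printf '%s console=tty0 console=%s,%sn8\n' "$line" "$port" "$baud"
}
```

For example, `with_serial_console 'kernel /boot/vmlinuz root=/dev/sda1 ro'` prints the same line with `console=tty0 console=ttyS0,115200n8` added. A second machine connected with a null-modem cable and a terminal program set to the same speed can then capture whatever the kernel manages to print before it stops.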

Revision history for this message
PeteDonnell (pete-donnell-deactivatedaccount) wrote :

I'm experiencing the same hard freeze with kernel 2.6.24-16. No wireless here, just onboard ethernet. The same system worked just fine under Gutsy; the problem only started after upgrading to Hardy. My graphics card is an old S3 Virge. Crashes usually appear when watching YouTube in Firefox, but have also occurred when watching videos from a locally attached USB drive. The system appears stable under light load though. I stupidly uninstalled the Gutsy kernel not long after the upgrade, before I became aware of this problem, so I haven't been able to test the system with the old kernel. The computer is totally unresponsive, the pointer doesn't move etc., and it appears offline when scanned by other computers on the network.

The weirdest thing (to me at least) is that after the computer has been power cycled (the only way to get it to respond again) the BIOS sequence hangs. The hard drive light goes on and stays on, but the BIOS doesn't go on to detect the hard drives. The only way to get round this appears to be to hold down the power button for five seconds until the machine shuts down, then leave it for 10 minutes or so, and then turn it back on. After this it appears to behave normally, until the next freeze.

There's a post on the forums that appears to be describing the same problem: http://ubuntuforums.org/showthread.php?t=725669
As suggested on there I tried booting with 'noapic nolapic irqpoll noirqdebug', but this didn't help. I don't have a great deal of time to troubleshoot, but would like to help in sorting this out if there's any other information I can provide.
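When testing boot options like these it is easy to edit the wrong menu entry, so it can help to confirm from the running system that the parameters really took effect. The kernel exposes its command line in /proc/cmdline. A minimal sketch, with the function name `has_boot_param` being my own:

```shell
# Succeed if the given parameter appears as a whole word on the kernel command line.
has_boot_param() {
    param="$1"
    cmdline="${2:-/proc/cmdline}"
    # One word per line, then look for an exact fixed-string match
    tr ' ' '\n' < "$cmdline" | grep -qxF -- "$param"
}
```

A quick check after rebooting would be something like:

```shell
for p in noapic nolapic irqpoll noirqdebug; do
    has_boot_param "$p" && echo "$p: active" || echo "$p: not set"
done
```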

Revision history for this message
PeteDonnell (pete-donnell-deactivatedaccount) wrote :
Revision history for this message
PeteDonnell (pete-donnell-deactivatedaccount) wrote :
Revision history for this message
Nicholas (drkoljan) wrote :

I would like to add my two cents here, as the Ubuntu 8.04 64-bit has locked up for me twice already. I'm not sure if it has anything to do with me overclocking my CPU, though (no problems under Windows, temperature <45C).

Revision history for this message
BobPendleton (bob-pendleton) wrote : Re: [Bug 204996] Re: Linux kernel 2.6.24-12 lockup

Folks, this lock up appears to be affecting a lot of people who are trying
prerelease versions of hardy, if this thing goes public with this bug it is
going to be a *disaster* for Ubuntu. Is there any way to get hold of the
people responsible and get the release delayed until this bug is fixed?
Ubuntu, not to mention Linux, can not afford this kind of a black eye.

Bob Pendleton

On Mon, Apr 21, 2008 at 12:05 PM, Nicholas <email address hidden> wrote:

> I would like to add my two cents here, as the Ubuntu 8.04 64-bit has
> locked up for me twice already. I'm not sure if it has anything to do
> with me overclocking my CPU, though (no problems under Windows,
> temperature <45C).


Revision history for this message
Wolfgang Glas (wglas) wrote :

Hi Bob,

Keep your hair on ;-) We've never seen a release of a "stable" Linux distribution that actually became stable from a user's point of view before at least 2 months of post-release bug fixing. Unfortunately, both users and developers only start seriously working on bug reports in the face of a release deadline. So, the best thing we can do is to support the mighty kernel developers by supplying them with the information they need.

Revision history for this message
drivinghome (drivinghome) wrote :

I too get this sort of lockup, but *only* when I use the wired connection.
It didn't occur to me until recently as I mostly use the wireless card.

# lspci | grep Ethernet
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller

The module i use is e1000.

The lockups appear after ~2, 3 hours of wired connection.

Revision history for this message
martyg (snowbird) wrote :

1. Has anyone seen this happen on the SERVER build of kernels, or is this just happening on the GENERIC build?

2. Has anyone tried to hook up a serial console and see if they can still talk to their box after it crashes?

Revision history for this message
tjutberg (burzmon) wrote :

I too get random lockups; I haven't found any special trigger so far, but it happens more often with heavy CPU activity.

I'm running hardy amd64 with nvidia graphics and it worked just fine under gutsy.

Revision history for this message
tjutberg (burzmon) wrote :
Revision history for this message
tjutberg (burzmon) wrote :
Revision history for this message
Christoph Lechleitner (lech) wrote :

>1. Has anyone seen this happen on the SERVER build of kernels, or is this just happening on the GENERIC build?

I have seen this kind of crash with feisty's server kernel on my IBM x335 servers on a regular basis.
I would not be surprised if the initial cause(s) have been in since 2.6.20 (which seems to have started the deep fall of kernel stability).
My problems vanished when I stepped back to edgy, in the meantime I have switched the host system to etch, while keeping the OpenVZ guests on feisty/gutsy.

I get the impression that the problem only occurs on SMP systems, and my systems crashed when LAN (wired GBit) and CPU and HD were under some load, e.g. 2 rsyncs over SSH over GBit. HD and LAN IO produce a large number of interrupts which seem not to be trivial to handle on a multicore system.
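If interrupt load really is part of the picture (which is only an impression at this point), /proc/interrupts shows how many interrupts each source has raised per CPU. The sketch below adds the per-CPU counts together and sorts the sources by total; the function name `irq_totals` is invented for this example.

```shell
# Sum the per-CPU counters in /proc/interrupts and list the busiest sources first.
irq_totals() {
    awk 'NR == 1 { ncpu = NF; next }      # header row: one column per CPU
         {
             total = 0
             for (i = 2; i <= ncpu + 1 && i <= NF; i++) total += $i
             desc = ""
             for (i = ncpu + 2; i <= NF; i++) desc = desc " " $i
             print total, $1 desc
         }' "${1:-/proc/interrupts}" | sort -rn
}
```

Running `irq_totals | head` twice, a minute apart while the rsync load is running, gives a rough idea of which devices are generating the most interrupts and whether they are spread across cores.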

Revision history for this message
Elod VALKAI (elod) wrote :

I'll try the serial console, but it seems to be completely frozen.

I've never seen a distro (some slackware, more redhat, debian's woody, sarge, etch & lenny) behave like this after release. I upgraded packages the night before (23rd of April); there is no new kernel.

Revision history for this message
PeteDonnell (pete-donnell-deactivatedaccount) wrote :

In response to martyg's question about server kernels: it occurred to me too that the problem might be specific to the generic kernel, so a few days ago I tried booting the server kernel instead. It crashed in exactly the same way as the generic kernel though. I should have mentioned it before...

Revision history for this message
martyg (snowbird) wrote :

I can also confirm reproducing this issue on linux-image-2.6.24-16-SERVER

Revision history for this message
martyg (snowbird) wrote :

I scrounged up a cable and hooked up the serial port on my machine today.
Reproduced once again on linux-image-2.6.24-16-generic.

Serial console totally dead after the crash. Nothing to see here either.

Revision history for this message
Rami Autiomäki (rami-autiomaki) wrote :

Confirmed on Acer Aspire 5022 laptop. Attached piece of my kernel log. These messages appear before lock up.

Revision history for this message
Jim March (1-jim-march) wrote :

I just did a clean install from the actual release of Hardy. No change. I installed the server kernel:

sudo apt-get install linux-server

THAT didn't help either and introduced numerous glitches.

This is by far, no question, THE WORST UBUNTU EVER.

Revision history for this message
Rami Autiomäki (rami-autiomaki) wrote :

Information about Acer 5022 suffering this bug.

Revision history for this message
Rami Autiomäki (rami-autiomaki) wrote :
Revision history for this message
Rami Autiomäki (rami-autiomaki) wrote :

I am using the Gutsy kernel as a workaround for now. It has its own problems with Hardy, but it doesn't freeze.

Revision history for this message
Jim March (1-jim-march) wrote :

Well the good news is that the -rt kernel right out of the Hardy repos seems to be holding. I'll report back after some hours more use but right now, I'd have expected it to blow up by now.

I think there's maybe a 5% to 10% video performance hit in GLXgears with the -rt kernel, but I don't notice any speed issues. Compiz and WiFi are working, as is flash, audio...basically, if this -rt kernel holds, I'm going to recommend it as a possible cure in the "something to try" category. Expect another post here on my progress late tonight (Saturday, California time...).

Revision history for this message
Invader Amoto (invaderamoto) wrote :

I think I have the same problem. For me it doesn't seem to be related to anything in particular, although I usually have Firefox open when it happens. That and tvtime.

Sometimes it will completely lock up, and if I don't click anything or touch the keyboard, it will come back in a few minutes. But once it comes back, if I do literally anything that would make it read the hard drive, it freezes permanently. I can use apps that are already open as long as they don't have to read the hard drive. I've tried Ctrl-Alt-Backspace, Ctrl-Alt-F1, and even Alt-SysRq R-E-I-S-U-B, at both lockups. If I do it at the first lockup, it makes things worse and the machine never comes back. If I do it at the second lockup, it either does nothing or causes everything to freeze permanently.

Just from it happening to me about 30 times (I've tried various things to fix it), I think it has to do with not being able to read the hard disk, or the kernel decides that it doesn't want to read the disk anymore. Also, sometimes when it comes back after the first lockup and I try to open an app, an error message pops up saying something about being unable to load the app because of an I/O error. It usually locks up permanently after that. This problem has made me attempt to go back to 7.10 (although I'm having huge problems with grub). I really hope this bug gets fixed (even if it's not the same as mine).

Revision history for this message
Invader Amoto (invaderamoto) wrote :

OK, I'm back with slightly more info. When I was typing my last comment, it did the first freeze I explained in that comment, then everything came back and I posted it. Then I decided to try shutting down the computer the way you're supposed to (click the quit icon on the panel, then choose shutdown). I've tried this in the past and it just makes it freeze, but this time I waited about 5 minutes after pressing shutdown, and when I came back I saw this on the screen (which is black and white, all text, like when you press Ctrl-Alt-F1):
kinit: name_to_dev_t( then in here it showed the address to my hard drive, something like /dev/disk/ then a bunch of numbers, i think) = db2 (8,18)
kinit: trying to resume from then here it showed the same address to my disk as above
kinit: no resume image doing normal boot
Ubuntu 8.04 john-desktop tty1

OK, thats not exactly what it said (I just scribbled it on a piece of paper really quickly) but it was something along those lines. Then after it said the "Ubuntu 8.04 john-desktop tty1" line it said something like john-desktop login, but no matter what i typed it would show the "Ubuntu 8.04 john-desktop tty1" line. This supports my idea that the problem is related to not being able to read the disk, because all through that, my comp's disk activity light didn't go on, and i assume that it went back to the "Ubuntu 8.04 joh......" line because the computer couldn't check the disk to see if i typed the correct login. I noticed that some people with this bug or a similar one claim it's related to network activity, so my bug may be completely different than the one reported here.

Revision history for this message
Jim March (1-jim-march) wrote :

I think you're on to something, that it's disk-access related somehow.

There's a few reports of screwed-up swapfile access from some of our brethren afflicted :). That may be tied to how much memory we have - I'm running 1.5gig and hence I'm not hitting swap. But if I was, screwball disk access could show up as a swapfile issue.

Huh.

In better news, I've been throwing more and more stuff at mine trying to get it to puke under the -rt kernel. No "luck" so far, which is good...this may be a sound alternative to a kernel recompile.

The pieces I added in Synaptic were:

linux-headers-2.6.24-16-rt (needed if you're a Virtualbox user or otherwise have to compile kernel modules!)

linux-headers-rt (possibly same boat as above)

linux-image-2.6.24-16-rt

linux-image-rt

linux-restricted-modules-2.6.24-16-rt

linux-restricted-modules-rt

I also edited grub:

sudo gedit /boot/grub/menu.lst

In there, I recommend making a couple of changes, like so:

---
## timeout sec
# Set a timeout, in SEC seconds, before automatically booting the default entry
# (normally the first entry defined).
timeout 20

## hiddenmenu
# Hides the menu by default (press ESC to see the menu)
#hiddenmenu
---

Commenting out "hiddenmenu" (it is active by default) makes the system show the kernel list on each boot, while increasing the "timeout" number gives you more time to pick.
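The same two changes can be made without opening an editor. This is a rough sketch that assumes the stock menu.lst layout, where the active lines start with `timeout` and `hiddenmenu` in the first column; the function name `show_grub_menu` is made up, and keeping a backup copy of the file first is a good idea.

```shell
# Set the GRUB (legacy) menu timeout and comment out hiddenmenu in a menu.lst file.
show_grub_menu() {
    menu="$1"
    secs="${2:-20}"
    tmp="$menu.new"
    # Lines that are already comments (starting with #) are left untouched
    sed -e "s/^timeout[[:space:]].*/timeout $secs/" \
        -e 's/^hiddenmenu/#hiddenmenu/' "$menu" > "$tmp" && mv "$tmp" "$menu"
}
```

It is safest to try it on a copy first, for example `cp /boot/grub/menu.lst /tmp/menu.lst && show_grub_menu /tmp/menu.lst 20`, check the result with `diff`, and only then apply it to the real file as root.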

This is just while you're testing various kernels. I have server, rt, 386 and generic. Server so far has been a failure, haven't played with "386" but I don't have high hopes. It's -rt that's the big change that with luck isn't going to crash on me...I've got about four hours use on it so far. Fingers crossed it stays blow-up free. Seems stable in all other respects.

Revision history for this message
Invader Amoto (invaderamoto) wrote :

Glad I could help. I have 1 Gig of ram so its not using my swap which is why i can still use apps that are already open as long as they dont read the disk, i think. I hope the fix you're trying works out. I really want to use hardy (gutsy doesn't have cd in on the audio mixer so my tv tuner card doesn't have sound, it sucks watching tv without sound, hehe)
Can someone else with this problem see if they could get their comp to do what mine does? Next time it freezes just wait and don't touch the keyboard and don't click anything, and see if it comes back. Then see if it can read from the disk by trying to open new apps or something. I just want to make sure that my problem is the same as everyone else's that reported this.

Revision history for this message
Anthony Chianese (achianese) wrote :

I added the rt kernel as Jim March suggested, and it seems to have worked for me. I haven't been able to see this kernel lockup. I may have noticed something interesting about why my particular system was hanging:

In Gutsy, my wireless card, Linksys WMP54G v4.1, with the RaLink RT2561/RT61 802.11g PCI chipset, was detected and worked out of the box, but would randomly disconnect and refuse to reconnect. When I upgraded to Hardy beta, my card worked and I never observed this bug, but it was replaced with the hard lockups, after about the same amount of network usage. Now, with the realtime kernel, I don't see hard lockups, but I do see the original bug from Gutsy again.

Hypothesis: For all three cases (Gusty desktop default kernel, Hardy beta generic kernel, and Hardy realtime kernel), my wireless card works but fails after some data is transferred (this is discussed in bug 134660). For Gusty, or 2.6.24-16-rt, the kernel stays up. For Hardy's kernel (2.6.24-16-generic), the wireless failure results in a hard lockup.

I'm not sure how to verify this or if it's important to do so, but that's where I am now.

Revision history for this message
Jim March (1-jim-march) wrote :

Anthony, that last is to me REAL interesting.

See, I not only don't have anything Ralink in my rig, I'm not having network connection issues at all. And I rotate between three connections: Atheros WiFi, Marvell Ethernet and a PCMCIA cellmodem (Verizon EVDO). The split is about 30/10/60 in that order. All were fine in Gutsy, fine in Hardy.

So. If you were freezing, and I was freezing, then it sure as hell looks like there were two different causes. That's Goddamn scary is what that is. It suggests the basic Ubuntu 2.6.24 kernel is unstable.

I've been using the -rt kernel hard'n'heavy for about 7 or 8 hours now, including the last couple compiling 2.6.25 just for kicks. With Compiz ON, web pages up, and Pandora web-radio playing in the background (Rammstein channel, hell yeah!). CPU activity pretty much pegged solid, although it's still quite responsive :). Anyways. That should have blown it to hell and gone. It hasn't.

So the -rt kernel is a viable option for those not into compiling their own kernel.

Revision history for this message
Invader Amoto (invaderamoto) wrote :

I tried the -rt kernel, and it seems worse than before. It might just be coincidence but it had a hard lockup within minutes of logging in.

Revision history for this message
Jim March (1-jim-march) wrote :

Dammit. Well that's more evidence that more than one blooper is going on here, isn't it?

OK. In my case, -rt is the right stuff. So if it's not for everybody, only shot left at a universal solution is to compile 2.6.25. I should be done with that soon, will report back.

Revision history for this message
Chris Sykes (chris-newforest-technology) wrote :

I'm having similar lock-ups since upgrading (gutsy to hardy) last night. Seems to happen with/without compiz enabled. My machine is a Thinkpad X60 with intel 945GM graphics, 2G ram & 2G swap. Intel 3945 wireless.
The last lockup occurred shortly after starting an rsync to another machine on the lan via the wired interface (intel 82573L).
dmesg, lspci attachments to follow.

Revision history for this message
Chris Sykes (chris-newforest-technology) wrote :
Revision history for this message
Chris Sykes (chris-newforest-technology) wrote :
Revision history for this message
Chris Sykes (chris-newforest-technology) wrote :

Just re-ran the same rsync on a VT to try & capture any kernel messages. Got a re-create almost immediately, and the following message as the machine hung:
    Uhhuh. NMI received for unknown reason a0 on CPU 0.
    You have some hardware problem, likely on the PCI bus.
    Dazed and confused, but trying to continue

The machine has been 100% solid under both Feisty and Gutsy over the last year.
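Others hitting the freeze could check whether the same NMI message ever made it into their logs. A small sketch, assuming the kernel log is in the usual Ubuntu location /var/log/kern.log; the function name `count_unknown_nmi` is invented:

```shell
# Count occurrences of the "unknown NMI" message in a kernel log file.
count_unknown_nmi() {
    log="${1:-/var/log/kern.log}"
    grep -c 'NMI received for unknown reason' "$log"
}
```

A count of zero does not rule anything out: if the machine hangs right after printing the message, as it did here, the line may never be written to disk, which is why it was only visible on the VT.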

Revision history for this message
brainiac8008 (brainiac8008) wrote :

I just installed 8.04 today on an HP a6109n with an AMD Athlon 64 X2 3800+, 2 GB RAM, and two 320 GB hard drives (one with Ubuntu and one with Vista) and I've got the same problem. I had OpenOffice, Firefox, Compiz, and the file browser running when I was creating a launcher on the desktop and the screen froze. I could only move my mouse. The only thing I was able to do was an Alt-SysRq R-E-I-S-U-B and it rebooted. Then I logged back into Ubuntu and it completely froze at the desktop. I noticed that the Caps Lock and Scroll Lock lights on my keyboard were flashing but that was it - I couldn't do anything, so I forced a shutdown and I'm now in Vista. If I go into Ubuntu again and it freezes, I will wait a bit and then see if I can do anything. I might give the rt kernel a shot also, but I don't want to go into Ubuntu again! Argh! Also (I don't know if this adds anything to the discussion), I was using a wireless internet connection.

This has to be fixed. Some people are already considering going back to Gutsy. It's like the Vista downgrade to XP movement! I want my HARDY Heron back!

Revision history for this message
Robert Citek (robert-citek) wrote :

Has anyone come up with a systematic way to reproduce this freeze up? - Robert

Revision history for this message
drivinghome (drivinghome) wrote :

Hi Robert,
yes, I just have to plug in the ethernet cable (see my post above) and start copying some files over samba. After 4GB+ it freezes for sure.

Revision history for this message
Jim March (1-jim-march) wrote :

TO ALL: IMPORTANT

I spent much of today confirming that the -rt fix worked for me. It did. I did my damnedest to kill it: played a flash game while compiling 2.6.25 with web-radio cranking. Performance suffered, stability did not - across hours.

So -rt fixes this for some systems. See my post earlier in this thread at:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/204996/comments/52

...for a newbie-friendly way of making -rt work. MY SYSTEM SPECS:

Acer 3680 lappy, 80gig SATA drive, Intel-based sound, Intel Celeron single-core 1.6gHz CPU with a 533 memory bus, 1.5gigs RAM, Atheros WiFi, Intel945 video.

With .25 compiled with no errors, I'm about to switch to that. Wish me luck. I tuned that kernel to eliminate hardware virtual machine support I don't have, picked my specific CPU, etc. This is a "semi-geeky" process that some people starting out in Linux may choke on, while the -rt thing is very, very safe to at least try and you can jump back away from it with a reboot if it goobers on you.

Revision history for this message
Jim March (1-jim-march) wrote :

TO DEVELOPERS DEBUGGING THIS: ponder what's different in the -rt and -generic latest kernels (as provided in the final release Hardy). That may be one of your best clues as to what the hell is going on.

Revision history for this message
Jim March (1-jim-march) wrote :

Well my neato custom compiled .25 blew up immediately (panic). Oh well. Realtime for now.

Revision history for this message
Robert Citek (robert-citek) wrote :

FWIW, I cannot get my machine to lock up.

Tried pushing the CPU by firing off 20+ processes like this:

$ cat /dev/urandom > /dev/null &

All I got was this after about 5 hours:

$ uptime
 21:24:30 up 5:00, 2 users, load average: 25.00, 25.00, 24.83

$ vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
27 0 38164 380304 10396 48968 0 2 21 295 42 791 37 30 33 0
25 0 38164 380280 10396 48968 0 0 0 0 2 177 11 89 0 0
25 0 38164 380280 10396 48968 0 0 0 0 24 240 11 89 0 0
25 0 38164 380280 10396 48968 0 0 0 0 2 174 10 90 0 0
25 0 38164 380280 10396 48968 0 0 0 0 24 236 12 88 0 0
25 0 38164 380280 10396 48968 0 0 0 0 2 178 8 92 0 0
25 0 38164 380280 10396 48968 0 0 0 0 24 238 13 87 0 0
25 0 38164 380280 10396 48968 0 0 0 0 2 183 11 89 0 0
25 0 38164 380280 10396 48968 0 0 0 0 24 246 10 90 0 0
25 0 38164 380280 10396 48968 0 0 0 0 2 176 11 89 0 0

However, this is a headless machine to which I've ssh'ed to run these processes. Dunno if that matters.
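
For anyone repeating this test, here is a minimal sketch of a helper that starts and stops the readers in one go. The function names start_cpu_load and stop_cpu_load and the LOAD_PIDS variable are made up for illustration; only the cat /dev/urandom > /dev/null command comes from the test above.

```shell
# Start N background readers of /dev/urandom (default 20) and record their PIDs.
start_cpu_load() {
    n="${1:-20}"
    LOAD_PIDS=""
    i=0
    while [ "$i" -lt "$n" ]; do
        cat /dev/urandom > /dev/null &
        LOAD_PIDS="$LOAD_PIDS $!"
        i=$((i + 1))
    done
}

# Stop every reader started by start_cpu_load.
stop_cpu_load() {
    # Word splitting of $LOAD_PIDS is intentional: one PID per word.
    for pid in $LOAD_PIDS; do
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true
    done
    LOAD_PIDS=""
}
```

Run start_cpu_load 20, watch uptime and vmstat as above, then run stop_cpu_load to clean up.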

Anyone got a suggestion for stress-testing the NIC?

Regards,
- Robert

Revision history for this message
Robert Citek (robert-citek) wrote :

@tzu (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/204996/comments/64)

I do not have an SMB server nor Windows machine to connect to. As an alternative, I tried pulling 5 GB of data from the Hardy machine via ssh over wired ethernet. Worked just fine:

$ ssh 192.168.0.205 "dd if=/dev/zero bs=1M count=5000" | time -p od -bc
0000000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000
         \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 623.952 s, 8.4 MB/s
47040000000
real 625.92
user 80.06
sys 44.80

During that time, this is what vmstat looked like on the Hardy machine:

$ vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 3 0 27656 10972 1204 11552 99 97 194 99 4811 1045 9 11 75 5
 0 0 27656 10976 1204 11560 0 0 0 0 11281 2344 20 24 56 0
 1 0 27656 10936 1204 11560 0 0 0 0 11289 2555 16 28 56 0
 1 0 27656 10932 1212 11560 0 0 0 68 11216 2403 18 26 56 0
 1 0 27656 10880 1212 11560 0 0 0 0 11139 2254 15 27 58 0
 0 0 27656 10940 1212 11560 0 0 0 0 10749 2212 16 26 58 0
 0 0 27656 10952 1212 11560 0 0 0 0 11246 2254 15 27 58 0
 1 0 27656 10928 1212 11560 0 0 0 0 10942 2300 16 20 64 0
 1 0 27656 10916 1212 11560 0 0 0 0 11320 2282 18 22 60 0
 1 0 27656 10908 1212 11560 0 0 0 0 11237 2367 17 27 56 0

This machine is also running on 48 MB of RAM with a 600 MB swapfile. I accomplished this by adding this stanza to my /boot/grub/menu.lst file and making it the default:

title Ubuntu hardy (development branch), kernel 2.6.24-16-generic - test
root (hd0,0)
kernel /boot/vmlinuz-2.6.24-16-generic root=UUID=538f72a2-fbc0-454f-a058-32d1a0cae8b3 ro mem=48M
initrd /boot/initrd.img-2.6.24-16-generic
quiet

I also disabled gdm in runlevel 2:

$ ls -1 /etc/rc*.d/*gdm
/etc/rc0.d/K01gdm
/etc/rc1.d/K01gdm
/etc/rc2.d/K01gdm
/etc/rc3.d/S30gdm
/etc/rc4.d/S30gdm
/etc/rc5.d/S30gdm
/etc/rc6.d/K01gdm

I tested swap and heavy disk access by running this command:

$ time -p seq 1 10000000 | sort -S 100M > /dev/null
real 483.37
user 74.66
sys 6.37

and while it was running, vmstat looked like this:

$ vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 1 2 132108 1188 56 1896 394 298 480 316 4904 1149 11 11 64 14
 0 1 132108 1456 52 1808 1908 276 2024 276 322 294 1 2 0 97
 0 1 132108 1416 52 1784 1984 2152 1984 2152 339 321 3 2 0 95
 0 1 132108 1548 52 1604 2712 3360 2712 3360 357 459 4 1 0 95
 0 2 132108 1504 48 1468 1288 512 1320 512 250 314 2 2 0 96
 0 2 132120 1048 48 1000 3732 1008 3732 1008 423 483 8 0 0 92
 0 1 132164 1352 48 920 2148 1844 2148 1844 356 346 2 1 0 97
 0 ...


Revision history for this message
brainiac8008 (brainiac8008) wrote :

I've been looking around online for another report of the same issue and I've found a thread on the Archlinux forums at http://bbs.archlinux.org/viewtopic.php?id=43932&p=1 (sorry, can't do a hyperlink) that seems to describe the exact same problem with the 2.6.24 kernel.
One person mentions running glxgears will cause the problem (anyone want to try that?).

Another says, "To me it seems to be a IRQ issue. Maybe a BIOS update is needed. I use ndiswrapper and wicd for network connectivity. The freezes last between 30 secs and 5 minutes but always comes back. It happens even on console (without X) so it is either an IRQ issue or a kernel driver problem."

The final post says, "I've been experiencing the same problem too, but just the other day I tried turning off the shadow memory in BIOS, and the system got back to "normal" state. No more freezes (for a whole day, yay!). Hope, this helps."

Anyone know about shadow memory? "Hope this helps."

--Noah

Revision history for this message
brainiac8008 (brainiac8008) wrote :

Just found another site talking about the freeze with the 2.6.18 and the 2.6.24 kernels, but mainly the .24 - http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-04/msg05446.html. It talks about SATA I/O being the issue. I have Hardy Heron installed on a SATA drive.

This bug is so weird. People think it could be anything that's causing it, from wireless/wired internet connections to graphics to the BIOS to heavy disk and internet I/O.

--Noah

Revision history for this message
PeteDonnell (pete-donnell-deactivatedaccount) wrote :

I just tried installing the -rt kernel and stress testing the system by watching a video on an nfs share while downloading updates in Synaptic. The system locked up just as before, so it looks to me as though whatever's causing the crash is present in there too.

Has anyone tried compiling an unmodified (as in un-Ubuntified) build of this kernel release direct from kernel.org?

Revision history for this message
martyg (snowbird) wrote :

I have a vanilla (no patches) 2.6.25 running since this morning. Too early to tell if this fixes anything.
Excellent instructions here if you want to go down that path: https://help.ubuntu.com/community/Kernel/Compile

Revision history for this message
Jim March (1-jim-march) wrote :

Huh. Well the -rt kernel IS working for me still, long enough that I can confirm it solved things for me.

Which in turn suggests we've got multiple issues going on.

Question: since I don't use Evolution, I yanked out as many Evo-related modules as possible without losing "ubuntu-desktop" or the like. I did this because in beta, every once in a while something Evolution related would peg my CPU strength meter like crazy and often trigger a lockup. Stripping down Evolution solved the "Evo goes nuts" bug but didn't resolve all crashes...helped avoid 'em a little longer maybe. So out of habit, on the release Hardy, I've gutted Evo yet again "just in case" as I'm a Thunderbird guy...is anybody else trying this, or noticed an "Evo Gone Wild!" problem in the release Hardy?

Revision history for this message
Anthony Chianese (achianese) wrote :

I've worked around my crashes by using ndiswrapper for my wireless card. With the generic or the rt kernel, it's perfectly stable for me now.

To answer your question Jim, I use thunderbird but never uninstalled Evolution or noticed anything unusual with CPU usage.

Revision history for this message
Jim March (1-jim-march) wrote :

Well NOW it'll get fixed. One of the reviewers over at ITWire ran into the random lockups problem.

http://www.itwire.com/content/view/17863/1103/1/3/

Revision history for this message
tbranham (tbranham) wrote :

Hey all,

I just want to add another voice to the choir. My girlfriend's laptop is experiencing the hard lockup problem as well. There really does not seem to be any rhyme or reason for the crashes. They lock the machine totally (except the mouse cursor will still move around, even through the wireless USB mouse...); I can not ssh into the machine to get any relevant state information, so no luck there. We've tried several things listed in the forums, and every time we think we've got the problem isolated, we get smacked with another hard-reset. Her laptop is an Acer Travelmate 4400, and I'll post the requested data. Hopefully we can get this solved -- gutsy had problems on her lappy, but never to this extent!

For the record, these are the things we've tried, or seem relevant:
* Many of the lockups occur when Firefox is running, but there are still a significant number to suggest this is not the sole cause.
* Upon returning from a quick jaunt to the kitchen (for a frosty beverage...) the machine would be locked up, which made us suspect the screen-saver. Disabling the screen-saver, however, has not solved the problem.
* Disabling compiz had no effect on the frequency of the lockups.
* Several of the lockups occurred while installing or updating packages in Synaptic.

I know that is not a lot of helpful info, but hopefully it will stress the urgent need for a fix! If you require any additional information, please do not hesitate to ask!

Revision history for this message
Jim March (1-jim-march) wrote :

tbranham: try the -rt kernel. I'm also running an Acer, although in my case the lockups froze the mouse. I posted directions on how to load it further back in the thread, it's not a "geek required" thing and you can revert back any time.

https://bugs.launchpad.net/bugs/204996

Revision history for this message
Invader Amoto (invaderamoto) wrote :

I just had an idea to see if it does have to do with disk read access, at least for me. I remember reading that you could run ubuntu completely in ram. I'll see if i can find out how, and maybe I'll see if it works, if it's not too complicated.

Revision history for this message
tbranham (tbranham) wrote :

Thanks for your reply, Jim.

I'm actually going down the 2.6.25 route at the moment, but I've always had trouble tweaking the kernel on her lappy, so I expect I'll be giving -rt a try later tonight. I'll post if there are any significant changes to the situation.

Oh, two more crashes since my last post, both while using firefox. This is, by far, the highest frequency we've experienced. Too bad this is so random...

Thanks again.

Revision history for this message
Santiago Zarate (foursixnine) wrote :

Same problem here... but with Hardy's default kernel (2.6.25 I guess). I am still on Feisty; lappy is a compaq v3417la with everything working, and the wireless is using the b43 firmware...

Mostly i get the lockups with compiz and video playing...

Revision history for this message
Jim March (1-jim-march) wrote :

Well after two days of hard pounding, the -rt kernel finally crashed on me.

Dammit.

It was the "other kind" of crash, too, the one where the mouse still works. Tells me again that we've got multiple causes of failure going on.

Now to try compiling 2.6.25 again (Hardy's default is a .24...).

Revision history for this message
Jim March (1-jim-march) wrote :

Could whoever has successfully compiled 2.6.25 post a link to the instructions that worked? There appear to be a bunch of "guides" which are too outdated for Hardy.

Revision history for this message
Robert Citek (robert-citek) wrote :

For those experiencing crashes, after the crash can you reboot and post the following logs:

cat /proc/interrupts > interrupts.log
sudo dmidecode >dmidecode.log
cat /proc/mtrr > mtrr.log
/var/log/Xorg.0.log
/var/log/kern.log
/var/log/kern.log.0
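
A rough sketch of how the list above could be gathered in one step after rebooting. The function name collect_crash_logs and the output directory name are hypothetical; the files and commands are the ones listed. Anything missing or unreadable is simply skipped, and dmidecode is only run when the script is already root.

```shell
# Copy the requested files into a directory and pack them into a tarball.
# Missing or unreadable files are skipped, not treated as errors.
collect_crash_logs() {
    out="${1:-crash-logs-$(date +%Y%m%d-%H%M%S)}"
    mkdir -p "$out" || return 1

    [ -r /proc/interrupts ] && cat /proc/interrupts > "$out/interrupts.log"
    [ -r /proc/mtrr ] && cat /proc/mtrr > "$out/mtrr.log"

    # dmidecode needs root; skip it when not running as root.
    if [ "$(id -u)" -eq 0 ] && command -v dmidecode > /dev/null 2>&1; then
        dmidecode > "$out/dmidecode.log" 2>&1 || true
    fi

    for f in /var/log/Xorg.0.log /var/log/kern.log /var/log/kern.log.0; do
        [ -r "$f" ] && cp "$f" "$out/"
    done

    # Pack the directory next to itself as <dir>.tar.gz.
    tar -czf "$out.tar.gz" -C "$(dirname "$out")" "$(basename "$out")"
}
```

Running it under sudo picks up the dmidecode output as well; the resulting .tar.gz can be attached to the bug as a single file.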

Regards,
- Robert

Revision history for this message
Rami Autiomäki (rami-autiomaki) wrote :

Just crashed.

Revision history for this message
Rami Autiomäki (rami-autiomaki) wrote :

Crash occurred Apr 28 15:35. The mouse still moved for a long time after that. The machine froze totally just before 16.08 when I tried to start Evolution.

Revision history for this message
Robert Citek (robert-citek) wrote :

On Mon, Apr 28, 2008 at 8:30 AM, Rami Autiomäki wrote:
> Crash occured Apr 28 15:35. Mouse moved long time after that. Machine
> freezed totally just before 16.08 when I tried to start Evolution.

Your kernel logs seem to confirm that:

http://launchpadlibrarian.net/13985779/kern.log

...
Apr 28 15:35:37 Acer kernel: [ 4250.751429] Pid: 0, comm: swapper Tainted: P (2.6.24-16-generic #1)
Apr 28 15:35:37 Acer kernel: [ 4250.751435] EIP: 0060:[mac80211:_spin_unlock_irqrestore+0xd/0x40] EFLAGS: 00000286 CPU: 0
Apr 28 15:35:37 Acer kernel: [ 4250.751442] EIP is at _spin_unlock_irqrestore+0xd/0x20
Apr 28 15:35:37 Acer kernel: [ 4250.751447] EAX: 00000286 EBX: 00000001 ECX: 00000286 EDX: 00000000
Apr 28 15:35:37 Acer kernel: [ 4250.751452] ESI: ffffffff EDI: df823808 EBP: 00000000 ESP: c0417f34
Apr 28 15:35:37 Acer kernel: [ 4250.751458] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Apr 28 15:35:37 Acer kernel: [ 4250.751463] CR0: 8005003b CR2: b7c05000 CR3: 1f9f7000 CR4: 00000690
Apr 28 15:35:37 Acer kernel: [ 4250.751468] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Apr 28 15:35:37 Acer kernel: [ 4250.751472] DR6: ffff0ff0 DR7: 00000400
Apr 28 15:35:37 Acer kernel: [ 4250.751476] [tick_notify+0x25b/0x390] tick_notify+0x25b/0x390
Apr 28 15:35:37 Acer kernel: [ 4250.751538] [notifier_call_chain+0x30/0x60] notifier_call_chain+0x30/0x60
Apr 28 15:35:37 Acer kernel: [ 4250.751580] [raw_notifier_call_chain+0x17/0x20] raw_notifier_call_chain+0x17/0x20
Apr 28 15:35:37 Acer kernel: [ 4250.751601] [processor:clockevents_notify+0x19/0x5770] clockevents_notify+0x19/0x60
Apr 28 15:35:37 Acer kernel: [ 4250.751624] [<f88a25b8>] acpi_idle_enter_simple+0x166/0x1b6 [processor]
Apr 28 15:35:37 Acer kernel: [ 4250.751677] [cpuidle_idle_call+0x7c/0xb0] cpuidle_idle_call+0x7c/0xb0
Apr 28 15:35:37 Acer kernel: [ 4250.751696] [cpu_idle+0x45/0xd0] cpu_idle+0x45/0xd0
Apr 28 15:35:37 Acer kernel: [ 4250.751719] [start_kernel+0x30a/0x3a0] start_kernel+0x30a/0x3a0
Apr 28 15:35:37 Acer kernel: [ 4250.751732] [unknown_bootoption+0x0/0x1e0] unknown_bootoption+0x0/0x1e0
Apr 28 15:35:37 Acer kernel: [ 4250.751795] =======================
Apr 28 15:35:37 Acer kernel: [ 4250.761032] BUG: soft lockup - CPU#0 stuck for 11s! [swapper:0]
...

Here's a grep for segf on your kern.log.0:

$ grep -i segf kern.log.0
Apr 26 21:22:08 Acer kernel: [19170.228040] gdm[11140]: segfault at 10c01978 eip b77f6635 esp bfabd630 error 4
Apr 27 08:37:20 Acer kernel: [ 451.371626] gnome-screensav[11545]: segfault at 00000200 eip b7f6fbad esp bf88e780 error 4
Apr 27 08:47:29 Acer kernel: [ 1059.857525] gnome-screensav[20451]: segfault at 00000200 eip b7fa8bad esp bff30620 error 4
Apr 27 08:48:07 Acer kernel: [ 1097.341250] glxinfo[21048]: segfault at 00000200 eip b7fc4bad esp bff6b840 error 4
Apr 27 08:55:03 Acer kernel: [ 783.810419] gdm[6179]: segfault at 10c01978 eip b7876635 esp bfe3b130 error 4
Apr 28 06:57:57 Acer kernel: [ 266.011719] gdm[6100]: segfault at 10c01978 eip b7806635 esp bfa674c0 error 4

There's no mention of segf in the kern.log file.

Dunno if the two things are related.
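
To see at a glance which programs are segfaulting, the grep can be turned into a per-program count. This is only a sketch: count_segfaults is a made-up name, and it assumes the log line layout shown above, where the program name and [pid] come right before the word "segfault".

```shell
# Count segfault lines per program name in a kernel log read from stdin.
count_segfaults() {
    awk '
        /segfault at/ {
            for (i = 2; i <= NF; i++) {
                if ($i == "segfault") {
                    name = $(i - 1)                # e.g. "gdm[11140]:"
                    sub(/\[[0-9]+\]:$/, "", name)  # strip "[pid]:"
                    count[name]++
                    break
                }
            }
        }
        END { for (n in count) print count[n], n }
    ' | sort -rn
}
```

Used as count_segfaults < kern.log.0, the six lines above would come out as gdm three times, gnome-screensav twice and glxinfo once.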

Regards,
- Robert

Revision history for this message
Dylan McCall (dylanmccall) wrote :

Thanks for that link, brainiac8008. Those are exactly the symptoms I am experiencing (and I think this report is about), and someone had previously thought it could be an I/O problem because the kernel was not writing logs even with Alt-SysRQ stuff. (Although it was perfectly content blinking the caps lock light).

I am concerned that there seem to be a lot of completely unrelated bug reports mingled with this one.
This particular report regards a lockup that has not been attributed to any particular user-space software, and is not a typical freeze, or xorg's doing. The kernel itself is dying (at random) with a kernel panic, and not recording information about it. A kernel panic can often be recognized by flashing caps lock and scroll lock lights.

Having said that, we seem to be also looking at a glaring problem with a very easily upset kernel here. I have had a problem with hard drive becoming suddenly unusable before, both on this laptop with a SATA drive and on a desktop with an IDE drive. Could this be that same issue evolved to epic proportions because of the kernel now falling over on its side as a result?

In fact, I am seeing a lot of old bugs I have experienced myself mentioned here, suddenly causing system-wide lockups instead of just Samba no longer working or wireless (ndiswrapper?) disappearing.

I guess that gives us an important question of filing: Is this bug report about the kernel being really unstable, or a particular i/o bug? I am thinking the latter, since it would seem that is the one we can blame for logs not being written. With that in mind, perhaps the general instability stuff that is not an i/o problem could be pulled to another bug report? (Or have I gone mad?)

Regarding i/o bug: On my laptop, I have turned down swappiness and have a lot of real memory. The swap file is rarely, if ever, written to. Could it be that each system's reliance on the swap file is affecting the experienced symptoms?

Revision history for this message
Chris Sykes (chris-newforest-technology) wrote :

Given the NMI messages I was getting at the console when the hang occurred, I tried
booting with "nmi_watchdog=2". This seems to have stopped the crashes occurring
for me. I've managed to scp several gigs of data over the LAN whereas before, the
machine would hang after just a few hundred megs.

Unfortunately this conflicts with vboxdrv from virtualbox-ose :-(

I'll continue testing & post any further results. If anyone would like me to test anything
else out I'm happy to help.
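
For anyone trying the same option, a small sketch for checking that it actually reached the kernel after reboot. The helper name boot_param is invented for this example; it just picks a name=value word out of the kernel command line, which the running kernel exposes in /proc/cmdline.

```shell
# Print the value of a kernel boot parameter (default: nmi_watchdog) from a
# command line string (default: the running kernel's /proc/cmdline).
# Prints nothing if the parameter is not set.
boot_param() {
    name="${1:-nmi_watchdog}"
    cmdline="${2:-$(cat /proc/cmdline)}"
    for word in $cmdline; do
        case "$word" in
            "$name"=*) printf '%s\n' "${word#*=}" ;;
        esac
    done
}
```

boot_param nmi_watchdog should print 2 on a machine booted as described. On x86, grep NMI /proc/interrupts run a few times in a row shows whether the NMI counter is going up, which is a reasonable sign that the watchdog is active.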

Revision history for this message
rmccabe3701 (robertjmccabe) wrote :

I am also having the lockup problem with Hardy 8.04.
My system is
HP dc7700 2Gb RAM and an nvidia 5200 GeForce FX video card
Symptoms:
-- Complete lockup of screen and keyboard at irregular times (doesn't really depend on cpu or network workload -- the latest crash happened overnight)
-- The Caps Lock and Scroll Lock LEDs flash at half second intervals.

I have attached my crash report. Looking at the kern.log my latest crash was last night at 21:07:
...
Apr 27 21:07:22 rob-desktop kernel: [10938.342743] u32 classifier
Apr 27 21:07:22 rob-desktop kernel: [10938.342749] Actions configured
Apr 28 15:59:32 rob-desktop kernel: Inspecting /boot/System.map-2.6.24-16-generic
...
(you will notice that there are several compiz and nvidia segfaults in the kern.log -- these were a couple days ago when I was reinstalling the nvidia driver)

I hope this helps. Other than this (major) issue the OS looks great!

Revision history for this message
aclimber (aclimber) wrote :

I am not running any wireless services or SATA devices but am still experiencing the same freezing issues described. I'm going to be held accountable to my wife soon. ;-|

Revision history for this message
aclimber (aclimber) wrote :

I had a crash just before posting these but had been good for the previous hour and a half watching a movie. Then once it crashed I tried running the scripts to get the log files you asked for and it crashed twice while trying. Good luck!

Revision history for this message
Alexander Hunziker (alex-hunziker) wrote :

I suspect that tzu's remark might be correct that the e1000 module is responsible for the lockups. While I'm not 100% sure, I also think that my machine only started hard locking when I began using wired ethernet at the university; it didn't lock up at home when using wireless.

Revision history for this message
martyg (snowbird) wrote :

Bad news.

My vanilla (straight from kernel repository, no patches) 2.6.25 kernel crashed last night.
Same symptoms as Ubuntu's 2.6.24-16.

This issue does not appear related to any custom Ubuntu patches.
Recommend this bug be escalated to Linus and his crew.

It's been fun, but I'm going back to Debian-etch (2.6.18) which has been stable on this hardware for me for years.

Revision history for this message
Wolfgang Glas (wglas) wrote :

Sadly, that's our experience. 2.6.18 is the last kernel, which has been released as part of a stable (stable as we mean it, not ubuntu-stable) Linux distribution, namely debian-etch and RHEL5. Since 2.6.20 we've seen not a single kernel, which survived more than 1 day on our servers.

Even more sadly we expect, that this bug will be removed by either redhat, Novell or debian in an effort to release a new enterprise-level distribution.

Revision history for this message
Joe (fullmitten) wrote :

I concur with Wolfgang. I have a rock-solid Ubuntu Edgy desktop install (2.6.20-16.35-generic) that locks up within 10 minutes of booting the Hardy Live CD or install. I tried installing the -rt kernel as Jim March suggests and it failed as well.

Revision history for this message
brainiac8008 (brainiac8008) wrote :

So what do we do? Roll back to an older kernel? Run off of an older release of Ubuntu? There must be something that all of our computers have in common that is causing the bug because not everyone is getting the bug. aclimber does not have any SATA devices and I am getting the lock-ups with wireless internet (among others, including Valkai, who made the bug report). Maybe it's just disk I/O that's causing the problem, and not specifically SATA?

--Noah

Revision history for this message
Nick2000 (monpetitbeurre) wrote :

I have 3 8.04 desktops and the only one experiencing the hard freezes is running in 64bits on a Core 2 Duo with a 965 motherboard (with video) and 4GB of RAM. It does not have wireless but does have one SATA drive. A Xeon and an AMD 1700XP are fine (except for some video card/resolution recognition problems)

Revision history for this message
Invader Amoto (invaderamoto) wrote :

Hey Noah,
I think it is just a disk I/O error. I'm pretty sure that at least my problem is. (there may be multiple problems here)
I tried going back to Gutsy, but I'm having even worse problems that are the same as with the Hardy final release (my lock up problem was with the beta, and the final release after I removed my nvidia card, see here: http://ubuntuforums.org/showthread.php?t=771115 ).
So right now, and until this problem is solved, I am using the live cd. I have a partition to save all my files so I don't lose stuff in case of a crash or power outage. And as long as I don't shut it down, all the stuff I installed (flash plugin, tvtime, etc.) stays installed. So the only difference is it's slower when starting apps or anything that would normally have to do with reading the hard drive.

-Invader Amoto

Revision history for this message
Santiago Zarate (foursixnine) wrote :

Well, I think just abandoning it like that won't solve anything... though I don't believe it's a disk I/O problem, since I think there's a common denominator: Nvidia cards and wireless stuff (which has to be loaded with magic)

Can you all put the hardware, model, kind of machine you have?... like this:

Laptop
Compaq v4317la
Video: nvidia (with nvidia_new driver)
Wireless: broadcom BCM431193 (feisty reports: Broadcom Corporation Dell Wireless 1390 WLAN Mini-PCI Card (rev 02)) using b43 driver

Revision history for this message
Chris Sykes (chris-newforest-technology) wrote :

It does look like there may be several different issues here.
Has anyone else tried booting with "nmi_watchdog=1" ?

My thinkpad X60 has been stable the last couple of days like that (whereas before it
would crash soon after starting any substantial network load on e1000), and
I can still use VirtualBox with this setting.

Revision history for this message
PeteDonnell (pete-donnell-deactivatedaccount) wrote :

Attached requested crash logs. I tried booting with nmi_watchdog=2 yesterday, and it didn't help.

My computer was built of old spare parts, it contains
Epox 8RDA3+ motherboard
S3 Virge graphics
2x Onboard LAN: some kind of forcedeth, which is connected, and an 8139too, which isn't.

This computer has no nVidia, e1000, or wireless of any kind.

The motherboard has a SATA chipset, but I don't have any SATA drives and it's disabled in BIOS.

Hope this helps a little!

Revision history for this message
Robert Citek (robert-citek) wrote :

@PeteDonnell (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/204996/comments/116)
>This computer has no nVidia, e1000, or wireless of any kind.

According to your crash logs, your machine does have nVidia.

$ grep -m 2 -i nvidia *.log
dmidecode.log: Product Name: nVidia-nForce
interrupts.log: 21: 22 IO-APIC-fasteoi NVidia nForce2
kern.log:Apr 21 17:22:37 merlot kernel: [ 0.000000] ACPI: RSDP 000F74B0, 0014 (r0 Nvidia)
kern.log:Apr 21 17:22:37 merlot kernel: [ 0.000000] ACPI: RSDT 3FFF3000, 002C (r1 Nvidia AWRDACPI 42302E31 AWRD 0)

$ grep -m 2 -i nforce *.log
dmidecode.log: Product Name: nVidia-nForce
interrupts.log: 21: 22 IO-APIC-fasteoi NVidia nForce2
kern.log:Apr 21 17:22:37 merlot kernel: [ 25.893503] PCI: nForce2 C1 Halt Disconnect fixup
kern.log:Apr 21 17:22:37 merlot kernel: [ 29.707418] forcedeth: Reverse Engineered nForce ethernet driver. Version 0.61.

Regards,
- Robert

Revision history for this message
Joe (fullmitten) wrote :

I've gotten back around to re-attaching my Hardy install. It was up for about 8 minutes and crashed when opening a PDF file. My crashes do not appear to be related to CPU, IO or network activity as this install will lock up just sitting still. As I said before, it is very stable with Ubuntu Edgy (2.6.20-16.35-generic). I do not have a wireless network in this machine.

Gigabyte 7ZXE Mobo
AMD Athlon XP 2000+
1024 Mb RAM
ATI Radeon 8500
SMC1255TX PCI Ethernet Adapter
WDC WD2500JB, ATA Hard drive
SONY DVD RW DRU-720A, ATAPI CD/DVD-ROM
SONY CD-RW CRX320E, ATAPI CD/DVD-ROM

Revision history for this message
Robert Citek (robert-citek) wrote :

@Joe (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/204996/comments/118)
>I've gotten back around to re-attaching my Hardy install. It was up for about 8 minutes and crashed when opening a PDF file.

8 minutes is the fastest I've seen posted. How reproducible is that? That is, if you do the same steps after each reboot, does the system crash consistently and does it crash in 8 minutes or less?

Regards,
 - Robert

Revision history for this message
Joe (fullmitten) wrote :

@ Robert (https://bugs.launchpad.net/ubuntu/+bug/204996/comments/119)
>8 minutes is the fastest I've seen posted. How reproducible is that? That is, if you do the same steps after each reboot, does the system crash consistently and does it crash in 8 minutes or less?

I don't think I've had Hardy go more than 10 minutes in the dozen or so times I've booted it, either using the live CD or hard-drive install.
What I do does not appear to influence the crash. It's happened while I'm using Firefox, checking log files or sitting there doing nothing. In the crash that I posted about above, I'd opened a single-page PDF file (~57k) and switched back to the Nautilus window I'd opened it from.
If it weren't for Edgy being so stable (I'm posting from it now), I'd swear it's my hardware.

Revision history for this message
wicketr (wicketr) wrote :

>8 minutes is the fastest I've seen posted. How reproducible is that?

I can easily beat that. For one, I'm brand new to Linux. I've tried 7.10 and now 8.04. Both locked up with the standard install. And I'm talking before it was even at the point of picking which hard drive to install it on. Basically, I stick the CD in, click "Install" and it locks up. And it would lock up at different points as the Ubuntu Loader screen was running.

I did, however, try the "Alternate CD" the other night with 8.04 and it worked! YAY! However, after it rebooted and it took me to the login screen, I entered the credentials, and as it was playing the "startup sound" it locked up and repetitively played about a 1 second clip of it continuously until I had to reboot. Now, even in "recovery mode" it locks up even before getting to the login screen.

My hardware is fine as I've been running XP and/or Vista on it for 2 years.

Hardware:
Intel E6600
2GB OCZ Ram (2x OCZ2P8001GK)
Foxconn P9657AA-8KS2H
Maxtor SATA 250GB hard drive
2 SATA DVD Drives
Nvidia 7900GS

I'd include any attachments to log files, but i have no idea what I'm doing with Linux, and I can't even get to the login screen to type that stuff.

Revision history for this message
iopo (iopo) wrote :

Hi!
I'm having the same issue. The freeze is quite random, but I found out that it always happens after I've been playing a game called Glest for a few minutes.
At first I thought it was a video card issue, but I tried with the proprietary driver, without it, with compiz running, and without compiz running, and it still freezes.

In fact I was able to induce a freeze just a few minutes ago (04/30/08 at 10:59 AM). You may find something useful in my logs.

Best

 specs:
 dell inspiron 6000
 ati mobility radeon x300
 Intel Corporation PRO/Wireless 2915ABG

Revision history for this message
iopo (iopo) wrote :
Revision history for this message
Robert Citek (robert-citek) wrote :

@Joe (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/204996/comments/120)
> What I do does not appear to influence the crash.

OK. Does this reproduce the crash?

1) boot into Ubuntu
2) log in via GDM (enter username and password)
3) open a Terminal (Applications > Accessories > Terminal)
4) type "~/evince Examples/case_Wellcome.pdf &"

If so, how long does it take from boot up to crash?

Regards,
- Robert

Revision history for this message
Robert Citek (robert-citek) wrote :

Woops. That fourth step should be this:

4) type "evince ~/Examples/case_Wellcome.pdf &"

Regards,
- Robert

Revision history for this message
Joe (fullmitten) wrote :

@ Robert:
Here's my tediously documented course of action using the Hardy Live CD (My Hardy install is a single install on a different hard drive than my Edgy install and I don't want to wear out my cable switching back and forth)
Start my stopwatch when I hear the Ubuntu chime.
1:25 = switch screen resolution because the Live CD's default is something my monitor doesn't like
3:28 = type "evince ~/Examples/case_Wellcome.pdf &"
4:31 = close the PDF above and open the one that crashed my system before (it's on a flash drive that's still plugged in from the previous boot)
4:58 = close that PDF and open the Examples folder
5:56 = open each case study in turn, scan over it, close it.
9:18 = Machine freezes opening the Aesop's Fables file

Revision history for this message
Joe (fullmitten) wrote :

I booted back into the Live CD so I could record the screen resolution (it boots up as 1440x900, but my monitor only supports 1280x1024) and it locked up after clicking the "apply" button on the screen resolution dialog. Total uptime: 58 seconds.

Revision history for this message
PeteDonnell (pete-donnell-deactivatedaccount) wrote :

@ Robert: sorry, what I meant to say was "nVidia _graphics_", since a few people had been suggesting that might be the cause...

Revision history for this message
Santiago Zarate (foursixnine) wrote :

Using the nvidia graphics is not the problem... _having_ them is.... as far as i can see... i think we all have linux-restricted-drivers or so installed... most of us have nvidia stuff... (even on desktop pcs).. and some have ati... :/ ima try with the nmi_watchdog=3 stuff

Revision history for this message
Santiago Zarate (foursixnine) wrote :

Nope... not working either... tried with a -rt kernel and the nmi_watchdog=3 irqpoll options but it didn't work at all... machine locked up after... hmm... 45 mins... has anyone tried a .26 kernel??

Revision history for this message
brainiac8008 (brainiac8008) wrote :

Here are my specs:

HP a6109n Desktop PC
AMD Athlon 64 X2 Dual Core Processor 3800+
2 SATA 320 GB Hard drives (one with Ubuntu [WD3200AAKS], one with Vista)
2 GB RAM
Asus M2N68-LA OR Narra2-GL8E Motherboard ("Narra2-GL8E" is HP's name for the mobo)
Video: NVIDIA Linux Display Driver from nvidia.com - NVIDIA GeForce 6150 SE nForce 430
Wireless: ZyDAS ZD1211 IEEE 802.11b+g USB Adapter (and NVIDIA nForce Networking Controller)
ATAPI DVD A DH16A1L SCSI (SATA) DVD Drive

--Noah

Revision history for this message
KevinM (kevbert1) wrote :

I'm running the following:
AMD Athlon 64 3800+ CPU
2 SATA hard drives - one 160GB with XP, the other 500GB with Linux including Ubuntu Hardy Heron 64 bit.
Nvidia GeForce 7600 with restricted drivers.
Belkin wireless card based on Broadcom BCM 4308 using restricted drivers.
Yesterday I had 7 times when the PC locked up solid while running Firefox 3. The PC would not allow me to reboot, restart X or recover after waiting some time. The only way to continue was a hard reset via the PC front panel switch. All times I was running BOINC, Screenlets and Compiz in the background.
I've temporarily downgraded Firefox to 2.0.0.14 and now do not have any lockups with all other background tasks running.
In my case I believe that the lockups are due to either Firefox 3 or a Mozilla plug-in for in-browser video playing. Every time the lockup occurred I was trying to watch video on the BBC news website (http://news.bbc.co.uk) and the fault occurred almost immediately after the stream started to download.
 Unfortunately due to other issues I'm shortly going to re-install Gutsy.

Revision history for this message
petebass4life (pete-bass4life) wrote :

CLUE!:
Upon upgrading from 7.10 to 8.04 LTS using the package manager, my system froze (for the first time) during the "cleaning up old packages" step! My theory, and it might be completely off, is that when upgrading through the package manager an old package may conflict with a newer one, and the kernel (or whatever reads the package) does not understand which one to use or how to use it, and panics. I had to turn the power off at the switch (holding down the button), and it did not finish cleaning up the system. It could also be a script that didn't get cleaned up, I'm not sure. Maybe someone with more time on their hands could look into this more deeply?
Hope i helped anyway :)

Revision history for this message
soccerguy53 (jared-alewine) wrote :

Is there any way we could get a response from a dev letting us know the status of this and/or any updates? This is obviously affecting multiple users. Also, I am running an IBM T60p laptop. I am having this issue, as are others. I have Intel 945 on-board video, an Intel gigabit ethernet controller, and an Atheros wireless controller. I have this issue randomly throughout the day. I have tried turning off compiz, firefox 3, and the wireless drivers one at a time. I haven't tried combinations of those 3 together. I still see the issue no matter which of those I change. Thanks in advance for any help.

Revision history for this message
tbranham (tbranham) wrote :

OK, My partner has experienced another crash (actually, 2). She has been using her machine as normal since my last update, so this is actually a very long time between symptoms. By the way, I had limited success with the 2.6.25 kernel (her hardware is picky...), so she's still running 8.04's default kernel.

First crash:
Her only report is that she was in Synaptic Package Manager when it crashed. I will try to get details as to what she was installing, when (in the process) it crashed, and what other applications she had open at the time.

Second crash:
"It crashed again. You can tell by the time stamps of this email how soon after the fact it crashed. This time I was in a terminal and was able to halt and reset nicely. I had no mouse control." Again, I will try to get more relevant details.

I'm attaching some logs for each crash.

What a frustrating bug.

Revision history for this message
tbranham (tbranham) wrote :

Second crash logs...

Revision history for this message
Robert Citek (robert-citek) wrote :

@tbranham
> What a frustrating bug.

Indeed, especially since we have neither a test case that works for everyone nor any error messages to give a clue where to look. To top it off, I have got two machines running Hardy without issue, one of which has been up for over four days straight.

Some things I'm looking into:

https://help.ubuntu.com/community/BootOptions

zless -iX /usr/share/doc/linux-doc-2.6*/Documentation/kernel-parameters.txt.gz

http://www.mjmwired.net/kernel/Documentation/kdump/

The crashkernel= option looks interesting.
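
Since several of us are now trying different boot options (acpi=off, irqpoll, nmi_watchdog, crashkernel=), it may help to record which ones were actually active when a lockup happened. Below is a minimal sketch, assuming a Linux system where the running kernel's command line is readable at /proc/cmdline. The function names and the sample command line are made up for illustration, and quoted values containing spaces are not handled.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {option: value}; bare flags map to None."""
    options = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        options[name] = value if sep else None
    return options


def read_cmdline(path: str = "/proc/cmdline") -> dict:
    """Parse the command line of the running kernel (Linux only)."""
    return parse_cmdline(Path(path).read_text())


# Hypothetical command line, for illustration only.
sample = "root=/dev/sda1 ro quiet splash acpi=off irqpoll nmi_watchdog=1"
opts = parse_cmdline(sample)
for name in ("acpi", "irqpoll", "nmi_watchdog", "crashkernel"):
    state = repr(opts[name]) if name in opts else "not set"
    print(f"{name}: {state}")
```

On an affected machine, calling read_cmdline() and pasting the result into a comment would show exactly which options that boot used.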

Regards,
- Robert

Revision history for this message
hardyn (arlenn) wrote :

Soccerguy,

I have wondered that in the past: do we even know if this bug has been read by a developer?

I am not currently having this problem, although I have chosen to sit with Gutsy for a while until this blows over. I am however watching this bug pretty closely, as Hardy was supposed to be the big release, and kernel issues are probably not what Canonical wanted.

Revision history for this message
Slade Winstone (slade-winstone-yahoo) wrote :

Hi (just my two cents),

Many, many people have been experiencing lock-ups since at least Gutsy (myself included).

I did a complete re-install of Hardy, and I'm still experiencing lock-ups. I run an ASUS A8M laptop and a Nvidia graphics card.

I found that if I switch ACPI off during boot (either noacpi or acpi=off) I have absolutely no problems with lock-ups (of course I have other problems, but they are minor in comparison). So, no lockups with ACPI off! Is this somehow related to a long ago, far away kernel upgrade and a very poor BIOS ACPI implementation?!? (just a guess...)

Anyhow, lots of luck guys... :) and I hope maybe switching ACPI off might help at least one person.

I'll keep checking back in the hopes of a possible fix.

Revision history for this message
Invader Amoto (invaderamoto) wrote :

I THINK I FOUND A FIX FOR THIS BUG!

I reinstalled on a wing and a prayer, but this time with the alternate CD. Everything seems to work perfectly. I haven't had any lock ups, although I've only been on it for like 30 mins (but it usually locked up way before that). I've installed a couple of programs, updated, I'm currently running firefox, pidgin and tvtime, and I was also using vlc to listen to music. So far, no lock ups. I hope this fix works for everyone else (and I hope it keeps working for me!).
All the other problems that I had are fixed (they were mostly with the release candidate, because the final release wouldn't work at all): http://ubuntuforums.org/showthread.php?p=4866559

I can finally use my Hardy Heron without the live disk. FANTASTIC!

Revision history for this message
Invader Amoto (invaderamoto) wrote :

Well. It finally locked up. It's about an hour since I started using it. This lock up is different than the ones before though. It still seems to be caused by the same thing (or the problem is causing this): hard drive I/O errors.
It locked up with the mouse working, and the cursor would change depending on where it was. So I decided to not touch anything, and just wait it out. Like I mentioned a while back, it did come back. This time though, it seems to stay back. It doesn't lock up permanently like before. But anything that has to do with reading the hard drive will make it display an error message, or it will just do nothing. This is better than before, and I guess it's pretty usable. I just have to open all the programs that I would want right when I log in. I'm gonna try updating my BIOS to see if that does anything. I could also try reinstalling it on a different drive (which could be a pain in the ass) to see if it's from SATA or just my specific drive.
I hope the alternate cd helps other people more than it helped me. (although it did help me quite a bit)

-Invader Amoto

Revision history for this message
Invader Amoto (invaderamoto) wrote :

Well I just confirmed it's an I/O error: since anything that has to do with the hard drive doesn't work or shows an error message, I tried Ctrl+Alt+Backspace and Ctrl+Alt+F1, but neither worked. Then I tried Alt+PrintScreen R E I S U B, and suddenly a bunch of white text appeared on a black screen (it looked like a fullscreen terminal), but it was scrolling so fast it was hard to read. I don't even know if it was scrolling; it kept blinking on certain lines, as if it was going back and forth between the same two lines. One line I couldn't make out, but the other said "I/O error, dev sdb, sector 53070591" somewhere towards the end of the line. At least now Ubuntu shows what the problem is (sorta). The strange thing is that sdb is my storage drive, which I usually don't even have mounted. In all my other installs I had it mounted as /home, but this time I decided to try it differently. I did have it mounted for about 45 mins though, so maybe if I don't have it mounted, it won't lock up.
My storage drive is an IDE drive, and my other one is a SATA drive, so i guess its not related to SATA, then.
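
For anyone who manages to capture log text before a reset, here is a rough sketch of how lines like the one above could be tallied per device. It assumes the messages contain the substring "I/O error, dev <name>, sector <number>" as quoted above; the text before that substring in the sample lines is invented for illustration, as is the function name.

```python
import re
from collections import defaultdict

# Matches the fragment quoted in this comment; the prefix of the line is ignored.
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def io_errors_by_device(lines):
    """Return {device: sorted list of distinct failing sectors}."""
    found = defaultdict(set)
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            found[match.group(1)].add(int(match.group(2)))
    return {dev: sorted(sectors) for dev, sectors in found.items()}


# Sample lines: only the "I/O error ..." part comes from this report,
# the leading text is hypothetical.
sample = [
    "kernel: I/O error, dev sdb, sector 53070591",
    "kernel: I/O error, dev sdb, sector 53070591",
    "kernel: some unrelated message",
]
for dev, sectors in io_errors_by_device(sample).items():
    print(dev, sectors)
```

If the same few sectors keep showing up on one device, that points more towards the drive itself; errors spread over several devices would point more towards the controller or driver.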

-Invader Amoto

Revision history for this message
Invader Amoto (invaderamoto) wrote :

Well I'm back with more. I don't feel like explaining everything (not really that much), but get this bug to happen with system log viewer open. I did and it showed a bunch of stuff like I/O error like i said it was showing before (except this said sda), and much more. I stupidly hard reset the computer, not thinking, and now the errors it showed aren't there. It couldn't write them to the disk I guess.
Also, I was changing around some keyboard preferences when it happened and three error messages popped up showing:
"
Error activating XKB configuration.
It can happen under various circumstances:
-a bug in libxklavier library
-a bug in X server (xkbcomp, xmodmap utilities)
-X server with incompatible libxkbfile implementation

X server version data:
The X.org Foundation
10400090

If you report this situation as a bug, please include:
-The result of xprop -root | grep XKB
-The result of gconftool-2 -R /desktop/gnome/peripherals/keyboard/kbd
"

I know that has to do with a keyboard (settings) problem and it was probably caused by the I/O problem. Next time the lock up happens I'll make sure to have system log viewer open and I'll copy down the error messages that come up during and after the lock up.
Everyone else see what happens if system log viewer is open when their computer locks up (if you have similar lock ups to mine, where it isn't really a full, permanent lock up).
Also, if people are having lock ups that aren't related to I/O errors like mine are, then shouldn't this bug be split up? Although, I think there's like only one or two other people who think their problem is I/O errors (which I now know it is, at least for me).

-Invader Amoto

Revision history for this message
Invader Amoto (invaderamoto) wrote :

*sigh*
Here I am again.
It locked up again, shortly after my last comment, but this time it was a permanent hard lock up. I had System Log Viewer open but it was no use, because everything froze. It seems that firefox is the first program to go dark every time (compiz does this) then it spreads to whatever other ones are open. And now, doing alt printscreen R E I S U B just shows an all black screen.
This bug is getting more annoying. I thought I had it working earlier, and now I'm back at square one, except with more evidence of I/O errors. Should I start a new bug for I/O error related lock ups, or just stick with this until other people can figure out some reason to their lock ups?

I guess it's back to the live disk...*sigh*

-Invader Amoto

Revision history for this message
Bastanteroma (bastanteroma) wrote :

Just installed the 2.6.24-17 kernel and I still experience hard lockups. In my case it usually happens soon after logging in, not if it sits at GDM and not past a few minutes into a session. I get no flashing lights, can't move the mouse, can't switch to a virtual terminal.

Nvidia card with binary driver, zd1211rw usb wireless card.

Revision history for this message
Bastanteroma (bastanteroma) wrote :

Linux ubuntu 2.6.24-17-generic #1 SMP Thu May 1 14:31:33 UTC 2008 i686 GNU/Linux

Revision history for this message
Bastanteroma (bastanteroma) wrote :
Revision history for this message
hardyn (arlenn) wrote :

invader,

With your log file actually referencing a specific block, might you have a defective block on the hard disk? Maybe perform a surface scan? Just a means of reducing noise in the bug report... if your disk scan comes up clean, you might be onto something.

Revision history for this message
Rami Autiomäki (rami-autiomaki) wrote :

I haven't had lockups after I upgraded my bios, started using fixed DSDT and reinstalled almost every kernel, fglrx packages and gdm. This was with Acer Aspire 5022, bios 1.13 --> 1.20. This is day 3 without lockups.

Revision history for this message
brainiac8008 (brainiac8008) wrote :

Bastanteroma,

I think we have something in common. I've only used Hardy a couple of times because I don't want to constantly get the lockups, but when I do, I too get the hard lockups very soon after I log in. I saw in your lspci-wnn.log that you have an NVIDIA GeForce 6200; I have a GeForce 6150 SE. You say that you have a binary driver. Is that the driver from the Restricted Drivers Manager or from the NVIDIA site? I got my driver from the NVIDIA site. Finally, and most important is that you have a zd1211rw usb wireless adapter. I have a zd1211. I don't have the "rw", but it's close enough. Now I don't want to lead the devs and everyone else off of disk I/O, but maybe our usb wireless adapters are causing the problem, or the combination of our wireless adapters and our NVIDIA drivers. I'm going to unplug my wireless adapter and see if I don't get the lockups anymore. However, I won't be able to access the internet, so I'm gonna have to switch back to Vista to report my findings. I won't really be able to do much at all in Hardy without the internet, so maybe I'll just run a bunch of processes and see if it can handle it.

--Noah

Revision history for this message
idyllic (idyllic) wrote :

I just got 7 lockups in the past 4 hours in Hardy. Just doing a file transfer between partitions, or merely reading a PDF file, caused the lock up. It is highly frustrating, because I am in the middle of my examinations. ='( It freezes completely. I couldn't use Alt+SysRq+K or R E I S U B. The only way out is a hard reboot, because my laptop fan makes a lot of noise when the lockups occur and I am afraid it might break the hardware. And since I did a hard reboot, there weren't any logs related to the incident at all ='(

I tried -rt, backported kernel and acpi=off, irqpoll in grub, but it didn't solve the issues for me.

Cheers and thanks,

Revision history for this message
idyllic (idyllic) wrote :
Revision history for this message
idyllic (idyllic) wrote :
Revision history for this message
Invader Amoto (invaderamoto) wrote :

Hey hardyn,
I just tried checking all my disks with fsck (that is what you meant, right?)
and for all my ext3 partitions it said clean. But I decided to try checking the drive itself instead of the partition and for my IDE drive (/dev/sdb) it said

fsck 1.40.8 (13-Mar-2008)
e2fsck 1.40.8 (13-Mar-2008)
fsck.ext2: Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/sdb

The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

And why does it say ext2 for all my ext3 partitions?
Also, if my problem is caused by bad sectors on the drive, then shouldn't ubuntu have some way of telling me this? And why would it make it lock up? I thought ubuntu and linux was designed to let you know what the problem is if one does happen. I guess I could report it as a suggestion.
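
A possible explanation for the two questions above, offered as a sketch rather than a diagnosis: ext2 and ext3 share the same on-disk superblock (ext3 adds a journal), which is why the ext2 checker handles both, and a whole-disk device such as /dev/sdb normally starts with a partition table rather than a filesystem, so not finding a superblock there would be expected. The snippet below looks for the ext magic number 0xEF53, stored little-endian at byte 1080 (superblock offset 1024 plus field offset 56). The function names are made up, and reading a real device needs root.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # the primary superblock starts 1024 bytes in
MAGIC_OFFSET = 56         # offset of the 16-bit s_magic field in the superblock
EXT_MAGIC = 0xEF53        # shared by ext2 and ext3


def has_ext_magic(data: bytes) -> bool:
    """True if the bytes carry the ext2/ext3 magic number at the usual place."""
    start = SUPERBLOCK_OFFSET + MAGIC_OFFSET
    field = data[start:start + 2]
    if len(field) < 2:
        return False
    (magic,) = struct.unpack("<H", field)
    return magic == EXT_MAGIC


def device_has_ext_magic(path: str) -> bool:
    """Check a device or image file, e.g. a partition like /dev/sdb1 (needs root)."""
    with open(path, "rb") as device:
        return has_ext_magic(device.read(SUPERBLOCK_OFFSET + MAGIC_OFFSET + 2))


# In-memory demonstration with fabricated bytes (no real disk is touched).
blank = bytes(2048)
fake_fs = bytearray(2048)
fake_fs[1080:1082] = struct.pack("<H", EXT_MAGIC)
print(has_ext_magic(blank), has_ext_magic(bytes(fake_fs)))
```

A missing magic number on the whole-disk device says nothing about the health of the partitions on it; the I/O errors reported earlier are a separate matter.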

-Invader Amoto

Revision history for this message
Invader Amoto (invaderamoto) wrote :

Well, after some more crap happened (don't feel like explaining) i fsck'd my IDE drive and it said

fsck 1.40.8 (13-Mar-2008)
e2fsck 1.40.8 (13-Mar-2008)
fsck.ext2: Attempt to read block from filesystem resulted in short read while trying to open /dev/sda1
Could this be a zero-length partition?

which i assume isn't good. I need certain files on that partition. NEED.
I tried opening it and it mounted fine, but browsing it either doesn't work, or is slow and no program can open the files.
Is there some way I can recover those files?

-Invader Amoto

Revision history for this message
Nicholas (drkoljan) wrote :

Try using SystemRescueCD

Revision history for this message
Wolfgang Glas (wglas) wrote :

Maybe the SABDFL could drop us a short note on which countermeasures he will take in order to avoid such a debacle in future releases?

I think such a statement would be very beneficial in order to bring back our confidence in Ubuntu...

Revision history for this message
PeteDonnell (pete-donnell-deactivatedaccount) wrote :

I just had a thought: a number of people have suggested that this may be disk I/O related, and some people have apparently been running the live CD without problems. Has anyone tried doing a persistent install on a USB stick, e.g. as on http://t-skariah.blogspot.com/2008/04/ubuntu-804-hardy-heron-on-usb.html ? It might help to verify whether the bug was IDE/SATA related.

Also, would it be possible for someone (preferably officially, but unofficially is better than not at all) to package up an earlier version of the kernel? I know there's been some discussion on this thread as to what the best thing to do is, and I agree that just giving up on the problem and going back to an earlier kernel isn't a viable long-term solution. However, given how difficult this problem is proving to track down and how frustrating it is for those experiencing it, a properly set up package of an earlier kernel (e.g. 2.6.18) as an interim measure would be very helpful. As it is, the computer I'm experiencing the bug on is essentially unusable. This is a real shame: I just installed Hardy on another computer and it runs beautifully, and I am very impressed by how well everything works when this bug doesn't appear. I would offer to build this package myself, but while I can just about compile and install a custom kernel, I have almost no experience of building packages, nor do I know how to set up a standard "generic" kernel. I suppose it's even possible that such a kernel would still experience the same bug, but that in itself would tell us something useful.

Revision history for this message
Invader Amoto (invaderamoto) wrote :

Hey Nicholas,
Thanks for the suggestion.
I tried it, got confused because things were either REALLY slow or it wasn't working in the GUI. So I started up Ubuntu and now my IDE drive works fine. I don't know what's going on but I hope it stays working. Also, I turned off JMicron in my BIOS and updated my BIOS, and Hardy hasn't crashed for a few hours. It could happen any second now, though, so I'm not getting my hopes up like last time I thought i had it fixed.
I think I'm gonna back up an image of my storage partition.

-Invader Amoto

Revision history for this message
brainiac8008 (brainiac8008) wrote :

FOUND A WAY TO REPRODUCE BUG, WIRELESS ADAPTER IS THE PROBLEM

I made an earlier post saying that I would take out my wireless adapter before starting my computer up and see if I still got the hard lockups. I tried it out last night, and sure enough, when Hardy started up without my wireless adapter plugged in, it ran beautifully. For 35 minutes, I ran all sorts of different programs, too many to list. I then plugged in my wireless adapter. For a few seconds, nothing happened. Then the network applet in the top right showed that it was identifying and connecting to my wireless network, and as soon as it showed 4/4 bars for the established connection, Hardy froze completely.

Now I can reproduce the lockups easily, and that explains why Hardy locks up only seconds after I log in when I have my wireless adapter plugged in on startup.

By the way, I used this wireless usb adapter without problems with a different computer, on which I had installed Ubuntu 6.06 and 7.04. So I'm not sure how to proceed from here. Is it a driver issue? Is there a conflict between the wireless adapter and the kernel? Ubuntu is useless to me without an internet connection.

Also, when I got my first lockup, I had just installed Hardy, and I was using it for hours, with internet and all, until I got a (soft?) lockup in which I could move the mouse, but nothing else. The only other thing I could do was Alt-SysRq-R-E-I-S-U-B, and for all (two) of my subsequent boots in Ubuntu, Hardy froze within seconds of my logging in. So for those two times I logged in, and Hardy froze seconds afterwards, it must have to do with my wireless adapter. However, the first time I got the soft lockup, I was using my wireless adapter. What was causing that lockup? I have not used Hardy much since...could that lockup have been caused by a disk I/O issue? Even if I took out my wireless adapter, could a similar soft lockup occur again at seemingly random times, like the lockups all of you are experiencing? Hopefully that lockup had to do with my wireless adapter too. I will look at the syslog for that day I got the soft lockup and see if anything was reported.

--Noah

Revision history for this message
Dylan McCall (dylanmccall) wrote :

I feel I should add, this crash seems to have ceased on my end. I have not had a kernel lockup (or any lockup) since around the time Hardy was released.
Assuming this is a disk i/o problem, I wonder what would happen if someone symlinked the system log files to a network drive?

Revision history for this message
Invader Amoto (invaderamoto) wrote :

Well, I'm back again.
I haven't had a lock up for almost a day. That's a new record. I guess the problem was either having JMicron on in my BIOS, or my BIOS being too old before I updated it. I don't even have RAID configured (which is what JMicron is for), so I don't get how that would cause a problem.
Well, it seems we were having multiple problems. I don't even have a wireless adapter to be the cause. So I guess if anyone can pop into their BIOS and turn off JMicron, we could see if that fixes it for some more people.
Or it could be the updated BIOS. I have an Asus m2v-mx motherboard.
I'm gonna go back into BIOS and turn JMicron back on to see if it locks up.

Hey Dylan,
That's a good idea. It should be able to work, though in my experience it locks up on both my hard drives, so it might do the same thing with network drives. And if the program (or whatever) that transfers files over the network isn't already loaded into RAM, then it probably won't work, because it would need to read the drive. I've seen what it shows in System Log during and after a lockup, but I don't really remember what it said. If you get lock ups like mine, where it has a chance of coming back (but crashes if it needs to read the disk), you can leave System Log Viewer open and wait for a lock up. That's what I did. But don't expect it to save those logs to the disk.

-Invader Amoto

Revision history for this message
petebass4life (pete-bass4life) wrote :

hmm i uninstalled envyng and the lock ups take significantly longer.

Revision history for this message
rmccabe3701 (robertjmccabe) wrote :

I updated my system yesterday and it seems that the ubuntu team must have solved the lockup issue since I have not had a lockup for over a day ... the problem now is that I have lost all usb functionality. The system does not find any usb device (flash drive, external hard drive, etc.) It doesn't even show up in /dev

when I do

cat /var/log/syslog

I get

...
May 4 21:41:26 rob-desktop kernel: [13067.344518] printk: 1 messages suppressed.
May 4 21:41:26 rob-desktop kernel: [13067.344524] hub 7-0:1.0: connect-debounce failed, port 2 disabled
May 4 21:41:33 rob-desktop kernel: [13074.300333] printk: 2 messages suppressed.
May 4 21:41:33 rob-desktop kernel: [13074.300339] hub 7-0:1.0: connect-debounce failed, port 2 disabled
May 4 21:41:37 rob-desktop kernel: [13078.910989] printk: 1 messages suppressed.
May 4 21:41:37 rob-desktop kernel: [13078.910995] hub 7-0:1.0: connect-debounce failed, port 2 disabled
May 4 21:41:42 rob-desktop kernel: [13083.787938] printk: 1 messages suppressed.
May 4 21:41:42 rob-desktop kernel: [13083.787945] hub 7-0:1.0: connect-debounce failed, port 2 disabled
May 4 21:41:47 rob-desktop kernel: [13088.734658] printk: 1 messages suppressed.
May 4 21:41:47 rob-desktop kernel: [13088.734665] hub 7-0:1.0: connect-debounce failed, port 2 disabled
May 4 21:41:52 rob-desktop kernel: [13093.315552] printk: 1 messages suppressed.
May 4 21:41:52 rob-desktop kernel: [13093.315557] hub 7-0:1.0: connect-debounce failed, port 2 disabled
May 4 21:41:56 rob-desktop kernel: [13098.021377] printk: 1 messages suppressed.
May 4 21:41:56 rob-desktop kernel: [13098.021382] hub 7-0:1.0: connect-debounce failed, port 2 disabled
...

Did the kernel team by chance disable usb capabilities in an effort to mitigate the lockup issue?
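
To make a long excerpt like the one above easier to compare between machines, the repeated messages could be condensed into counts. This is only a sketch: the two patterns are taken from the lines pasted above, the function name is made up, and the sample is the first four of those lines.

```python
import re
from collections import Counter

DEBOUNCE = re.compile(r"hub (\S+): connect-debounce failed, port (\d+) disabled")
SUPPRESSED = re.compile(r"printk: (\d+) messages suppressed")


def summarize(lines):
    """Count debounce failures per (hub, port) and total suppressed messages."""
    failures = Counter()
    suppressed = 0
    for line in lines:
        match = DEBOUNCE.search(line)
        if match:
            failures[(match.group(1), int(match.group(2)))] += 1
            continue
        match = SUPPRESSED.search(line)
        if match:
            suppressed += int(match.group(1))
    return failures, suppressed


sample = [
    "May 4 21:41:26 rob-desktop kernel: [13067.344518] printk: 1 messages suppressed.",
    "May 4 21:41:26 rob-desktop kernel: [13067.344524] hub 7-0:1.0: connect-debounce failed, port 2 disabled",
    "May 4 21:41:33 rob-desktop kernel: [13074.300333] printk: 2 messages suppressed.",
    "May 4 21:41:33 rob-desktop kernel: [13074.300339] hub 7-0:1.0: connect-debounce failed, port 2 disabled",
]
failures, suppressed = summarize(sample)
for (hub, port), count in failures.items():
    print(f"hub {hub} port {port}: {count} failures, {suppressed} messages suppressed")
```

In the excerpt every failure is on the same hub and port, which would suggest one misbehaving port or device rather than USB being disabled as a whole.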

Revision history for this message
Jim March (1-jim-march) wrote :

Regarding USB, I'm running the newest kernel with USB working.

Either there's a difference between your hardware and mine, OR it relates to the USB "tweaks" I've got running in order to use USB support in VirtualBox.

In case it's the latter, you might try doing those same tweaks turning on the USB File System:

http://www.ubuntu1501.com/2007/12/installing-virtualbox-with-usb-support.html

You don't necessarily need to get VirtualBox working - just do the USB stuff. It won't hurt any and might help.

Revision history for this message
Santiago Zarate (foursixnine) wrote :

Well... i just updated the kernel from a fresh install... installed all i needed to work fine...

I've been working for some time no crashes yet... usb stuff working fine... heavy network work and heavy disk usage (copying movies from a machine to other using NFS) even compiz enabled... now gonna go sleep while playing a movie... lets see if it crashes :p

uname -a output:
Linux santiago-laptop 2.6.24-17-generic #1 SMP Thu May 1 14:31:33 UTC 2008 i686 GNU/Linux

uptime output:
 03:09:59 up 1:05, 3 users, load average: 0.16, 0.78, 0.99

Revision history for this message
Santiago Zarate (foursixnine) wrote :

Nope, locked... :S like 10 mins ago...

Revision history for this message
yeeguy (yeelee) wrote :

Maybe this'll be helpful:

I've got two laptops running Hardy. They're almost identical: one's a Lenovo Thinkpad T60p and the other is a Lenovo Thinkpad T60 (no "p").

The T60 has been rock solid with Hardy. No freezes, continuous uptime ever since the upgrade from Gutsy.
The T60p has been freezing at least 2 or 3 times per day.

I think there are only minor differences between these machines so maybe it'd be an interesting case study because one is totally fine, but the other freezes all the time. I'm pretty new to Ubuntu/Linux, though, so someone tell me what info you need and I'd be happy to post/attach it here.

Revision history for this message
Wolfgang Glas (wglas) wrote :

Please compare your graphics hardware. The 'p' laptops have a greater resolution than the models without the 'p'. Presumably, the two laptops have different graphics hardware.

Revision history for this message
wicketr (wicketr) wrote :

Just an update for me. I couldn't even boot up Ubuntu without it locking up on me before it even got to the Logon Screen.

I disabled the JMicron controller and AIPC in the BIOS and now everything is running fine. No lockups yet, though I'm just trying to figure out Linux.

Revision history for this message
yeeguy (yeelee) wrote :

re: Lenovo T60p vs. T60

Specs for the T60p that freezes all the time: http://urlenco.de/zlrui
It has a 256MB ATI FireGL V5200 video card, driving a 1440x1050 screen.

And for the T60 that never (yet) freezes: http://urlenco.de/vrkrm
It has a 64MB ATI Radeon X1300 video card, driving a standard XGA screen.

As far as I can tell, the video card and screen resolution are the main distinctions between them. I have them both upgraded to 2GB RAM. (Oh, the T60p also has a T2600 CoreDuo vs. a T2400 CoreDuo for the T60 -- but I don't think that should make a difference, right?)

Hope that's helpful! Are there any known differences with how Hardy interacts with ATI FireGL (produces freezes) vs. the ATI Radeon (works fine) video cards?

Revision history for this message
Alexander Hunziker (alex-hunziker) wrote :

yeeguy: my Thinkpad is in between your two configs: it's a T60 with the SXGA+ resolution and a Radeon X1400 driving it. It hard-locks, but quite reproducibly only when the wired network is being used. Can you check whether your T60 also produces hard locks when you use wired ethernet for a few hours? (It doesn't happen right away.)

Revision history for this message
yeeguy (yeelee) wrote :

Alexander: OK, I've put my T60 on a wired connection. Will report back if the machine freezes. If you don't hear from me, assume that no freeze occurred... (fingers crossed) ;-)

Revision history for this message
Invader Amoto (invaderamoto) wrote :

Anyone with JMicron, turn it off in your BIOS! That completely solved the lock ups for me!
I just tried running it with JMicron back on and it locked up. I forget what I was doing but it was important (updating, and messing with /etc/fstab and /etc/hosts at the same time) and when I did a hard reset, it came back with a failed check of my / filesystem. So I reinstalled (with JMicron off of course) and everything's working perfectly, like it was before with JMicron off. It's a simple solution and I hope it works for all of you guys.

-Invader Amoto

Revision history for this message
Øyvind Stegard (oyvindstegard) wrote :

Some words about JMicron and Linux compatibility here:
http://en.wikipedia.org/wiki/JMicron

Revision history for this message
Santiago Zarate (foursixnine) wrote :

Well... I'm going to try compiling a 2.26 source and see what happens

Revision history for this message
petebass4life (pete-bass4life) wrote :

Have a look at my syslog there are I/O errors

Revision history for this message
Santiago Zarate (foursixnine) wrote :

Well... I had to compile a 2.6.25.1 kernel... it has been up for some time without crashing... though I haven't been able to build the nvidia driver... can anyone help? Plus... I used the config file from feisty (I still run feisty) and it worked really fine...

I'm hoping this doesn't crash again

Revision history for this message
Santiago Zarate (foursixnine) wrote :

Well, 3:29 hrs without lockups... and heavy compilation work (I'm generating the .deb) plus watching movies :p so... everything is better... I'll upload the kernel to my ppa asap

Revision history for this message
yeeguy (yeelee) wrote :

@Alexander -- Just FYI, my T60 has been running all day with the wired connection, no freezes. So, maybe there's some critical difference between the Radeon X1300 in my T60 vs. the Radeon X1400 in yours that's related to the freezes...

Revision history for this message
rmccabe3701 (robertjmccabe) wrote :

To Jim March:

Yeah your suggestion worked:

http://www.ubuntu1501.com/2007/12/installing-virtualbox-with-usb-support.html

Now I have USB, and I haven't had a lockup for 4 days :)

So I'm guessing the fix was in the update 2 days ago (am I right?)

Thanks a lot.

Revision history for this message
Alexander Hunziker (alex-hunziker) wrote :

yeeguy: To begin with, does it have exactly the same ethernet controller? lspci tells me "Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller"

Revision history for this message
yeeguy (yeelee) wrote :

@Alexander: yeah, I think so... lspci says the same for the T60p:
Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller

I'm definitely not suggesting that it's the video cards that are the root cause of the Hardy freezes... Just saying that it's interesting and maybe potentially helpful for diagnosis to know that our nearly identical laptops react very differently to Hardy... It's sort of an experimental control, right?

Anyway, I wonder what the Radeon X1400 and FireGL V5200 have in common (since both the laptops that have those video cards are freezing)? And if those two cards have something in common that is specifically missing/different from the Radeon X1300 then would that help us pinpoint what's going on with Hardy?
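
For anyone wanting to make this kind of side-by-side comparison concrete, one option is to save the `lspci` output on each laptop and diff only the graphics and Ethernet controller lines. This is a minimal sketch, not something posted in this report; the function names and the file names `t60.txt` and `t60p.txt` are placeholders.

```shell
#!/bin/sh
# Print only the graphics and Ethernet controller lines from saved lspci output.
# On each machine first run:  lspci > t60.txt   (file name is a placeholder)
controllers() {
    # Strip the leading bus address (e.g. "01:00.0 ") so identical devices
    # in different slots compare equal, then sort for a stable diff.
    grep -E 'VGA compatible controller|Ethernet controller' "$1" \
        | sed 's/^[^ ]* //' \
        | sort
}

# Show the controller lines that differ between two saved lspci dumps.
# Returns 0 when the filtered lines are identical, non-zero otherwise.
compare_controllers() {
    a=$(mktemp) && b=$(mktemp) || return 1
    controllers "$1" > "$a"
    controllers "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}

# Example: compare_controllers t60.txt t60p.txt
```

Lines that show up only on the freezing machines are the first candidates to look at; identical lines (such as the same 82573L Ethernet controller) are less likely to explain the difference.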

Revision history for this message
Jim Spangler (jspangler) wrote :

Having serious issues here about all this too (as are many people from what i've read on the internets). I'm not smart enough to dig too deep, but here's the files asked for.

Revision history for this message
rmccabe3701 (robertjmccabe) wrote :

Update:

I just had another lockup ... this was within hours of enabling the usb with

http://www.ubuntu1501.com/2007/12/installing-virtualbox-with-usb-support.html

So ... maybe the problem is a usb issue? Does anyone else think so?

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi everyone,

It unfortunately seems that this bug report has gotten a bit bloated and hard to follow. There are comments here that this is resolved for some, there are workarounds for others, and then more comments that the issue still exists. It's rather difficult for the kernel team to pick out the relevant pieces of information to help debug the issue.

Note that it's helpful to the kernel team if bug reports target a specific bug against a specific set of hardware. Even though you may be experiencing the same symptom reported here, it will often require fixes in different drivers based on the hardware you are using. Because of this, if you have hardware which differs from the original bug reporter I'd encourage you to please open a new bug report. We can easily mark bugs as duplicates later on if necessary. Also please be sure to include the appropriate debugging information as outlined here: https://wiki.ubuntu.com/KernelTeamBugPolicies . I apologize for any inconvenience this may cause but appreciate your cooperation.

Finally, I'd like to hear feedback from Valkai, who is the original bug reporter. Valkai, does this issue still exist for you in the kernel released in Hardy final (kernel 2.6.24-16.30)? What about the 2.6.24-17 kernel in hardy-proposed? To test the kernel in hardy-proposed, create the file /etc/apt/sources.list.d/hardy-proposed.list to contain the following two lines:

deb http://archive.ubuntu.com/ubuntu/ hardy-proposed main
deb-src http://archive.ubuntu.com/ubuntu/ hardy-proposed main

Then run the command 'sudo apt-get update'. You should then be able to install the linux-image-2.6.24-17 proposed kernel.

If the issue still persists with the 2.6.24-17 kernel, would you be willing to then test the Intrepid Ibex 8.10 kernel, which is currently being pulled together and was most recently rebased with the upstream 2.6.25 kernel? It is available for testing at the following PPA: https://edge.launchpad.net/~kernel-ppa/+archive . The steps to test from a PPA are similar to testing from hardy-proposed, except create a file /etc/apt/sources.list.d/kernel-ppa.list to include the following two lines:

deb http://ppa.launchpad.net/kernel-ppa/ubuntu hardy main
deb-src http://ppa.launchpad.net/kernel-ppa/ubuntu hardy main

Run 'sudo apt-get update' as you did before. You should then be able to install the linux-image-2.6.25 kernel. Once you've finished testing please feel free to remove the files you created in /etc/apt/sources.list.d/ and run 'sudo apt-get update' once more to restore your system to its original state. Please let us know your results. We'd appreciate hearing back from you. Thanks again.
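
The steps above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `write_source_list` is made up here, and the target directory is passed as an argument so the helper can be tried in a scratch directory before pointing it at /etc/apt/sources.list.d (which needs root).

```shell
#!/bin/sh
# Write an APT source list file with the deb and deb-src lines for one archive.
# Arguments: file name (without .list), archive URL, suite, target directory.
write_source_list() {
    name=$1; url=$2; suite=$3; dir=$4
    printf 'deb %s %s main\ndeb-src %s %s main\n' \
        "$url" "$suite" "$url" "$suite" > "$dir/$name.list"
}

# Example, as root, for the hardy-proposed kernel:
#   write_source_list hardy-proposed http://archive.ubuntu.com/ubuntu/ \
#       hardy-proposed /etc/apt/sources.list.d
#   apt-get update
#
# To undo after testing:
#   rm /etc/apt/sources.list.d/hardy-proposed.list
#   apt-get update
```

Keeping each test archive in its own file under sources.list.d means the main sources.list is never edited, so removing the file and updating again restores the original state.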

Revision history for this message
Elod VALKAI (elod) wrote :

Leann:

I'll try the kernels suggested, as well as wired/wireless and no network connection, and see what happens.

Don't expect a complete report til' sunday, though.

Revision history for this message
Janne Moren (jan-moren-gmail) wrote :

I posted bug #223081 - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/223081 - with what seems to be the same problem. It rather looks like the machine is overheating, with the kernel failing to contain the temperature rise, resulting in the system freezing. That would also explain why a reboot directly afterwards fails, since the system activity at reboot causes the hot machine to overheat again.

Revision history for this message
Elod VALKAI (elod) wrote :

I've run some more tests.

First I've given 2.4.24-16-generic another shot. It crashed after about 40min, with no network load. Temperatures are normal (below 50 Celsius), no usb devices connected. After reboot (I was trying to install the kernel from hardy-proposed) it froze right after I've logged into xfce4.

Second try, with 2.4.24-17-generic from hardy-proposed. I've not compiled madwifi, so I've done the test without any kind of network connectivity. I've started a gnome-terminal with powertop (wakeups are at about 100/sec).
I wanted to stress it a little, so I started a movie (with mplayer). It crashed after about 5min.

Third, with 2.4.25-1-generic, from kernel-ppa, without wifi. I've got bored running the movie, I've launched a browser, terminal, etc. I'll test further (including madwifi), but so far no crash occurred.

Bug does not seem related to network load, or madwifi.

I'm attaching a tar.gz with dmesg, version & lspci from all 3 kernels.

Revision history for this message
Elod VALKAI (elod) wrote :

I've got the kernel versions wrong. They are obviously 2.6, not 2.4.

Revision history for this message
Elod VALKAI (elod) wrote :

2.6.25-1.2ubuntu3-generic IS solid.

Uptime is past 6 hours using epiphany, terminal, man, ssh, youtube.

I've also compiled madwifi from the svn repo (0.9.4 does not compile).

Revision history for this message
pablovp (pablovp86) wrote :

Hi

I have the same problem on an Acer 4402WLMi laptop.

Besides the lockups I've experienced a lot of problems with my broadcom 4318 wireless card.

The real time kernel solved my problems.

Revision history for this message
pablovp (pablovp86) wrote :
Revision history for this message
zity (zdevai) wrote :

I've posted a separate bug on this (Bug #227806).
Short summary:
- 2.6.24-16/17 lock up with acpi enabled (they are stable with acpi=off)
- Intrepid 2.6.25-1 works OK
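
When comparing results with and without `acpi=off`, it helps to confirm that the running kernel was actually booted with the parameter. A minimal sketch, assuming the command line is read from /proc/cmdline; the function name `has_boot_param` is invented for this example.

```shell
#!/bin/sh
# Report whether a kernel command line contains a given parameter as a
# whole word. $1 = parameter (e.g. acpi=off), $2 = command line text.
has_boot_param() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example on a running system:
#   if has_boot_param acpi=off "$(cat /proc/cmdline)"; then
#       echo "booted with acpi=off"
#   fi
```

Matching on whole words avoids false positives such as `noacpi=off` or `acpi=offline` being mistaken for the parameter.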

Revision history for this message
dr.spock (dr.spock) wrote :

I haven't solved the lockup problem with kernel 2.6.25 from the kernel-ppa repository. It locks up too, in the same way (no kernel panic, no blinking lights). The only thing that seems to work is using the old 2.6.22 kernel inherited from Gutsy.

I work at a computer shop and I have tested a lot of machines with Hardy. The only conclusion I can extract now is that lockups are only happening with the 386 kernel, not with the AMD64 edition.

Revision history for this message
Janne Moren (jan-moren-gmail) wrote :

Same as for dr.spock - the 2.6.25 kernel still locks up. I have ruled out heat as a cause; I have managed to get 2.6.25 freeze when the machine was fairly cool and under light load.

Revision history for this message
Martin Božič (martin-bozic) wrote :

I'm experiencing this bug too. I have a Dell D400 laptop with Intel graphics, audio and wireless. Mostly it happens when browsing the web with FF3 (which is actually what I do 80% of the time). The last time it happened was while scrolling down the Slashdot page.

Another thing which might not be in direct connection with the lockups. When watching Flash videos I experience stuttering about every minute (but not only with Flash). I noticed that at that time the Xorg process jumps to 2 or 3%. Also, touchpad scrolling breaks down after a couple of these stutters.

Finally, when shutting down the laptop, it freezes too. This occurs very often. I also can't use hibernation anymore because of this.

In Gutsy everything worked flawlessly (except that the bluetooth module failed to wake up after a couple of resumes from hibernation).

I'm attaching my dmesg.

Revision history for this message
Joe (fullmitten) wrote :

Fedora 9 fixes my problem (so far, fingers crossed)
https://bugs.launchpad.net/ubuntu/+bug/227882

Revision history for this message
parmruss (parmruss) wrote :

Interestingly (?), I'm running 2.6.24-11-generic kernel, and did not start seeing these lockups until I ran an update yesterday:
Upgraded the following packages:
cpp (4:4.2.3-1ubuntu4) to 4:4.2.3-1ubuntu5
dbus (1.1.20-1ubuntu1) to 1.1.20-1ubuntu2
dbus-x11 (1.1.20-1ubuntu1) to 1.1.20-1ubuntu2
gcc (4:4.2.3-1ubuntu4) to 4:4.2.3-1ubuntu5
libdbus-1-3 (1.1.20-1ubuntu1) to 1.1.20-1ubuntu2
libgcj-bc (4.2.3-1ubuntu4) to 4.2.3-1ubuntu5
libgcj-common (1:4.2.3-1ubuntu4) to 1:4.2.3-1ubuntu5
openssh-client (1:4.7p1-8ubuntu1.1) to 1:4.7p1-8ubuntu1.2
ssh-askpass-gnome (1:4.7p1-8ubuntu1.1) to 1:4.7p1-8ubuntu1.2
ssl-cert (1.0.14-0ubuntu2) to 1.0.14-0ubuntu2.1
sudo (1.6.9p10-1ubuntu3.1) to 1.6.9p10-1ubuntu3.2
x11-common (1:7.3+10ubuntu10) to 1:7.3+10ubuntu10.1
xbase-clients (1:7.3+10ubuntu10) to 1:7.3+10ubuntu10.1
xorg (1:7.3+10ubuntu10) to 1:7.3+10ubuntu10.1
xserver-xorg (1:7.3+10ubuntu10) to 1:7.3+10ubuntu10.1
xserver-xorg-input-all (1:7.3+10ubuntu10) to 1:7.3+10ubuntu10.1
xserver-xorg-video-all (1:7.3+10ubuntu10) to 1:7.3+10ubuntu10.1
xutils (1:7.3+10ubuntu10) to 1:7.3+10ubuntu10.1

Installed the following packages:
openssl-blacklist (0.1-0ubuntu0.8.04.2)

Revision history for this message
TDB (michael-baranov) wrote :

I experience lockups with the Hardy release on BOTH my laptop (Toshiba Satellite p205: atheros, intel 945) and desktop (atheros, nvidia). Both were rock stable in Gutsy and earlier. The only thing in common is madwifi. When it happens, the mouse, screen etc. are locked up. No screen garbage. I'm experiencing it very randomly but most of the time during heavy network load (both wired/wireless) or Firefox page scrolling. My uptime ranges from several hours to several days. Very frustrating...

Revision history for this message
andrewEdwards (ae0000) wrote :

We are running 2.6.24-17-server and are experiencing hard lockups (which result in resets) probably 4 or 5 times a day. We foolishly updated an in-use server to hardy (thinking LTS would be stable from the get-go) so are getting quite desperate. We have tried the numerous fixes listed above - all to no effect. The strange thing is, we have another machine with the same MB/BIOS which is being used as a desktop and it's fine.

If there is any extra diagnostic evidence we can provide please let us know. See the attachment for dmesg, lspci-vvnn and version.

Revision history for this message
Santiago Zarate (foursixnine) wrote :

Well... I solved mine by compiling a new kernel (from kernel.org) and solved my video issue with a patch from nvidia.com.

I was looking forward to publishing this on my ppa... but really I don't have much time to do so... if anyone wants the kernels and the patched nvidia driver setup, msg me... I'll upload it to my personal server though... (I used the old configuration from my feisty install...) atm... I've switched fully to hardy... and no lockups at all...

Revision history for this message
andrewEdwards (ae0000) wrote :

Can I just add to my post above - the server kernel in use (2.6.24-17-server), which was installed this morning, is one of the worst we have had over the last couple of weeks. It's resetting itself every 2 hours. It's always under minimal load, as in constant file access etc. (it's an svn, samba, music server) but load rarely gets over 0.3 ( 12:49:15 up 7 min, 1 user, load average: 0.00, 0.04, 0.03 )

Revision history for this message
Janne Moren (jan-moren-gmail) wrote :

andrewEdwards: if you upgraded, you should still have the .22 kernel installed from Gutsy. Boot that one and the crashes should be gone.
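
To check which kernels are still installed before rebooting, one can list the kernel images in /boot; any version listed there should also appear as an entry in the boot menu. A minimal sketch, with a made-up function name; the directory argument exists only so the helper can be tried against a scratch directory.

```shell
#!/bin/sh
# List kernel versions that have an image in a boot directory (default /boot).
installed_kernels() {
    dir=${1:-/boot}
    for f in "$dir"/vmlinuz-*; do
        # Skip the unexpanded pattern when no kernel image is present.
        [ -e "$f" ] || continue
        basename "$f" | sed 's/^vmlinuz-//'
    done
}

# Example: installed_kernels
# A 2.6.22 entry in the output means the Gutsy kernel is still available.
```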

Revision history for this message
UbunG (geoubun) wrote :

I also upgraded from Gutsy. Will that help me with the freeze problem too? And how does one find and boot from a previous kernel?

Revision history for this message
andrewEdwards (ae0000) wrote :

Janne Moren: i wiped the disk for a clean install, thanks anyway :)

we will probably go back to gutsy (or etch) if its not fixed in the next couple of days.

Revision history for this message
pablovp (pablovp86) wrote :

I replaced the "hardy" word to "gutsy" in /etc/apt/sources.list and updated apt to make kernel 2.6.22 (the one from gutsy) available.

Then updated "gutsy" to "hardy" and now i can boot on the old kernel, without lockups.
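
The swap described above can be sketched as follows. This is an illustration only: the helper name `switch_release` is invented, and the package name in the usage comment is an assumption based on the 2.6.22-14 Gutsy kernel version mentioned at the top of this report.

```shell
#!/bin/sh
# Replace every occurrence of one release name with another in an APT
# sources file, keeping a backup copy next to it as FILE.bak.
switch_release() {
    file=$1; from=$2; to=$3
    cp "$file" "$file.bak" || return 1
    sed "s/$from/$to/g" "$file.bak" > "$file"
}

# Example, as root:
#   switch_release /etc/apt/sources.list hardy gutsy
#   apt-get update
#   apt-get install linux-image-2.6.22-14-generic   # assumed package name
#   switch_release /etc/apt/sources.list gutsy hardy
#   apt-get update
```

Note that the second call overwrites the backup, so it is worth keeping a separate copy of the original sources.list before starting, and avoiding any other package operation while the file points at gutsy.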

Revision history for this message
TDB (michael-baranov) wrote :

I FOUND A WAY TO 100% REPRODUCE THE CRASH on my laptop:
Toshiba Satellite p205 (atheros+madwifi, intel945+intel, using wicd)
1) connect to any wireless network
2) suspend to RAM
3) resume
4) enter password
5) wait 3-7 sec and observe a lockup.
The lockups differ from time to time: mouse cursor may or may not freeze, HDD led may/may not blink, portions of the GUI may/may not respond to mouse movement for a couple of seconds more.
If you omit step 1 there is never a lockup. Also no lockup at resume/password prompt screen.
2.6.22 kernel does not recover video on resume for me (but used to be OK on real gutsy install).

Revision history for this message
TDB (michael-baranov) wrote :

Just tried the proposed 2.6.24-17: both wired and wireless networking are gone (no eth0/ath0) but NO LOCKUPS ;-) So at least for me and at least for my laptop the problem is localized.

Revision history for this message
jay3d (jay3dlinux) wrote :

Can you try it with wired network only?

Because when I installed the updated "compat-wireless" package from http://linuxwireless.org/

the lockups were gone with those updated drivers ;)

Why not include those in ubuntu?

Revision history for this message
tbranham (tbranham) wrote :

OK, I've been keeping quiet for a bit because we have had no changes with my girlfriend's laptop until now. On average, she will go three to five days without a hard-lock, but when she gets them, she will receive two or three hard-locks in succession. This is interesting to me, so I decided to probe a bit further. If her terminal is open she is sometimes able to gracefully reboot, but each character she types takes upwards of 30 seconds to appear on the screen. This last time, however, she was able to perform a 'ps -ef' to see what was running at the time of the crash. I'm including that file with this report. My attention quickly went to the entry for 'trackerd'. I have experienced problems with a sluggish, unresponsive system before when trackerd would run in Gutsy, so I am inclined to wonder if this could be the smoking gun for this system.

In other news, I just received a development box in one of the university labs. The machine had been running Fedora 8 for several months without a problem, but since it is mine, I wanted to start fresh with Hardy. I finished the install yesterday afternoon, and left the machine running all night; when I came in today it was still fine. About an hour ago, however, I experienced a crash. I was able to successfully run an Alt-SysRq-S and an Alt-SysRq-O, but otherwise the system was completely unresponsive. The system is a run-of-the-mill Dell Optiplex 745. I don't know enough about the hardware configuration to give too many details here, but I do know that there is positively no wireless support with this machine. I'm sure that the attached log files should give enough hardware info for a diagnosis...

I will attach all of the necessary files to the next comment. I hope this is somewhat helpful. Again, my sympathies are with the developers who are trying to track down this difficult bug.

Thanks,
Travis
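
Since the magic SysRq keys were the only thing still responding here, it can be useful to check ahead of time which SysRq functions a system allows. A minimal sketch, assuming the value is read from /proc/sys/kernel/sysrq, where 0 disables everything, 1 enables everything, and other values are a bitmask (16 is the sync bit); the function name is invented for this example.

```shell
#!/bin/sh
# Report whether a SysRq function is allowed.
# $1 = bit to test (16 = sync), $2 = value read from /proc/sys/kernel/sysrq
sysrq_allows() {
    # A value of 1 means every function is enabled.
    [ "$2" -eq 1 ] && return 0
    # Otherwise the value is a bitmask of allowed functions.
    [ $(( $2 & $1 )) -ne 0 ]
}

# Example:
#   if sysrq_allows 16 "$(cat /proc/sys/kernel/sysrq)"; then
#       echo "Alt-SysRq-S (sync) is available"
#   fi
```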

Revision history for this message
tbranham (tbranham) wrote :

Here are the log files for my new box (Dell Optiplex 745).

Note: it may be nothing, but there seems to be an interesting entry in kern.log just prior to the crash. Look for the date change in the file.

I hope that helps.
-Travis

Revision history for this message
yaztromo (tromo) wrote :

Firstly, I can't be certain whether I've really got this bug or not, since it doesn't happen every few hours like in some other people's reports. My lockups happen every 7 to 14 days, which is unusual for this system. So I just want to get my report in, since the symptoms sound similar to others and it may help the devs find the problem.

I built a server for my place of work a long time ago that ran feisty. It comes under heavy disk and network load regularly but has always been stable. I'd always planned to upgrade it to Hardy when it came out, liking the idea of it being an LTS.

Since the upgrade to Hardy I've had several crashes over a month period. The first two were during an rsync backup to an external USB hard drive. The last one occurred today under heavy load: a network rsync session in full flow, a hylafax fax reception, folding@home running, a member of staff printing remotely via cups and myself accessing apache, all pretty much simultaneously. As I said though, in the past this wouldn't have tripped the little box up. Symptoms are what others describe: blank screen, no keyboard access, everything dead.

On all three occasions there has been nothing in any log to suggest the kernel had any time to log an error. However, I have noticed that since feisty the driver used for my SIS 5513 IDE controller has changed.

Finally this machine does not run X, it is headless.

Hardware:
ASUS A7S-VM with built in LAN, sound (disabled), and Video
Athlon 2000XP
2 x 512MB SDRAM

Revision history for this message
yaztromo (tromo) wrote :
Revision history for this message
yaztromo (tromo) wrote :
Revision history for this message
andrewEdwards (ae0000) wrote :

Last night in desperation we installed 2.6.25-1-server from deb http://ppa.launchpad.net/kernel-ppa/ubuntu hardy main

hopeful, i just checked the uptime on the server this morning:
Linux highway61 2.6.25-1-server #1 SMP Sat May 17 10:09:18 UTC 2008 x86_64 GNU/Linux
06:59:52 up 53 min, 1 user, load average: 0.00, 0.00, 0.00

obviously it did not work.
so I don't know where to go anymore - from all I had read that kernel was meant to be solid :(

even though its a headless server, it does have a video card... so i might try removing that and see how many hours we can get.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi Elod,

Thanks for testing and the feedback. It seems this issue is resolved for you with the upcoming 2.6.25 Intrepid kernel. Since you are the original bug reporter, I'm going to mark this bug as Fix Released against Intrepid. It would be good to also try to isolate the patch which could possibly be backported to Hardy. However, this may prove difficult for the developers to isolate since they have differing hardware. How comfortable would you be at performing a git-bisect? It's obviously not something we expect you to do but I could try to walk you through the appropriate steps.

For anyone else still experiencing issues, refer to comment: https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/204996/comments/192 and please open a new report. Thanks.
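
For readers wondering what such a bisect would look like: because the goal is to find the commit that fixed the problem rather than the one that introduced it, the usual good/bad terms are swapped for custom ones. The sketch below is an assumption-laden outline, not the kernel team's procedure: it relies on the `--term-old`/`--term-new` options of current git releases (git 2.7 or newer, which did not exist when this report was written), the function name is invented, and the example revisions are the upstream `v2.6.24` and `v2.6.25` release tags rather than Ubuntu's own tree.

```shell
#!/bin/sh
# Start a bisection that looks for the commit where a problem was FIXED.
# $1 = path to a git clone, $2 = revision known to lock up,
# $3 = revision known to be stable.
start_fix_bisect() {
    repo=$1; broken_rev=$2; fixed_rev=$3
    git -C "$repo" bisect start --term-old=broken --term-new=fixed || return 1
    git -C "$repo" bisect fixed "$fixed_rev" || return 1
    git -C "$repo" bisect broken "$broken_rev"
}

# Example, in a clone of the mainline kernel tree:
#   start_fix_bisect ~/linux v2.6.24 v2.6.25
# Then build, boot and test the revision git checks out, and report:
#   git -C ~/linux bisect broken   # this build still locks up
#   git -C ~/linux bisect fixed    # this build is stable
# Repeat until git names the first "fixed" commit, then clean up with:
#   git -C ~/linux bisect reset
```

Each round halves the remaining range, so the thousands of commits between two kernel releases typically need around a dozen or so build-and-test cycles; with a lockup that can take hours to appear, a build should only be reported as fixed after a generous test period.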

Changed in linux:
status: Triaged → Fix Released
Revision history for this message
KevDog (kev-hilton) wrote :

Having the same lockup problem on an Acer laptop. I attempted to upgrade to the 2.6.25.1 kernel as described -- the original installation had LVM-LUKS encryption. On attempting to boot into the new kernel, the password for the whole-disk encryption would no longer work. I had to revert to the hardy kernel -- and am once again having lockup problems that occur when either Firefox or Pidgin starts -- the whole system locks and needs a hard reboot. Is there any further information you can point me to, or any suggestions?

Revision history for this message
brodiepearce (brodiepearce) wrote :

I'm getting these random freezes/lockups with the following hardware:

AMD A64 3200+
DFI motherboard (NVidia NF4 chipset)
SATA + PATA IDE and ATAPI devices
1GB RAM
ATI graphics card
RT61 based wireless NIC

I have wireless disabled at the moment to try and see if that could be the cause of my crashes specifically, although it's painful because my ethernet has suddenly decided to kick the bucket for some reason.

Here are my logs (as per https://bugs.launchpad.net/ubuntu/+source/linux/+bug/204996/comments/88):

http://brodiepearce.googlepages.com/interrupts.log
http://brodiepearce.googlepages.com/dmidecode.log
http://brodiepearce.googlepages.com/mtrr.log
http://brodiepearce.googlepages.com/Xorg.0.log
http://brodiepearce.googlepages.com/kern.log

I don't have a kern.0.log logfile... hope this helps in some way :/

Revision history for this message
Sandair (friggincomputers) wrote :

I have the exact same problem as the original bug report. Same machine: Dell C400, 512MB. Different wireless card. I have 2.6.24-17.

It freezes randomly and completely. Even the network lights stop responding. And there is absolutely nothing in the logs. I tend to think it is related to the network because when I don't use the network interface (either wireless or wired) it takes much longer to freeze.

Apparently kernel 2.6.25 solved the problem, but I can't get it from ppa anymore (they're on 2.6.26 for intrepid now).

Is there a fix in the pipeline that will be distributed with the updates? Or should I update my kernel manually?

Revision history for this message
brodiepearce (brodiepearce) wrote :

I enabled the backports and proposed repositories on this machine and
updated from those shortly after my previous post, that seems to have
fixed my problems for now.

Revision history for this message
Sergio Callegari (callegar) wrote :

This bug is marked as fix-released. But the current 2.6.24-18 kernel does not fix it.

Apparently it is 2.6.25 that fixes the bug (hopefully). So the fix is released only for the still not existing intrepid release.

Please do actually release the fix for hardy users.

Revision history for this message
Martin Božič (martin-bozic) wrote :

I hope I'm not being too annoying, but I have to agree with Sergio. If this were a regular release I wouldn't mind, but since this is an LTS release, I do. I was really looking forward to recommending (and supporting) it to many friends, relatives and colleagues because of that, but after this unfortunate experience I have it installed only on my laptop, which gives me headaches, and on one LTSP server, which seems to be fine. Another computer on which I tried to install Mythbuntu had to be rolled back to Gutsy. Thus the other machines I support are being put on hold. I can't imagine having to support even more computers with a problem like that.

Is there anything that can be done to get this fix released for 8.04.1? There were no results mentioned from git-bisect that was proposed to Elod. I have no idea how to perform git-bisect, but I volunteer to do it. Anything to get rid of this pest.

Revision history for this message
UbunG (geoubun) wrote :

I've been waiting for a fix to my lock up problems ever since I upgraded to Hardy. I experience lockups about 3 to 4 times a day and it's very frustrating. I've been patiently waiting for an update or solution, but no one seems to have one. Are the people who help fix Ubuntu's bugs still working on a solution or is there already a solution I don't know about?

Revision history for this message
Janne Moren (jan-moren-gmail) wrote :

I have an older version of my current machine; it's similar enough that drivers are mostly the same. I did a clean install of Hardy on that one, and there have been no lockups on it so far. Could this possibly be related to something happening during, or being left over from, upgrading the system rather than reinstalling?

Revision history for this message
Martin Božič (martin-bozic) wrote :

I have a clean Hardy install on my laptop from the very beginning.

Revision history for this message
brodiepearce (brodiepearce) wrote :

As do I, my problems seem to be stemming from the included RT61 wireless
driver also. This system has been stable for the past two weeks using
an old wired ethernet card. If I replace the wireless card in the
system it locks up every time within one or two hours of heavy network load.

Revision history for this message
Chainz (chainzee) wrote :

I have the same problem on two machines (totally different architectures). Kernel 2.6.24-18.
The system and keyboard freeze (numlock stays ON and can't be turned off); you can still move the mouse over the screen, but that's all.

Haven't observed any CPU load when it happens.

It happens randomly, except for one thing: it always hangs that way with some screen savers.
Immediately, either when you just select them and see them in the preview, or when the screen saver starts after some idle time.
See: http://ubuntuforums.org/showthread.php?t=828271

If you need some more details just please tell me and will provide them straightaway!

Revision history for this message
TDB (michael-baranov) wrote :

Since I stopped using my Atheros wireless (it's not disabled, just not connected to any network) and started to use wired NIC, I never ever had any more lockups.

Revision history for this message
Martin Božič (martin-bozic) wrote :

That's unfortunately not true for my Intel Corporation PRO/Wireless LAN 2100 3B Mini PCI Adapter (rev 04). I get fewer lockups though, and no kernel panics so far.

Revision history for this message
KevDog (kev-hilton) wrote :

As per the ubuntu forums:

sudo apt-get remove powernowd

This helps reduce the frequency of the lockups; however, it does not eliminate them.

Revision history for this message
Arthur Schiwon (blizzz) wrote :

I installed Hardy on my machine at work and soon experienced such lockups. I did the first install via Wubi, so I thought the lockups had something to do with hard disk write access. After the hard disk crashed I performed a fresh install on its own partitions, but the lockups went on. I tried other kernels (386, rt...) and also the last 22 kernel from Gutsy. Everything without success. Now I am running the 25.1 kernel from Intrepid; except for one lockup on the first day, it runs flawlessly.

I have no wireless card in here, but an AMD Athlon XP 3000+ processor and a nVidia Ethernet Controller.

Revision history for this message
Chainz (chainzee) wrote :

It must be something with graphics and most probably OpenGL.
First Screensaver... And now I found another way to reproduce the error:
I just start Blender and try to work in it, after about 30 seconds it hangs :'(

At the moment I have kernel: 2.6.24-19

Revision history for this message
UbunG (geoubun) wrote :

I also have kernel 2.6.24-19 and this bug continues to be terrible. I get lockups and freezes even if I'm not doing anything on my laptop. I can step away and come back to find my screen frozen, as well as my hard drive. The hard drive activity indicator always remains "stuck" lit during a freeze. Even while shutting my computer down, it will freeze 99% of the time. And it's frustrating when a freeze occurs while I'm typing an email message. Thankfully, email services like Google and Yahoo have an auto-save feature for messages being composed. If not, composed email messages would be lost forever!

Is there any way, other than going back to Gutsy, that I can fix these lock/freeze issues? I don't want to have to wipe my hard drive clean and reinstall all the programs I installed, especially since the only issue is the "lock/freeze" problem.
Is there an updated kernel ready for installation or is there one in the making?

Revision history for this message
brodiepearce (brodiepearce) wrote :

I think it's now well-established that the lockups/freezes in Hardy are
being produced by a wide variety of situations. The only times I get
lockups without using RT61 wireless are during gameplay with WINE.

Chainz wrote:
> It must be something with graphics and most probably OpenGL.
> First Screensaver... And now I found another way to reproduce the error:
> I just start Blender and try to work in it, after about 30 seconds it hangs :'(
>
> At the moment I have kernel: 2.6.24-19
>
>

Revision history for this message
Alex Borisov (yumekage+launchpad) wrote :

I'm having the same problem on a Sony PCG-Z600LEK. I can't even boot. I seem to recall being able to use 7.10 fine, but the message was always flooding the message log (and using the tty was impossible). I cannot install 8.04: it hangs on the boot screen for a while, then drops to a tty and floods it with debounce messages.

I'm having the same issue with Mint (naturally, since it's Ubuntu-based). I also have the problem with the latest Debian (it installed just fine, but the system went down after an update, so it's probably kernel related). I'm going to try non-Debian distros like Fedora, but I expect the same. I will also try older versions, although this bug is reported across multiple kernels and has cropped up in the past.

I'm also getting:

usb 1-1.3: device not accepting address xx, error -71

and

device descriptor read/64, error -71.

I have no Bluetooth in this laptop and no USB devices attached, and I have even tried opening it up and manually disconnecting the USB ports from the motherboard (so it's controller related); the issue still remains. All the stuff about legacy etc. is useless, as are all the kernel parameters. As of right now this laptop is a paperweight until this issue is resolved.

Revision history for this message
Chris Coulson (chrisccoulson) wrote :

Alex - Your issue doesn't even have the same symptom as those reported here, so unless you file a separate bug report, your issue will not get resolved.

As per https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/204996/comments/192, will people who are still experiencing this problem and have different hardware to the original reporter please open a new bug report. It's unlikely that your problem will get fixed by commenting on this one. With all the different permutations of hardware, workarounds and scenarios here, this bug report is virtually impossible to follow.

Revision history for this message
Sandair (friggincomputers) wrote :

Update for any kernel dev following this thread. I have the same hardware as the original poster (Dell C400, 512MB), possibly with a different wireless card (but the freeze happens even with a wired connection). And 2.6.24-19 still does not fix it.

Revision history for this message
Supersaiyan_IV (saiyan-iv) wrote :

Hi!
Can you help me test this (ugly fix?). Out of pure curiosity I did:
echo 1 > /proc/sys/vm/block_dump
but while trying to 'trace' the bug with dmesg (assuming it's I/O related) I couldn't find anything. WHY? Because my Ubuntu Hardy stopped freezing! My theory is that this command works like a 'keep-alive', so the drive has no chance to 'time out'. Since there is no way to explain this accurately, can somebody try this and confirm? When you're done, revert with: echo 0 > /proc/sys/vm/block_dump. Run dmesg to check the output. Not a fix, but it's something.
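
For anyone who wants to repeat this test, here is a minimal sketch that wraps the two echo commands so that logging is always switched off again. The helper name with_block_dump and the BLOCK_DUMP variable are made up for illustration; only the /proc path comes from the comment above. It assumes a kernel that still provides that file and a root shell.

```shell
# Path to the block_dump switch. It can be overridden, so the helper can be
# tried against an ordinary file first. The default is the path quoted above.
BLOCK_DUMP="${BLOCK_DUMP:-/proc/sys/vm/block_dump}"

# with_block_dump CMD [ARGS...]
# Hypothetical helper: switch block_dump on, run CMD, then switch it off
# again and return CMD's exit status.
with_block_dump() {
    echo 1 > "$BLOCK_DUMP" || return 1
    "$@"
    rc=$?
    echo 0 > "$BLOCK_DUMP"
    return "$rc"
}

# Example (as root): log block I/O for ten minutes, then save the kernel log.
#   with_block_dump sleep 600
#   dmesg > block_dump-dmesg.log
```

The wrapper only makes the experiment easier to repeat and to undo; it says nothing about whether block_dump affects the freeze.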

Revision history for this message
Supersaiyan_IV (saiyan-iv) wrote :

Ignore echo 1 > /proc/sys/vm/block_dump, this only delayed the freeze. Bug remains.

Revision history for this message
Supersaiyan_IV (saiyan-iv) wrote :

Managed to get some output with block_dump enabled. Very interesting. Most interesting excerpts here, complete one attached. Note, the mouse freeze is least relevant here.

[ 9499.124930] psmouse.c: GlidePoint at isa0060/serio1/input0 lost synchronization, throwing 1 bytes away.
[ 9500.058337] psmouse.c: resync failed, issuing reconnect request
[ 9500.532960] udevd(18112): dirtied inode 3361284 (\x2fdevices\x2fvirtual\x2finput\x2finput11\x2fmouse1) on tmpfs
[ 9500.542415] udevd(18113): dirtied inode 3361296 (\x2fdevices\x2fvirtual\x2finput\x2finput11\x2fevent3) on tmpfs
[ 9500.561211] udevd(18115): dirtied inode 3361309 (\x2fdevices\x2fplatform\x2fi8042\x2fserio1\x2finput\x2finput12\x2fmouse2) on tmpfs
[ 9500.574371] udevd(18116): dirtied inode 3361319 (\x2fdevices\x2fplatform\x2fi8042\x2fserio1\x2finput\x2finput12\x2fevent4) on tmpfs
[ 9501.181937] input: PS/2 Mouse as /devices/virtual/input/input13
[ 9501.230852] input: AlpsPS/2 ALPS GlidePoint as /devices/platform/i8042/serio1/input/input14
[ 4747.055221] udevd(18127): dirtied inode 9272 (subsystem) on sysfs )
[ 4747.074221] udevd(18123): dirtied inode 3417830 (mouse1) on tmpfs
[ 4747.074561] hald(5224): dirtied inode 9298 (subsystem) on sysfs
[ 4747.086035] udevd(18127): dirtied inode 3417837 (event3) on tmpfs
[ 4747.086323] hald(5224): dirtied inode 9306 (subsystem) on sysfs
[ 4747.110600] hald(5224): dirtied inode 9087 (driver) on sysfs <--- (most interesting, hence "driver")
[ 4747.116036] path_id(18136): dirtied inode 9314 (subsystem) on sysfs
[ 4747.124292] udevd(18135): dirtied inode 3417891 (mouse2) on tmpfs
[ 4747.124696] hald(5224): dirtied inode 9341 (subsystem) on sysfs
[ 4747.125481] udevd(18138): dirtied inode 3417900 (event4) on tmpfs
[ 4747.125887] hald(5224): dirtied inode 10503 (subsystem) on sysfs

I hope this helps a bit.
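
A log like the excerpt above gets long quickly, so it may help to count which processes dirtied inodes before reading it line by line. The following is a rough sketch; the function name summarize_dirtied is invented here, and it assumes the line format shown in the excerpt ("[timestamp] name(pid): dirtied inode ...").

```shell
# summarize_dirtied: read kernel log text on stdin and print, per process
# name, how many "dirtied inode" lines it produced (busiest first).
summarize_dirtied() {
    awk '/dirtied inode/ {
             sub(/^\[[ 0-9.]*\] /, "")                 # drop the "[ 4747.055221] " timestamp
             name = substr($1, 1, index($1, "(") - 1)  # "udevd(18127):" becomes "udevd"
             count[name]++
         }
         END { for (n in count) print count[n], n }' |
    sort -k1,1nr -k2,2
}

# Example: dmesg | summarize_dirtied
```

Lines that do not mention "dirtied inode", such as the psmouse messages, are ignored.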

Revision history for this message
Supersaiyan_IV (saiyan-iv) wrote :

2.6.24-19-generic

Revision history for this message
Supersaiyan_IV (saiyan-iv) wrote :

Sorry for posting so many times in a row.

But this time I have solved the freezes, in my case at least. What I did was to create a redistributable & bootable backup dvd with remastersys in case I had to reinstall. Since I gutted the Hardy installation from trying to fix the bug I used the backup for the first time. Only later I realized there were no lockups. It has been stable for 1½ days now. I'll report further some week from now. Present dmesg attached. Still 2.6.24-19-generic.

Revision history for this message
Luís Silva (luis) wrote :

Does anyone here have backport-modules installed? I was having a hard freeze (with no output, just a complete freeze) which disappeared after uninstalling backport-modules. My hardware is a Core 2 Duo laptop with ATI graphics (X1600) and an Intel iwl3945 wireless card. When someone suggested that the crash could be related to wireless networking, I remembered that the Hardy default driver was iwl3945 1.2.0 and that backport-modules provided a more recent one... I notice an obvious difference in performance between the two driver versions, but the crashes seem to have gone away.

backport-modules includes new versions of mac80211 and iwlwifi, as well as other drivers, so if the problem was in mac80211 this could be the reason why so many different hardware configurations are affected, right?

The freeze started around beta time (a little before, as far as I can recall) and remained until I removed backport-modules the other day. I'm now on kernel 2.6.24-19-generic.

Revision history for this message
wackyiniraqi (wackyiniraqi) wrote :

same problems here...but only on my desktop...laptop working good. Using wireless exclusively. other logs to follow

Revision history for this message
wackyiniraqi (wackyiniraqi) wrote :

Some people on other forums have suggested that this is a compiz lock-up. I do not believe that to be the case: with the compiz lock-up the mouse still works and people can ssh in. With the lockup I'm getting, I have to hit reset or the power button on my tower. Sometimes I get this lockup at the login screen and the login sound loops forever... it's crazy. It can't be compiz at the login screen.
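
The distinction drawn here (a desktop hang still answers ssh, a hard lockup does not even answer ping, as in the original report) can be checked from a second machine. This is only a sketch: the function names are invented, and a dead network driver or a firewall would also make ping fail, so the result is a hint rather than a diagnosis.

```shell
# classify_freeze PING_OK SSH_OK
# Each argument is "yes" or "no", as observed from a second machine.
classify_freeze() {
    if [ "$1" = "no" ]; then
        echo "hard lockup suspected: no reply to ping"
    elif [ "$2" = "yes" ]; then
        echo "desktop hang suspected: kernel still answers ssh"
    else
        echo "unclear: ping answers but ssh does not"
    fi
}

# probe_freeze HOST: run both probes against HOST and classify the result.
probe_freeze() {
    ping_ok=no
    ssh_ok=no
    ping -c 3 "$1" >/dev/null 2>&1 && ping_ok=yes
    ssh -o ConnectTimeout=5 -o BatchMode=yes "$1" true >/dev/null 2>&1 && ssh_ok=yes
    classify_freeze "$ping_ok" "$ssh_ok"
}

# Example, run from another machine while the desktop is frozen:
#   probe_freeze frozen-desktop.local
```

The host name in the example is a placeholder, and probe_freeze assumes key-based ssh login is already set up.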

Revision history for this message
Luís Silva (luis) wrote :

False alarm, uninstalling backport-modules didn't solve the problem...

The laptop stayed up for 10 days without locking up, but today it did it again... :(

This problem is driving me mad... I'm starting to think the only possible cause is hardware failure, but that doesn't fit, as it started around one of the Hardy alphas and several other people have complained of the same issue...

How can I help debug the problem, can the dev's give some directions? This is a serious issue...

Revision history for this message
Wolfgang Glas (wglas) wrote :

OK, after all this mess, I skimmed through the 2.6.25 changelogs looking for possible deadlock fixes.

The first commit that seems to be consistent with my own experience is:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b000cd3707e7b25d76745f9c0e261c23d21fa578

This is a problem in certain unusual states of the TCP/IP-stack. Since many users experience the deadlock with wireless and/or dialup connections, this might be the right one to look at, since sequence holes are something that might typically occur on such connections.

Can a developer please tell me whether this patch is already applied to 2.6.24-19? If not, would it be possible to provide a testing kernel with this patch applied?

  TIA,

    Wolfgang
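
One way to start answering this without a developer is to ask git whether the commit is an ancestor of a given release tag. The sketch below assumes a local clone of a kernel tree; the function name commit_in_release and the directory in the example are hypothetical. A distribution kernel may carry the same fix as a cherry-pick under a different hash, so a negative answer here is not proof that the patch is missing.

```shell
# commit_in_release REPO COMMIT RELEASE
# Prints "included" if COMMIT is an ancestor of RELEASE (a tag or branch) in
# the git checkout at REPO, otherwise "not found by hash".
commit_in_release() {
    if git -C "$1" merge-base --is-ancestor "$2" "$3" 2>/dev/null; then
        echo "included"
    else
        echo "not found by hash"
    fi
}

# Example, assuming ./linux-2.6 is a clone of the upstream tree:
#   commit_in_release ./linux-2.6 b000cd3707e7b25d76745f9c0e261c23d21fa578 v2.6.24
```

If the hash is not found, searching the release's log for the commit's subject line would be the next step.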

Revision history for this message
Martin Božič (martin-bozic) wrote :

I can say that I don't have kernel panics with 2.6.24-19, only those hiccups
that last for about 10 seconds.

2008/7/14 Wolfgang Glas <email address hidden>:

> OK, after all this mess, I skimmed through the 2.6.25 changelogs looking
> for possible deadlock fixes.
>
> The first commit that seems to be consistent with my own experience is:
>
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b000cd3707e7b25d76745f9c0e261c23d21fa578
>
> This is a problem in certain unusual states of the TCP/IP-stack. Since
> many users experience the deadlock with wireless and/or dialup
> connections, this might be the right one to look at, since sequence
> holes are something that might typically occur on such connections.
>
> Can a developer please tell me whether this patch is already applied to
> 2.6.24-19? If not, would it be possible to provide a testing kernel with
> this patch applied?
>
> TIA,
>
> Wolfgang
>

--
Martin Božič

Revision history for this message
Wolfgang Glas (wglas) wrote :

However, a lot of people experience the described system deadlock. Could a developer please comment on the above-mentioned kernel patch?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b000cd3707e7b25d76745f9c0e261c23d21fa578

TIA,
   Wolfgang

Revision history for this message
inquinador (pjpmendes) wrote :

I was having these regular freezes, sometimes 3 or 4 times in a row, and had to force a reboot every time.
Later on I noticed another freeze (this one reproducible) while connecting my WM6 PPC to DUN --activesync... that was already reported.
The following thought came to mind: "What if they are somehow related?"
After disconnecting my bluetooth adapter the random freezes completely disappeared, one week without any freezes...

Revision history for this message
wackyiniraqi (wackyiniraqi) wrote :

I thought this might be a video card problem. I removed my Nvidia 6200
and put in a GeForce3 Ti200. The problem still persists... I tried
recovery mode last night and the computer froze while loading everything.
I had to hard reset the computer about 12 times before I could finally get
it to load so that I could grab my data and move to the laptop.

Has anyone disabled their wireless card and seen if that fixes the
problem? I'm using an Atheros card.

inquinador wrote:
> I was having these regular freezes, sometimes 3 or 4 time in a row and had to force reboot every time.
> Later on i noticed another freeze (this one reproducible) while connecting my WM6 PPC to DUN --activesync... that was already reported.
> The following thought came to mind: "What if they are somehow related?"
> After disconnecting my bluetooth adapter the random freezes completely disappeared, one week without any freezes...
>
>

Revision history for this message
wackyiniraqi (wackyiniraqi) wrote :

I have since removed my Atheros wireless card and installed a broadcom card.....problem still persists....seems the desktop is gonna have to use winDOZE for now

Revision history for this message
Elod VALKAI (elod) wrote :

wackyiniraqi wrote:

> Has anyone disabled their wireless card and seen if that fixes the
> problem? I'm using an Atheros card.

I've just removed the madwifi drivers. With no LAN or WLAN, I've managed
to freeze the laptop.

It's not related to HW failures, it's a really strange Ubuntu policy,
that drives an unstable kernel right into an LTS distribution.

I guess a 6 month release cycle is just not feasible.

This kernel should be upgraded to 2.6.26 or downgraded to 2.6.22, which
is the last, proven stable kernel.

--
Valkai Elõd

Revision history for this message
Wolfgang Glas (wglas) wrote :

Just to set the record straight, 2.6.18 (Debian etch) is the last kernel that survives on our mildly loaded servers for more than a few hours. All kernels starting from 2.6.20 have locked up on our servers.

And yes, it's a shame that such a kernel has made it into an LTS. Even more bothersome is the way developers are coping with this problem: no reaction :(

  Regards,

    Wolfgang

Revision history for this message
Sandair (friggincomputers) wrote :

I don't think they were aware of the problem when they released it and I am sure they are trying to fix it. But this is the toughest type of bug to fix, especially if you don't have a machine that reproduces.

Another problem is that this thread is somewhat closed because someone set it as "fix released" and they may not be looking at it much. That is why I opened another identical bug, but there is not much activity there, so they don't look at that one either. You should comment there. It is bug 243561.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/243561

Revision history for this message
Sandair (friggincomputers) wrote :

I just realized it is tagged for release in Intrepid, that is why they are ignoring this thread. They consider 2.6.25+ to be the solution and it will be released with Intrepid, but Hardy is LTS!

Revision history for this message
Elod VALKAI (elod) wrote :

Sandair wrote:

> Another problem is that this thread is somewhat closed because someone
> set it as "fix released" and they may not be looking at it much.

It's "fixed" because I've got a stable system with 2.6.25 from ...
proposed-upgrades I guess. It's considered fixed on the Dell C400, that
I originally reported it on.

An Ubuntu rep. (Leann if I remember correctly) proposed that all
affected open a separate bug-report, because it's easier to track bugs
that way.

I was also asked if I'd do a git-bisect. Until now I did not find the
time to do it, but someone will have to, in order to narrow it down somehow.

--
Valkai Elõd

Revision history for this message
Elod VALKAI (elod) wrote :

Sandair wrote:
> I just realized it is tagged for release in Intrepid, that is why they
> are ignoring this thread. They consider 2.6.25+ to be the solution and
> it will be released with Intrepid, but Hardy is LTS!

Well... we've got the buggiest LTS so far (and it's only the 2nd in line).

Surely, this needs to be fixed in Hardy, as it's here to stay for at
least a couple of years.

I'm also not really convinced that 2.6.25 fixes all problems, as others
have reported similar behaviour in that release as well. If I remember
correctly Wolfgang is one of them.

Sorry guys, I've been ignoring this thread, but I'm back :)

--
Valkai Elõd

Revision history for this message
Chris Coulson (chrisccoulson) wrote :

Elod is correct. It is marked as Fix Released for Intrepid because he reports that the crash doesn't occur with kernel 2.6.25 or later on his machine. As he is the original reporter, this means that his bug is fixed in Intrepid. It won't be fixed for Hardy unless we can determine exactly which commit fixed the problem that caused Elod's lock-up.

Other people experiencing lock-ups should open their own individual bug reports containing all the relevant information as per the kernel team bug policy: https://wiki.ubuntu.com/KernelTeamBugPolicies unless they are absolutely sure that their lock-up is caused by the same problem which caused the lock-ups for Elod. You should include a brief description of your hardware in the bug title (not just "kernel 2.6.24 lockup" or whatever, otherwise you'll end up with another bug report that is hijacked by lots of other people).

Remember, just because you're all experiencing the same symptom doesn't mean that it is the same bug. A lock-up or crash is a fairly generic symptom. Your lock-ups could all have different causes, and without doing detailed debugging, it is not possible to determine what the cause is. Adding "me too's" here just confuses things, makes the bug report difficult to follow and makes it less and less likely that your problem will get fixed.

Multiple bug reports specific to a single hardware set-up are much easier to follow than one large bug report with umpteen permutations of hardware configuration.

Revision history for this message
KevDog (kev-hilton) wrote :

Note to others: I've tried the Ibex kernel and the lockup issue continues. So unless a different kernel version is going to be released, the new kernel does not seem to solve the problem for me!

Revision history for this message
Chainz (chainzee) wrote :

I have no wireless and I am still having the lockups. In my opinion it's a
graphics-related problem.

2008/7/16 Elod VALKAI <email address hidden>:

> wackyiniraqi wrote:
>
> > Has anyone disabled their wireless card and seen if that fixes the
> > problem? I'm using an Atheros card.
>
> I've just removed the madwifi drivers. With no LAN or WLAN, I've managed
> to freeze the laptop.
>
> It's not related to HW failures, it's a really strange Ubuntu policy,
> that drives an unstable kernel right into an LTS distribution.
>
> I guess a 6 month release cycle is just not feasible.
>
> This kernel should be upgraded to 2.6.26 or downgraded to 2.6.22, which
> is the last, proven stable kernel.
>
> --
> Valkai Elõd
>

Revision history for this message
Ed Arnold (era-ucar) wrote :

On Wed, 16 Jul 2008, Wolfgang Glas wrote:

> Just to set the record straight, 2.6.18 (Debian etch) is the last kernel that
> survives on our mildly loaded servers for more than a few hours. All kernels
> starting from 2.6.20 have locked up on our servers.
>
> And yes, it's a shame that such a kernel has made it into an LTS. Even more
> bothersome is the way developers are coping with this problem: no reaction :(

Which is exactly why, when I reinstall some version of linux on my
laptop, I'm going to go with a release from Redhat.

Revision history for this message
Yotam Medini (yotam-medini-gmail) wrote :

Using 2.6.22-14-generic, I suffer from a similar 'random' freeze. This seems to happen mostly (always?) when Firefox 3.0b5 is running.

I did follow the suggestions of:
Excessive disk I/O in Firefox 3b5
http://ubuntuforums.org/showpost.php?p=4770985postcount=50)

Attached is ukbr.tar.gz having:
# tar tvzf ukbr.tar.gz
-rw-rw-rw- root/root 124784 2008-07-18 10:33 dmesg.log
-rw-rw-rw- root/root 8479 2008-07-18 10:33 lspci-vvnn.log
-rw-rw-rw- root/root 82 2008-07-18 10:33 uname-a.log
-rw-rw-rw- root/root 28 2008-07-18 10:33 version.log

Revision history for this message
Sergio Callegari (callegar) wrote :

This bug was originally reported for hardy.
Indeed it was reported on March 22 when hardy was not even shipped yet. At that time development on intrepid was not even open.

How come it is now tracked as an Intrepid bug?
The fact that there is a fix available for Intrepid does not mean that there is a fix available for Hardy. In fact there is not.

Please track this bug as a Hardy bug and leave it open, as it should be. Do not treat this as an Intrepid bug that appears to be solved: doing so is a trick and it is just confusing for users looking for fixes.

Please allow everybody to easily see that there is a very critical open bug for Hardy for which _no_ fix is available (yet), as fixes will need to be identified and backported from the Intrepid kernel (assuming that it is possible and practical to do so).

Revision history for this message
Tom M. (wvm7fk202-deactivatedaccount) wrote :

Thank goodness I finally found this bug! It has been frustrating trying to find out what the heck is causing this.
I took out my wireless card (Atheros chipset - Netgear WG311T) and I'm just using my wired ethernet for the time being to see if the bug is triggered.

Revision history for this message
Sergio Callegari (callegar) wrote :

Your wireless card may _help_ trigger the bug, but unfortunately it is not the source of the bug.
I am experiencing the issue on a system with no wireless at all.

Revision history for this message
Chris Coulson (chrisccoulson) wrote :

sergio - It could very well be the source of Tom's problem, but there is no way of knowing with so little information. Just because you see lock-ups without wireless doesn't mean that other people's lock-ups can't be caused by their wireless. As already pointed out many times above, the people posting to this bug report possibly have completely different bugs, all with the same symptom. Just because you all see lock-ups does not mean they are being caused by exactly the same bug.

There are so many different combinations of hardware in this bug report now that it has become virtually impossible to follow. This bug is fixed in Intrepid for the original reporter (hence why it is marked as being tracked in Intrepid). It isn't fixed in Hardy because it is very difficult to establish exactly what commit fixes the bug for the original reporter.

Revision history for this message
Sergio Callegari (callegar) wrote :

Chris Coulson wrote:
> sergio - It could very well be the source of Tom's problem but there is
> no way of knowing with such little information.
Chris, you are right. Indeed it could. But it wouldn't be good news. It
would mean that all of a sudden, after having had many kernels
performing stably on most hardware, we get a 2.6.24 that has many
different bugs in many different subsystems, /all/ resulting in hard
lockups rather than in more disciplined kernel oopses.

Furthermore, bugs in drivers have typically been rapidly spotted and
fixed, since driver code is generally modularised and finding the bug
location is generally as easy as preventing the loading of some modules.
Conversely, the pattern that I am seeing here by reading most of the
messages in the various threads is the following:
1) I have the freeze issue
2) I have tried this and this (typically removing some driver or even
some userland software) and I do not have it anymore, eureka: it is
<pick something among Nvidia, Ati, a wireless driver, even Firefox 3, etc>
3) oops sorry, instead of freezing after 30' my system did after 6 hours.
> Just because you see
> lock-ups without wireless doesn't mean that other peoples lock-ups can't
> be caused by their wireless.
Yes, but due to the above, we need to kindly urge bug reporters to open
/new/ bugs if they believe that they have identified a particular
subsystem causing the bug.
Surely, I did that in the wrong way and I apologize.
Even more important we need to make sure that bug reporters do not post
1) and 2) above, forgetting to post 3) in case the bug still manifests.

Or otherwise we can open a new bug "Linux kernel 2.6.24 lockup, do not
post here unless you are experiencing the bug even without ATI, NVIDIA,
wireless and SATA"
> As already pointed out many times above,
> the people posting to this bug report possibly have completely different
> bugs, all with the same symptom. Just because you all see lock-ups does
> not mean they are being caused by exactly the same bug.
>
> There are so many different combinations of hardware in this bug report
> now that it has become virtually impossible to follow. This bug is fixed
> in Intrepid for the original reporter (hence why it is marked as being
> tracked in Intrepid).
Sorry, I still believe that this is very wrong. If the bug, that was
originally opened for hardy, is fixed in intrepid for the original
reporter, then there should be an open hardy tracked bug with the notice
that it is fixed in intrepid, not a fix-released intrepid tracked bug.

Alternatively, taking this to the absurd, if all Hardy bugs were found to be
fixed in Intrepid, we could nicely say "hurrah, Hardy is the perfect
distribution: it has no open tracked bug at all".

Note that tracking the bug only for Intrepid means that if you look for
some bug concerning freezes on Hardy you do not find this bug.
Also, if one wanted to collect statistics and count Hardy bugs, he would
not count this one.

> It isn't fixed in Hardy because it is very
> difficult to establish exactly what commit fixes the bug for the
> original reporter.
>
The question is if it is worth trying to establish it at all. If the
bug man...


Revision history for this message
Mark Shuttleworth (sabdfl) wrote : Access to the Intrepid kernel on Hardy

sergio.callegari wrote:
> Testing whether a backported intrepid kernel breaks something
> on hardy would probably be a much faster and more useful effort.
>

It would be very valuable to give Hardy users the ability to test the
Intrepid kernel. Ben, is there a PPA where we build the Intrepid kernel
for Hardy, so we can invite folks to test for hardware regressions?

Revision history for this message
Chris Coulson (chrisccoulson) wrote :

There was a PPA with Intrepid kernels built for Hardy (https://edge.launchpad.net/~kernel-ppa/+archive) but the ones built for Hardy have gone now. The original reporter (Elod) managed to test one and reported that it fixed his problems (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/204996/comments/197)

Revision history for this message
Tim Gardner (timg-tpi) wrote :

The problem with building a Hardy version of the Intrepid kernel is that it has a build dependency on makedumpfile which does not exist in Hardy main, only Intrepid main. I've uploaded a Hardy version of makedumpfile to the kernel PPA, so as soon as it has built, the kernel that is in DEPWAIT ought to kick off.

Note that if you require linux-restricted-modules for graphics or wireless, then loading the Intrepid kernel in Hardy won't work for you.

Revision history for this message
Ben Collins (ben-collins) wrote :

On Fri, 2008-08-01 at 19:09 +0100, Mark Shuttleworth wrote:
> sergio.callegari wrote:
> > Testing whether a backported intrepid kernel breaks something
> > on hardy would probably be a much faster and more useful effort.
> >
>
> It would be very valuable to give Hardy users the ability to test the
> Intrepid kernel. Ben, is there a PPA where we build the Intrepid
> kernel for Hardy, so we can invite folks to test for hardware
> regressions?

Installing the intrepid kernel on hardy is very simple and doesn't need
a recompile against hardy to work. In fact, an intrepid kernel image
should run with no problems on systems as far back as feisty.

The only issue is linux-restricted-modules, which isn't as easy to
install without a recompile, and quite frankly, the effort to make that
compile and work on pre-intrepid is way too much overhead for us to make
it worth it.

As far as a PPA, we've discussed this, and we just don't want the
barrier that low for people to run bleeding edge kernels on released
systems, since it may detract from people doing full testing of
intrepid.

So, if it's standard kernel testing, then they just need to add intrepid
to sources.list, install the new kernel, and revert sources.list back to
hardy (or just download the .deb and debi it).

Hope this helps!
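
The add, install and revert sequence described above could look roughly like the sketch below. The helper names, the backup suffix, the mirror URL and the package placeholder are all assumptions made for illustration, not something stated in this thread. As noted in an earlier comment, it does not help if you need linux-restricted-modules.

```shell
# File to edit. It can be overridden, so the helpers can be tried on a
# scratch copy before touching the real file.
SOURCES_LIST="${SOURCES_LIST:-/etc/apt/sources.list}"
# Assumed archive line; the mirror URL and component are illustrative only.
INTREPID_LINE="deb http://archive.ubuntu.com/ubuntu intrepid main"

# enable_intrepid: back up sources.list, then append the Intrepid line.
enable_intrepid() {
    cp "$SOURCES_LIST" "$SOURCES_LIST.hardy" &&
    echo "$INTREPID_LINE" >> "$SOURCES_LIST"
}

# restore_hardy: put the original sources.list back.
restore_hardy() {
    mv "$SOURCES_LIST.hardy" "$SOURCES_LIST"
}

# Intended sequence (as root); the package name is a placeholder:
#   enable_intrepid
#   apt-get update
#   apt-get install <intrepid-kernel-image-package>
#   restore_hardy
#   apt-get update
```

Restoring the file and running apt-get update again right after the install keeps later upgrades from pulling in other Intrepid packages.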

Revision history for this message
Sergio Callegari (callegar) wrote : Re: [Bug 204996] Re: Access to the Intrepid kernel on Hardy

The problem is not the kernel-image, that indeed installs. The problem,
for some reason, seems to be with the kernel headers that bring in a
huge amount of Intrepid dependencies and thus cannot be installed
without making apt very unhappy.

Sergio


Revision history for this message
Sergio Callegari (callegar) wrote :

Hi, I do not want to say it too loud (yet!) since I have only had 12 hours of testing, but 2.6.24-20 from the proposed updates seems to finally fix the freeze issue for me.

Many, many thanks to whoever finally resolved this longstanding issue!!!

Sergio

Revision history for this message
Jeremy Bar (j.b) wrote :

It seems I am experiencing similar issues, which I describe in bug 253852.

Jeremy

Revision history for this message
Elod VALKAI (elod) wrote :

I did not manage to test 2.6.24-20 as it's been pulled from the repo.

2.6.24-21 (2.6.24-21.41ubuntu1) produces the same freeze. I managed to crash it in about 5 minutes with one terminal and one Epiphany window open on a Drupal-based site (with no Flash).

2.6.24-21 also takes 80 seconds to get from point A to point B. Point A is "Loading, please wait", B is "Reading files needed to boot".

Revision history for this message
wackyiniraqi (wackyiniraqi) wrote : Re: [Bug 204996] Re: Linux kernel 2.6.24-12 lockup

Sounds a lot like bug 248591.


Revision history for this message
Sergio Callegari (callegar) wrote :

Sorry, I was too optimistic.
I am still experiencing the freeze on 2.6.24-20; it just took longer: the first came after 16 hours, and one more after another 8 hours. This thing is subtle.

Sergio

Revision history for this message
Wolfgang Glas (wglas) wrote :

To the best of our knowledge, this bug is most likely triggered by a slow, saturated network, which presumably produces IP packet drops and/or causes IP packets to arrive out of sequence, so that the kernel has to reorder them.

Examples of such setups are UMTS data cards or 2MBit/s DSL lines with a running bittorrent client or a heavily AJAXed webapp like gmail.

I have already pointed out a patch, committed on the 2.6.25 kernel branch, which seems to fix a lockup problem with IP queues in the kernel.

Please, please, Ubuntu kernel developers, review this patch and start trying to reproduce this problem in the near future.

  Regards,

    Wolfgang

Revision history for this message
wackyiniraqi (wackyiniraqi) wrote :

Sounds a lot like what I've got going on with my desktop [bug 248591], though my laptop works fine, so I dunno about the network issue. Seems my desktop will have to go to windozzze since the support for these lockups is nonexistent.

Revision history for this message
kpagan (paganelis) wrote :

Well I have a Toshiba Tecra A8-103 laptop with Intel Corporation 82573L Gigabit Ethernet Controller network card.
My wireless adapter is Intel Corporation PRO/Wireless 3945ABG Network Connection.
See attached file for hardware info.
I can reproduce the lockup when there is network traffic through my wired interface.
If I download a file from the internet or update-manager downloads a package the system locks up and even Alt+SysReq+R+E+I+S+U+B can help.
If I disable the wired cable and connect to the internet through my wireless card everything seems to be OK.
I have downloaded at least 1GB without lockups.
Correct me if I'm wrong but Ubuntu Feisty and Gutsy had restricted drivers for this card.
Now Hardy has its own drivers (non restricted). Can anyone confirm if this is the case?
I will try to install the restricted drivers and see if it solves the problem.
I hope I'm not wrong

Revision history for this message
kpagan (paganelis) wrote :

Excuse the double post but there is a typo above.
The correct phrase is:
If I download a file from the internet or update-manager downloads a package the system locks up and even Alt+SysReq+R+E+I+S+U+B can't help.

Revision history for this message
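Jeremy Bar (j.b) wrote :

Have you tried this fix?

http://forums.opensuse.org/archives/sls-archives/suse-linux/hardware-support/384758-network-fix-various-notebooks-intel-82573l.html
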
Revision history for this message
Wolfgang Glas (wglas) wrote :

My laptop locks up with a UMTS dialup connection; a colleague of mine has his laptop locking up on wired ethernet connected to a DSL modem. Both lockups occur when the upstream bandwidth is saturated with many TCP/IP connections, regardless of the locally used network interface. Neither of us has an Intel 82573L adapter. From the point of view of TCP/IP packet drops and reordering, it is sometimes favorable to have the bottleneck on the local side, which might explain why, for some users, a wireless connection from the laptop to the DSL router cures the problem.

@kpagan: Where's your bandwidth bottleneck with the wireless/wired connection? What bandwidth is available in each case?

And yes, I *do* think a kernel developer should review the following patch, which fixes a deadlock-problem with some corner-states of the TCP stack:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b000cd3707e7b25d76745f9c0e261c23d21fa578

  Best regards,

    Wolfgang

Revision history for this message
kpagan (paganelis) wrote :

@ Jeremy Bar
Yes I tried but with no success :(
Also I tried the instructions found here https://bugs.launchpad.net/ubuntu/+source/linux/+bug/226906 but with no success either :( :(
The lockup is definitely related to some error in the 82573L driver, though.

Revision history for this message
Yotam Medini (yotam-medini-gmail) wrote :

Sorry - no.
I am thinking of trying Intrepid alpha sometime soon,
-- yotam

On Sun, Aug 10, 2008 at 8:56 PM, Jeremy Bar <email address hidden> wrote:
> have you tried this fix?
>
> http://forums.opensuse.org/archives/sls-archives/suse-linux/hardware-support/384758-network-fix-various-notebooks-intel-82573l.html

Revision history for this message
Jeremy Bar (j.b) wrote :

@ kpagan, after copying the newly built driver module e1000.ko to /lib/modules/`uname -r`/kernel/drivers/net/e1000/, did you rebuild the initrd with the following?

sudo update-initramfs -u -k `uname -r`

You have to check that when you start the system, the correct module gets loaded at boot time.
You can check with ethtool -i <device>
The version should be 7.6.15.5-NAPI
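
If you want to script that check, something like the following might do. It is only a sketch: the function names are made up, and it assumes ethtool -i prints a "version: ..." line, which it reads from stdin so it can be tried without the hardware.

```shell
# driver_version (hypothetical helper): extract the value of the "version:"
# line from `ethtool -i` style output on stdin. The "firmware-version:" line
# is not matched because the whole key must equal "version".
driver_version() {
    awk -F': ' '$1 == "version" { print $2; exit }'
}

# check_driver EXPECTED: compare the version found on stdin with EXPECTED.
check_driver() {
    expected="$1"
    actual="$(driver_version)"
    if [ "$actual" = "$expected" ]; then
        echo "ok: $actual"
    else
        echo "mismatch: got '$actual', expected '$expected'"
        return 1
    fi
}

# On a real system (eth0 is an assumption, use your wired device):
#   ethtool -i eth0 | check_driver 7.6.15.5-NAPI
```

A mismatch after a reboot would suggest the old module is still being loaded from the initrd.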

Jeremy

Revision history for this message
kpagan (paganelis) wrote :

@Wolfgang Glas:
what do you mean by "Where's your bandwidth bottleneck with wireless/wired connection"?
My wired connection is 100Mbps and my wireless is 802.11g, so theoretically 54Mbps.

@Jeremy Bar:
I tried the solution found here
http://forums.opensuse.org/archives/sls-archives/suse-linux/hardware-support/384758-network-fix-various-notebooks-intel-82573l.html
with the additional note for the make:
make CFLAGS_EXTRA=-DDISABLE_PCI_MSI CFLAGS_EXTRA=-DE1000_NO_NAPI install
I have to test it more thoroughly though. It seemed to improve my situation. I was downloading a 700MB file, but it didn't download completely because Firefox crashed. I hope to test it this afternoon.
I haven't run "sudo update-initramfs -u -k `uname -r`" though.
When I test it I will report back here.
Thanks anyway :)

Revision history for this message
Wolfgang Glas (wglas) wrote :

On the way to the internet servers you connect to, there's presumably some kind of modem connection (DSL, UMTS etc.) which limits your bandwidth. Such a bottleneck causes IP packet drops, which in turn cause TCP/IP retransmits and/or TCP packets arriving in the wrong order, thus stressing the TCP/IP stack. If you hammer such a bottleneck with multiple TCP/IP connections, even more packets are dropped.

All our observations lead to the assumption that all the deadlocks reported in conjunction with bittorrent clients etc. may be related to the TCP/IP stack being under stress. There's a patch, applied during the 2.6.25 release cycle by SuSE, which fixes a deadlock within the TCP/IP implementation, so I'd really like to get feedback on whether our assumption holds.

  Wolfgang

Revision history for this message
nexx (valentinojr) wrote :

Wolfgang,

Same issues here. I'm sure my problem is related to the Intel ethernet device (module e1000) under stress. My system is an IBM/Lenovo X60s.

My symptom is an instant system freeze when the ethernet card is under stress. The message from the NMI is:
    NMI received for unknown reason a0 on CPU 0

By running tcpdump I confirmed there's a problem in the LAN, as I saw 34000 packets of this type in just 12 seconds of capture:

21:51:01.531747 arp who-has 192.168.1.1 tell 192.168.1.1

Otherwise the system runs stable for weeks. When I connect it to the network in this state, it freezes after somewhere between 10 seconds and 10 minutes.
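
For anyone wanting to check their own LAN for the same pattern, here is a rough sketch of how such packets could be counted. It assumes tcpdump text output in exactly the format of the line above (newer tcpdump versions print ARP lines differently), and the function name is made up. For scale, 34000 packets in 12 seconds is roughly 2800 per second.

```shell
# count_self_arp (hypothetical helper): count ARP requests on stdin in which
# the requested address and the requesting address are identical, e.g.
#   21:51:01.531747 arp who-has 192.168.1.1 tell 192.168.1.1
count_self_arp() {
    awk '$2 == "arp" && $3 == "who-has" && $5 == "tell" && $4 == $6 { n++ }
         END { print n + 0 }'
}

# On a real system (eth0 and the packet count are assumptions):
#   sudo tcpdump -n -i eth0 -c 1000 arp | count_self_arp
```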

Regards,
Valentino

Revision history for this message
kpagan (paganelis) wrote :

OK, after following the instructions on how to fix the e1000 module and Jeremy Bar's instructions, I can connect to the internet over the wired connection without any lockups so far.
I haven't installed the -rt kernel, nor have I done anything else.
I think my problem is solved.
If something goes wrong I'll report back here; otherwise it means that my problem is solved permanently.
Thank you guys for the tip. I was just ready to install Gutsy again.
Thank you!!!

Revision history for this message
ThaFox (kehakettu) wrote :

This solved my freezes too. I have an Acer 5024 laptop and haven't had any freezes after updating. During these freezes only the mouse still moved, although I was able to reboot with alt-sysrq-REISUB.

Revision history for this message
Mariner09 (smcmackin) wrote :

I've been seeing lock-ups with similar symptoms, only the mouse moves and nothing else but the one-finger salute will cure the problem. I'm using the 2.6.24-19-generic kernel and I only use wireless with the iwl4965.

These lock-ups appear to happen when I'm not doing anything. I walked away from the laptop at 5pm yesterday and at 10:13pm, it froze. Nothing in kern.log; /var/log/messages continues to add entries until my cold boot this morning, but they are only "--MARK--". At 9:31pm there was a message:

kernel: [ 520.906785] tun0: Disabled Privacy Extensions
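
In case it helps others dig through their logs, here is a small sketch for pulling out the last real entry when everything after it is only mark lines. It assumes the marks look like "-- MARK --" or "--MARK--"; the function name is made up.

```shell
# last_real_entry (hypothetical helper): print the last line on stdin that is
# not a syslog mark line. Both "-- MARK --" and "--MARK--" are treated as marks.
last_real_entry() {
    grep -Ev -e '-- ?MARK ?--' | tail -n 1
}

# On a real system:
#   last_real_entry < /var/log/messages
```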

Revision history for this message
Huygens (huygens-25) wrote :

Just wanted to add my 2 pennies on the subject.
Back in the time of Edgy, I decided to buy a WiFi card. My choice was the RaLink RT61 chipset (it's an MSI card). I had to install the driver for it manually so it would work on Edgy. Then came Feisty and (if I recall properly) right after upgrading I could still use my desktop over WiFi. Then came a kernel upgrade and suddenly I had frequent complete crashes, just like those reported here. It was impossible to see the final log entries leading up to a crash. I reverted to the previous kernel, but strangely the crashes continued, whereas they had never occurred before. After fiddling around, I decided to blacklist the RT61 kernel module and to fetch back my old network wire :-(
That was the status until I quit using computers for a year. In the meantime Ubuntu continued evolving, and Gutsy, then Hardy were out. I upgraded! And I thought that perhaps during that time the crash might have been solved. I removed the RT61 kernel module from the blacklist and unplugged my old network wire! Free again ;-) I just had to wait a couple of hours before a crash occurred! And I kept on trying but after 3 days, I gave up, blacklisted the module once more and put back my net wire. Since then (more than a week now), I have had no crash whatsoever!
There really seems to be something wrong in this module that causes a complete blackout of the computer (nothing responds at all).

PS: I could not find back the bug report on Feisty.

Revision history for this message
John Ward (automail) wrote :

I noticed "Fix Released" at the top of this page. What exactly is the fix? By the looks of things it's released for Intrepid and not Hardy.

Revision history for this message
Tom M. (wvm7fk202-deactivatedaccount) wrote :

I switched from Hardy Heron to Debian Lenny and it's running 2.6.25 and has been up weeks in X windows (with my PCI wireless card and Nvidia video and AMD Athlon64 dual core) chugging along nicely. No lockups. 2.6.25 definitely fixes it. Good luck all.

Revision history for this message
Sergio Callegari (callegar) wrote :

In response to John:

As for the "fix released" indication: unfortunately it does not indicate that there is a fix released for hardy. As far as I know, the part of the 2.6.24 kernel causing the issue is still to be identified. There has been some discussion on whether it was appropriate to place the "fix released" tag on this bug (see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/204996/comments/273).

In response to Tom M.

Unfortunately, this is one more indication that going with 2.6.24 for hardy has been a very unfortunate choice: it is not the kernel of fedora, it is not the kernel of opensuse, it is not the kernel of debian, and it has a userbase that is probably limited to hardy only. Furthermore, it is no longer maintained upstream. Patchsets for 2.6.26 and 2.6.25 come out regularly (2.6.25 is at .16, delivered on August 20th), while 2.6.24 stopped at .7, shipped on May 7th.

There has been in the past a strong request to make a version of the intrepid kernel installable on hardy. As a matter of fact it is, but to the best of my knowledge the headers (necessary for compilation against the kernel) are not.

Sergio

Revision history for this message
John Ward (automail) wrote :

Thanks for the response Sergio,

Is there any way that kernel 2.6.25 can be put into the repositories for usage and testing, and then possibly released as an update during Hardy's last period? This problem is serious. Given a flexible update system and a large group of people looking at this problem and reporting back the best information they can, there's no reason that something can't be released for this crippling thing.

Revision history for this message
Sergio Callegari (callegar) wrote :

This has been asked many times (including by me). Unfortunately, the answer so far has not been positive:

1) Switching hardy to the 2.6.25 kernel has been excluded as a "jump in the dark".
2) Providing two alternative kernel versions for hardy (namely both 2.6.24 and 2.6.25) has been indicated as not sustainable with the resources of the ubuntu kernel team.

The closest we got is:

a) an interview (with Mark Shuttleworth, if I remember correctly) where it is said that due to the very long support time of hardy (5 years on server) hardy might eventually switch to a more modern kernel when it becomes impossible to support 2.6.24 (cannot find the link, sorry).

b) an email on this very list, again by Mark Shuttleworth, suggesting that it would be very valuable to give Hardy users the ability to test the Intrepid kernel. Unfortunately, in applying this proposal there is an apparent need to compromise, since kernel developers do not want to decrease the motivation to test the intrepid codebase as a whole. The situation so far is that the intrepid kernel (2.6.26) can be installed on hardy, but not its kernel headers (nor the restricted modules, from what I heard).

Personally, what I have done so far on all the machines I am responsible for is to use ubuntu without the hardy kernel, having compiled a 2.6.25 and then a 2.6.26 from kernel.org with make-kpkg, which gives you nice deb packages. It is a bit of a pain to upgrade whenever a new patchset comes out for 2.6.26... but... still better than the lockups. In any case, 8.10 is not that far away now, so let's just hope this time it takes the same kernel version as fedora or opensuse.

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

I'm expecting that we will shortly have a PPA for hardy which includes:

 - the proposed Intrepid kernel
 - a daily build of kernel.org's "tip"
 - the virgin kernel.org kernel that corresponds to hardy's kernel

Between those, we should have ample opportunity to help provide testing
for upstream as well as triage issues specific to Ubuntu.

Mark

Revision history for this message
Jeremy Bar (j.b) wrote :

Mark, that would be great, because the Lenovo Thinkpad T60 and X60s I own are both affected by this issue. The only solution I am aware of now is a manual patch of the e1000 driver.

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

Ben Collins wrote:
> Upstream doesn't care about testing 2.6.24 any more.
But it is useful for us to assess if an issue was introduced in patches
we added to that stable release, or if it was there already.
> They want us to help test tip.
Sure, which is why we should make tip available for both stable and
development releases (currently Hardy and Intrepid).
> Besides, there's no good base to say "corresponds to hardy's kernel"
> because we stopped syncing at like 2.6.24.2, but we have lots of
> cherry picks for CVE's and SRU's from 2.6.24.y beyond .2. So hardy is
> currently > 2.6.24.2 but < 2.6.24.y head.
Then choose either .2 or .y, I would go with .2 personally, and I would
also try not to stop syncing, though I understand there are ABI issues.

> So it wouldn't even be beneficial to us to provide a "stock" kernel
> for hardy users. It wouldn't tell us the difference between .y fixing
> it, or stock working because we have a bad patch.
But .2 would tell us that.

> Ubuntu-next we've already started with. I'm quite reluctant to provide
> it in a PPA. Upstream constantly complains about the quality of bug
> reports from our users, and I fear that this would increase it because
> of non-technical users trying these kernels and not being able to
> properly help debug them.
I think we should DROP ubuntu-next. It's more work than any other
option, its bugs are of no interest to upstream OR US.

> IMO, if we really want this PPA stuff, we need more man-power on the
> QA and engineering end of it. Just making it available isn't useful at
> all and would probably cause the reverse with upstream than what you
> want.
Please nonetheless put these into your plan, with or without ubuntu-next.

> On a similar note, I've considering putting out the idea of adding the
> LP bugzilla plug-in to upstream kernel to make it easier for us to
> forward good bug reports upstream.

That would rock indeed :-)

Mark

Revision history for this message
wackyiniraqi (wackyiniraqi) wrote :

Check out bug 248591. I was just informed that:

"The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the
upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would
appreciate it if you could please test this newer 2.6.27 Ubuntu kernel."


Revision history for this message
Martin Božič (martin-bozic) wrote :

Well, to add some more confusion to this bug...

The last kernel panic I experienced was on the first boot after upgrading the kernel to 2.6.24-19. After that, no kernel panics whatsoever. Also, processor hiccups disappeared somewhere in the beginning of August. I've tested the 2.6.24-17 and *-18 kernels, each for a day of common daily use, with no problems on any of the current kernels (although Firefox was crashing every second time I played Flash videos on the *-17 kernel). I have a Dell Latitude D400 laptop, no non-free drivers.

One more thing, kernel panics gradually disappeared over time, at least that's how it seemed to me in my case.

So, could it be that this bug is not directly the kernel's fault, but that of some common and crucial package found on all Ubuntu variants?

Revision history for this message
John Ward (automail) wrote :

I have downloaded and installed "Ubuntu 8.10 Intrepid Ibex Alpha 5" and apart from being a little incomplete here and there (it is an Alpha build after all) I can say confidently that for the last 5 days Ubuntu has been running smoothly without any kernel panics. I have left the machine on its own with no kernel panics. I have stressed it with Azureus Vuze and multiple torrents, Firefox with flash content and multiple open tabs, The Gimp and flame rendering, Rhythmbox, Brasero disc burning, Sound Juicer .ogg ripping and System Monitor, all running at the same time, and there has not been a hiccup, a single crash or a kernel panic. Well done in nailing this bug, or accidentally realising it's not in the .27 kernel; whatever you did, the problems are gone.

I urge others still experiencing this problem to try Ubuntu 8.10 Intrepid and see if the panics disappear. Apart from an issue with installing proprietary nVidia drivers everything has been running very well with 8.10 and I actually like the new "Brown Human" theme.

Anyway, I'm glad to say these things.

John.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi John,

Thanks for testing and the feedback regarding Intrepid Ibex Alpha5. Would anyone else be willing to test Alpha5 as well? It does, as John pointed out, contain a newer 2.6.27 kernel. For more information regarding Alpha5 please refer to http://www.ubuntu.com/testing/intrepid/alpha5 . Please let us know your results. Thanks.

Revision history for this message
nst (nst16) wrote :

Hi,
I'm experiencing a hard freeze both with a fresh Hardy install and with an updated Intrepid (kernel 2.6.27-3). My laptop is a Targa Traveller 826 with AMD Turion64 1.8GHz and ATI Radeon Mobility X700.
In my case the freezes seem to be correlated with heavy network traffic, but not with a specific application (Firefox,synaptic,apt-get). Most times it only takes less than one minute after the start of the data transmission until it freezes. It happens both with wired and with wireless network connections.
When booting with acpi=off I experienced no freezes so far.
Let me know if you need additional information.
Nils

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

Does anyone see a clear kernel panic during these hangs? For example, if
it can be reproduced within a minute, it would be interesting to boot
into administrative mode and run a big FTP download or other network
stress test, to see if a kernel panic is displayed. If anyone sees one,
grab a camera and attach a photo here.

Mark

Revision history for this message
Dev (scotty-amnet) wrote :

Got the same issue here on hardy, 2.6.24-21-generic, nVidia Corporation GeForce 7100 GS (rev a1) running driver 173.14.12, RaLink RT2561/RT61 rev B 802.11g

locks up hard, no mouse, no ping... nothing at all.

If I can be of any use please let me know.

Revision history for this message
nst (nst16) wrote :

Unfortunately I don't see a clear kernel panic when the freeze occurs. I am able to reproduce it in a console e.g. by initiating a ftp transfer. It takes only a few seconds until it freezes. The cursor stops blinking, there is no reaction to inputs (CapsLock, Magic SysRq ..) and there are no entries in the logs.
Regards
Nils

Revision history for this message
Wolfgang Glas (wglas) wrote :

Honestly, I think this bug should be chased by a core kernel developer, because

  a) This issue is very long standing
  b) The impact on the affected users is high
  c) It is well known that no logs, kernel panics etc. are produced

Best regards,

   Wolfgang

Revision history for this message
Elod VALKAI (elod) wrote :

I've upgraded to Intrepid, and it's got other nasty problems (the intel driver in xorg does not like intel's 830 chipset).

The kernel (2.6.27) seems stable, but it's very strange to see a kernel that has not been released (rc6 at the moment) in a linux distribution to be released in a month.

Revision history for this message
Elod VALKAI (elod) wrote :

After one week with Intrepid (and having resolved the xorg intel driver quirks) I can say I'm pretty happy with it. 2.6.27 is stable. I hope it stays that way :).

I won't test 2.6.24 further, as it's simply impossible. I'm beginning to think that an LTS release should have the kernel of the previous release. At least it's been tested _widely_ for 6 months.

Revision history for this message
Stefan_Ares (twolve2146) wrote :

Hello everyone who seems to be in the same sinking boat, I guess I'm here to join you.

I've been trying linux for about a month now, with the same problem as many of you. My system will lock up and I can't use the mouse or any keyboard buttons. I have run openSuse and ubuntu hardy 32 bit, and have also just started running ubuntu hardy 64 bit.

I noticed with a GPU temp sensor that whenever my temp of my GPU gets up to 61C and hovers there without dropping, the computer will reliably lock up, and the best thing to do is to let it cool down. It seems to get really hot for no reason, since I'm running cooler in Vista right now (and in my opinion Vista should be running much hotter because of how taxing it is on hardware).

I would also like to post for the dev team that I have tried out Alpha5 and Alpha 6, both causing the same lock-up problem. It seems to take longer to heat up in these versions because the system seems to use less power in Intrepid, and I would love to switch but I'm trying to find an OS that won't freeze after getting too hot. Even though I realize that it may just be doing this to protect my hardware.

I have no logs of my problems, but I feel confident that my case is a GPU temp problem, after having tried many things. I doubt it is IO in my case because it's usually when I'm browsing in firefox (which causes my computer to run hotter).

Revision history for this message
bwana (marcusmarcus) wrote :

I've experienced frequent lockups with 8.04 32/64 (and -rt), 8.10 64 (all the way to Linux dlm1 2.6.27-1-generic #1 SMP).

I just had to reset my computer and I captured logs for a "full cycle" (from boot to crash - I got to the logon screen, then kablam).

I've attached a zip with the output of:
* cat /proc/version_signature > version.log
* dmesg > dmesg.log
* sudo lspci -vvnn > lspci-vvnn.log

I've also attached a bunch of logs from the full "cycle" mentioned earlier.

I've been hoping for this progress since migrating off 6.06 - but I've given up.
I'm off to distrowatch.com et al. to find me a replacement for Ubuntu.
I've wasted too much time believing/hoping that someone would be able to find a fix.

Revision history for this message
Chainz (chainzee) wrote :

I have just replaced my Graphics card in my PC, from ATI Rage 128 to ATI HD 3450...
And guess what? NO MORE LOCKUPS!!!

Revision history for this message
Sam!r Jadhav (jadhav333) wrote :

Sometimes my desktop just freezes, and there are some vertical blue dotted streaks across the screen. The mouse works but nothing on the desktop can be clicked.

Assuming some compatibility error due to recently installed software, I did a clean install of the OS. But the problem still persists.

Though it occurs randomly, it usually occurs while I am using the Firefox browser 3.0.3.

This is the first time since I started using Ubuntu (with version 6.06) that I have encountered an issue like this.

I am unable to give a screenshot because even the print screen functionality does not work during the freeze.

Can anybody suggest a solution?

My Spec
Quadcore intel cpu Q6600@2.4Ghz, 4 Gb ram, S975XBX2 motherboard, PCI Express GeForce 8600 GT (generic drivers)
Ubuntu 8.10 Intrepid Ibex 64bit
Kernel Linux 2.6.27-7-generic
Gnome 2.24.1
Partition Info:
\-------------------Root------------10GB
\Home-----------Home----------10GB
\Media\Sda3---Data folder--400Gb

Revision history for this message
Chainz (chainzee) wrote :

Your case might be connected with flash issues.
Please try to use flash block and see if it happens again.


Revision history for this message
Sergio Callegari (callegar) wrote :

I have upgraded the PC on which I was experiencing the hard lockups to Intrepid, and the lockups have disappeared.
I have also upgraded a laptop that was running Gutsy to Hardy, and lockups have now appeared there, although by no means as frequently as on the older desktop. I am seeing about a couple of hard lockups a week, with the laptop on for most of the day.

So the problem really does seem to be due to 2.6.24.

Revision history for this message
brainiac8008 (frankfurter) wrote :

Hello all,

I wrote a while back in the comments about how my computer would lock up with Hardy installed. I had found that my wireless USB adapter, based on the zd1211 platform, would cause the system to freeze as soon as Ubuntu tried to establish a connection to the Internet. I have a feeling that I also had the more common problem of the computer locking up at seemingly random times.

Well, I installed Intrepid last week and it has been very stable. I have not gotten a single lockup! I'm glad that I am finally able to use Ubuntu again.

Thanks,
Noah

Revision history for this message
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Revision history for this message
Adam Buchbinder (adam-buchbinder) wrote :

I had this problem with Hardy; I upgraded to Intrepid recently, but it's still present: the mouse won't move, the machine won't respond to pings, and there is nothing in the logs. It seems to be associated with network activity, such as loading a lot of tabs in Firefox or Epiphany, but I can't reliably reproduce it. It seems not to happen when nothing at all is running on the machine. It's a ThinkPad T40p; the same four logs as were attached in, e.g., comment 324 will be attached momentarily.

Changed in linux:
status: Fix Released → Confirmed
status: Fix Released → Confirmed
Revision history for this message
Adam Buchbinder (adam-buchbinder) wrote :

Attached find the tarballed results of:

$ uname -a > uname-a.log
$ cat /proc/version_signature > version.log
$ dmesg > dmesg.log
$ sudo lspci -vvnn > lspci-vvnn.log

Revision history for this message
Adam Buchbinder (adam-buchbinder) wrote :

As I don't have a good way of triggering the bug, I'd like to request that anyone who can do so attempt to find the regression via git-bisect. It's somewhat time-consuming, but it will definitely let us nail down the bug and submit it upstream (as it's appeared in vanilla kernels as well as Ubuntu-specific ones). It would finally make it possible for us to isolate and fix this thing after all these months.

There are good instructions for bisecting the kernel over in bug 273266; I'm willing to help out however I can. If anyone has any specific suggestions for making the bug easily reproducible on my own laptop, I'll try to help out with it as well, though I don't remember the last revision at which it didn't happen.
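
For anyone who can trigger the lockup, a bisection session might look roughly like the sketch below. It assumes a clone of the mainline kernel tree in a hypothetical directory (~/linux by default); the function names are made up for illustration, and the build, install and reboot steps between verdicts are left out.

```shell
#!/bin/sh
# Sketch of a kernel bisection session. KERNEL_TREE and the revisions
# passed in are assumptions; adjust them for your own setup.
KERNEL_TREE=${KERNEL_TREE:-"$HOME/linux"}

# Begin bisecting between a known-good and a known-bad revision.
# git then checks out a commit roughly halfway between the two.
bisect_begin() {
    good=$1
    bad=$2
    git -C "$KERNEL_TREE" bisect start &&
    git -C "$KERNEL_TREE" bisect bad "$bad" &&
    git -C "$KERNEL_TREE" bisect good "$good"
}

# After building and booting the checked-out commit, record the verdict:
#   good - no lockup seen during testing
#   bad  - the machine locked up
#   skip - the commit could not be built or booted
bisect_mark() {
    case $1 in
        good|bad|skip) git -C "$KERNEL_TREE" bisect "$1" ;;
        *) echo "usage: bisect_mark good|bad|skip" >&2; return 2 ;;
    esac
}

# Save the decisions made so far (worth attaching to the bug report),
# then return the tree to where it was before bisecting.
bisect_finish() {
    git -C "$KERNEL_TREE" bisect log > "${1:-bisect.log}"
    git -C "$KERNEL_TREE" bisect reset
}
```

A session would start with something like `bisect_begin v2.6.22 v2.6.27` (example endpoints only; use the last version you know to be good and the first you know to be bad), followed by one `bisect_mark` call per tested kernel until git names the first bad commit.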

Revision history for this message
Adam Buchbinder (adam-buchbinder) wrote :

Status update: I've gotten started on finding the regression via git-bisect. I was able to reproduce the bug on 2.6.27 vanilla, and unable to do so on 2.6.22 vanilla. I'm currently down to just under seven thousand candidate commits left. The whole thing is rather complicated by the fact that configuration options change between kernel versions, which has made me a bit leery of the possibility of actually isolating the code change that introduces the bug.

Also, older kernel versions seem to cause X to die every so often, and even apart from that, testing is slow, because finding out if a kernel is bad or not is like the Halting Problem--I can only get a "yes" answer, never a definitive "no".
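
As a rough guide to how much testing remains, each bisection step halves the set of candidate commits, so the number of kernels still to build grows only with the base-2 logarithm of the count. The sketch below works that out in plain shell arithmetic; `bisect_steps` is a made-up helper name, and git's own estimate can differ by a step or so because real history is not a straight line.

```shell
#!/bin/sh
# Estimate how many bisection steps remain for a given number of
# candidate commits, assuming each step halves the candidates.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))   # round up: the larger half may survive
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

bisect_steps 7000   # prints 13
```

The step count is small; the cost is in the time per step. A lockup gives a quick "bad", while a "good" verdict only ever means "no lockup within however long I was willing to wait", so a too-short soak time can mislabel a bad kernel as good and send the search down the wrong half.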

Revision history for this message
bwana (marcusmarcus) wrote :

I'm not experiencing the random crashes anymore (on any Ubuntu versions).

On further examination, I managed to track it down to a faulty GPU fan causing the system to overheat (duh!).

I was also seeing this problem on a laptop. Turns out that the laptop would remain on 24 hours a day. I removed the battery (as it was permanently plugged in anyway) which lowered the temperature, which caused the crashes to stop.

In other words, the data I provided earlier in order to assist in the troubleshooting will only show how a system looks while overheating - so please disregard it when trying to get to the bottom of the real bug.

Hope I didn't encourage anyone to waste time barking up the wrong tree..

Cheers,
/m

Revision history for this message
Mark Shuttleworth (sabdfl) wrote : Re: [Bug 204996] Re: Linux kernel 2.6.24-12 lockup

Marcus, thanks for the update. Adam, thanks for trying to chase this
down, I hope we can at least identify the revision which caused the
issues you are seeing.

In general, I think this bug has become a melting pot of a number of
different lockup issues, but eliminating them one by one is worthwhile
nonetheless.

Mark

Revision history for this message
seh62 (seh62) wrote :

I've been running 8.04 in VirtualBox on a Windows XP host, on a Toshiba laptop with an Intel Celeron M and Atheros wifi, for 8 or 9 months now, and it worked great (except that VirtualBox has no support for 3D acceleration).
My hard drive broke, so I installed a new one and decided to dual boot, which worked fine. I use dial-up at home, which caused me some confusion, and I finally got it working with sl-modem-daemon (after trying SLMODEMD). Shortly thereafter I experienced my first crash: just a loud noise from the hard drive, then all power off. I went to a wifi spot to download all the updates before troubleshooting. This was a major hassle; I finally got a connection but was forced to leave before finishing the updates. I returned and began updating again, but had to stop for a while, and when I tried to reconnect I experienced the sudden crash again. When I rebooted, the wireless and ethernet drivers were gone (no wireless in the network manager and no drivers listed in Hardware Drivers), and then there was another crash.

Now Ubuntu will only run for about a minute before a hard crash (not much time to do anything). I decided to reinstall Ubuntu in the same partition, but when I boot from the CD, there's a hard crash by the time I make it to the install dialogue.

EVERY TIME before a crash when plugged in to AC power, the charge light turns to amber (indicating battery charging), the screen freezes, hard drive makes a loud noise, everything shuts completely off, and then the charge light turns back to green (indicating full charge on AC power).

This is depressing. I never had a problem in VirtualBox; maybe those guys at Sun have something going. Am I going to have to wait for the 8.10 disc to arrive? I really like Ubuntu, Windows sucks, help!

Revision history for this message
Adam Buchbinder (adam-buchbinder) wrote :

seh62, that sounds separate from this; if it's something you think you can get a developer to reproduce, file a bug; if it's not, file a question.

As for me, I've had to do some backtracking after a version which I'd previously considered good turned out to be bad; the number of revisions left to test is still rather large. I'm still working on it, though building kernels is rather slow. The best way to reproduce it seems to be downloading something with Vuze. Not just uploading, but downloading, seems to cause a lockup almost certainly within a few hours. It's kind of hard to quantify, though. More information will follow as I narrow it down. Each known-bad commit *does* help me pare back the search space, at least somewhat. I'm getting closer. I must be.

Revision history for this message
Dieter Burghardt (dieter-burghardt) wrote :

My machine (Intel Core2 Quad CPU) had been perfectly stable since November '08. It was up 24/7 and also running 4 instances of folding@home.
But when I started to use folding@home with the -smp switch, I got random lockups.
The machine is not overclocked, and in both cases it is under high CPU load, so overheating or other hardware-related issues shouldn't be the cause of the lockups. I guess the lockups are related to MPI (which is turned on in folding@home by the -smp switch).

Revision history for this message
Dieter Burghardt (dieter-burghardt) wrote :

Oops, forgot to mention ... it's intrepid, and the kernel is up to date

2.6.27-11-generic #1 SMP Thu Jan 29 19:28:32 UTC 2009 x86_64 GNU/Linux

Revision history for this message
hadisen (microtherm) wrote :

I experienced my first two random(!) kernel panics (blinking caps and scroll lock) today, after having used my computer for 2 or 3 weeks without any problems. It is a Sony Vaio subnotebook VGN-TX3XP running Ubuntu 8.04.2 with kernel 2.6.24-23-generic. I suspect a network-related bug, since it did not occur a third time once I had deactivated my wireless network (driver iwl3945 - also see thread 944123 "The Broadcom STA wl driver is buggy"). I would like to mention that my neighbour has exactly the same router as me, which has caused some problems with roaming mode in the GNOME network manager during the last few days, so the bug might have something to do with overlapping WLANs, too. I replaced the network manager with Wicd, activated the WLAN again, and have had no further kernel panics so far. I'll post if they come back.

Revision history for this message
Adam Buchbinder (adam-buchbinder) wrote :

My attempts to compile at least a dozen kernels so far, leaving each running for either a week or until it froze, have led to varying degrees of confusion; kernels that I thought were good would freeze up or work for a week straight depending on, it seemed, the phase of the moon. Bisecting the kernel has taken a great deal of time, but has not, in the end, been particularly helpful. I've been reluctant to upgrade to Jaunty yet, since upgrading in the first place is what started all of this, and it could always get worse, but since the problem *seems* to be kernel-based, I can always just boot into the Intrepid kernel afterwards.

Future directions that I'll be working on:

I don't recall whether or not I actually removed the 'airo' module when I was using the wired ethernet for my connectivity. Checking whether removing it makes a difference may help track down the actual bug.
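
Checking whether the module is actually loaded is quick. The sketch below reads the kernel's module list; `module_loaded` is a hypothetical helper, and its optional second argument exists only so the check can be pointed at a saved copy of the list instead of the live one.

```shell
#!/bin/sh
# Succeeds if the named module appears in the module list.
# Each line of /proc/modules starts with the module name and a space.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

if module_loaded airo; then
    echo "airo is loaded; unload it with: sudo modprobe -r airo"
else
    echo "airo is not loaded"
fi
```

Running this before each test makes it possible to say for certain whether a given lockup happened with or without the wireless driver in the kernel.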

I hadn't heard of netconsole ( https://wiki.ubuntu.com/KernelTeam/Netconsole ); since the bug locks the system hard enough that no logging information is available on the next boot, this may be a helpful avenue; if I can get a stack trace, I can even bring it upstream. (I would have done so already, but despite the time I've sunk into this disaster, I essentially know little more than I did when I started.) On the other hand, the wireless on my laptop is semi-broken enough (connection quality constantly wavers, and occasionally drops entirely, even though other wireless devices work fine) that this may not help. If that's the case, I'll connect it by wire and try to trigger the crash that way.
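
A netconsole setup over the wired connection might look like the sketch below. The parameter layout (source port, source IP and device, then target port, target IP and target MAC address) follows the kernel's netconsole documentation; every address, port and interface name here is a placeholder, and `netconsole_param` is a helper invented for this example.

```shell
#!/bin/sh
# Build the netconsole module parameter:
#   netconsole=<src-port>@<src-ip>/<dev>,<tgt-port>@<tgt-ip>/<tgt-mac>
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Placeholder values: the crashing laptop is 192.168.1.10 on eth0, and
# the machine collecting the log is 192.168.1.20.
netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55

# On the receiving machine, start listening for the UDP messages first:
#   nc -u -l -p 6666 | tee netconsole.log
#   (traditional netcat syntax; some variants want "nc -u -l 6666")
# On the crashing machine, load the module with the generated parameter
# and raise the console log level so that all kernel messages are sent:
#   sudo modprobe netconsole "$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55)"
#   sudo dmesg -n 8
```

Because the messages leave the machine as they are printed, anything the kernel manages to say before the hard lockup ends up in the log on the receiving side, even though nothing reaches the local disk.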

Revision history for this message
Jim Lieb (lieb) wrote :

This path has been marked as invalid because the original bug applies to 2.6.24 and Hardy. If Intrepid with its 2.6.27 kernel locks up for you, please file a new bug against that release and that kernel. This close message is the 343rd comment action and your Intrepid issue will get lost in this avalanche of bug comments. This also applies to Jaunty and Karmic issues. We ask for separate bugs based on release, system/cpu type, and usage because a "hang" or "lockup" is a very generic description of what most often ends up being a very specific case for a very particular configuration. A separate bug gets noticed and the comment path to its resolution is comprehensible. Please understand that this has been marked as invalid for Intrepid because the original fault was logged against Hardy and 2.6.24. Your bug issue is still valid, but not here. Thank you.

Changed in linux (Ubuntu Intrepid):
status: Confirmed → Invalid
Revision history for this message
Jim Lieb (lieb) wrote :

This bug is closed because it originally applied to a 2.6.24-12 kernel. If you are experiencing an issue with the current Hardy kernel, please file a new bug with the details as outlined in the wiki page(s). In this way we can sort out and address issues that are still active and relevant from what was already fixed and no longer applicable. Thank you for your cooperation.

Changed in linux (Ubuntu):
status: Confirmed → Won't Fix