Bug #528981 “Repetitive massive filesystem corruption” : Bugs : linux package : Ubuntu

Revision history for this message

Scott Testerman (scott-testerman) wrote on 2010-02-27:

#1

Corruption detected by e2fsck on ext4 (22.0 KiB, text/plain)
AlsaDevices.txt (589 bytes, text/plain; charset="utf-8")
AplayDevices.txt (323 bytes, text/plain; charset="utf-8")
ArecordDevices.txt (622 bytes, text/plain; charset="utf-8")
BootDmesg.txt (39.1 KiB, text/plain; charset="utf-8")
Card0.Amixer.values.txt (5.3 KiB, text/plain; charset="utf-8")
Card0.Codecs.codec97.0.ac97.0.0.txt (1.2 KiB, text/plain; charset="utf-8")
Card0.Codecs.codec97.0.ac97.0.0.regs.txt (767 bytes, text/plain; charset="utf-8")
CurrentDmesg.txt (426 bytes, text/plain; charset="utf-8")
IwConfig.txt (277 bytes, text/plain; charset="utf-8")
Lspci.txt (10.3 KiB, text/plain; charset="utf-8")
PciMultimedia.txt (749 bytes, text/plain; charset="utf-8")
ProcCpuinfo.txt (1.3 KiB, text/plain; charset="utf-8")
ProcInterrupts.txt (1.5 KiB, text/plain; charset="utf-8")
ProcModules.txt (2.5 KiB, text/plain; charset="utf-8")
UdevDb.txt (90.1 KiB, text/plain; charset="utf-8")
UdevLog.txt (209.1 KiB, text/plain; charset="utf-8")
WifiSyslog.txt (214.9 KiB, text/plain; charset="utf-8")

Scott Testerman (scott-testerman) wrote on 2010-02-27:

#2

e2fsck.2.log (958 bytes, text/plain)

Once again, the filesystem switched to read-only while the system was in use. The resulting fsck log is attached. This is using the standard 2.6.32-14-generic kernel immediately after a fresh install. Total uptime before the problem was discovered was under five minutes.

Scott Testerman (scott-testerman) wrote on 2010-03-01:

#3

Using mainline kernel 2.6.32.8 results in the same sort of corruption, but it appears to be somewhat less pronounced. Unfortunately, the corruption also blew up the kernel itself, so the system could no longer boot. Since further blind experimentation seems pointless, I've reinstalled Kubuntu 9.04 until I receive some further suggestions regarding steps to try next.

Stuart (stuartneilson) wrote on 2010-03-01:

#4

I have very similar symptoms following a fresh install of Ubuntu 9.10 on a Dell Inspiron 1501, which I have installed once in a single ext3 partition and subsequently in two (/ and /home) ext3 partitions, with the same symptoms both times. I have records of an orphan node cleanup in logs (similar to "[ 6.974972] EXT3-fs: INFO: recovery required on readonly filesystem." described by another user at http://ubuntuforums.org/archive/index.php/t-1180159.html). There is no record of the events leading to the filesystem becoming readonly (obviously).

Possibly related is the bug report "ext4 journal error, remounted read-only after resume", https://bugs.launchpad.net/ubuntu/+source/linux/+bug/438379.

Scott Testerman (scott-testerman) wrote on 2010-03-01:

#5

Bug 438379 does look quite similar to my problem, but I should note that I had never suspended or resumed before experiencing fs corruption, so suspend/resume issues may or may not be relevant.

Scott Testerman (scott-testerman) wrote on 2010-03-02:

#6

As an additional troubleshooting step, I decided to run memtest86+ to verify that my RAM is OK, and the result was that everything checks out just fine.

Scott Testerman (scott-testerman) wrote on 2010-03-08:

#7

I have now tested mainline kernel 2.6.33-999.201003071003 (i386) and have experienced absolutely NO filesystem corruption under it.

Rebooting from 2.6.33 to 2.6.32-16.24 resulted in the following error:

EXT4-fs error (device sda1): ext4_lookup; deleted inode referenced: 3409181
Aborting journal on device sda1-8.
EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
EXT4-fs (sda1): Remounting filesystem read-only
EXT4-fs (sda1): Remounting filesystem read-only

This last error has occurred only once, and followed a clean shutdown. Using the Alternate CD recovery mode to run e2fsck allowed journal replay, and no further errors have been detected under 2.6.32-16.24.

Scott Testerman (scott-testerman) wrote on 2010-03-09:

#8

More errors related to 2.6.32-16.24, but they happened after an unexplained system lockup. Unfortunately, the error also destroyed Firefox, which was the open application when the lockup occurred. Fortunately, enough of the system was still functioning that error messages were logged at the end of dmesg:

[ 262.957715] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 572: 155 blocks in bitmap, 280 in gd
[ 262.957734] Aborting journal on device sda1-8.
[ 262.960225] EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
[ 262.960240] EXT4-fs (sda1): Remounting filesystem read-only
[ 262.985668] EXT4-fs (sda1): Remounting filesystem read-only
[ 263.028132] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 576: 169 blocks in bitmap, 320 in gd
[ 263.065760] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 633: 242 blocks in bitmap, 479 in gd
[ 263.066814] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 638: 100 blocks in bitmap, 318 in gd
[ 263.077529] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 644: 276 blocks in bitmap, 503 in gd
[ 263.078601] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 649: 317 blocks in bitmap, 601 in gd
[ 263.079040] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 651: 313 blocks in bitmap, 619 in gd
[ 263.079679] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 654: 390 blocks in bitmap, 691 in gd
[ 263.079906] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 655: 347 blocks in bitmap, 624 in gd
[ 263.089608] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 661: 427 blocks in bitmap, 839 in gd
[ 263.090097] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 665: 471 blocks in bitmap, 824 in gd
[ 263.090327] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 666: 447 blocks in bitmap, 877 in gd
[ 263.090681] EXT4-fs (sda1): delayed block allocation failed for inode 4588580 at logical offset 0 with max blocks 3534 with error -30
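
Each ext4_mb_generate_buddy line above reports a mismatch between a count taken from the block bitmap and the count recorded in the group descriptor ("gd"). As a rough sketch for anyone comparing logs, the function below pulls those numbers out of dmesg text and prints the difference per group. The function name summarize_buddy_errors is made up for illustration, and the pattern only matches the message format shown in this comment.

```shell
# Summarize ext4_mb_generate_buddy mismatches from dmesg text on stdin.
# Prints one line per affected block group: the group number, the count
# from the bitmap, the count from the group descriptor, and the difference.
# (summarize_buddy_errors is a hypothetical helper name.)
summarize_buddy_errors() {
    sed -n 's/.*ext4_mb_generate_buddy: EXT4-fs: group \([0-9]*\): \([0-9]*\) blocks in bitmap, \([0-9]*\) in gd.*/\1 \2 \3/p' |
    while read -r group bitmap gd; do
        echo "group=$group bitmap=$bitmap gd=$gd diff=$((gd - bitmap))"
    done
}

# Usage (not run here): dmesg | summarize_buddy_errors
```

Lines that do not match the pattern (journal aborts, remount messages) are simply skipped.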

Scott Testerman (scott-testerman) wrote on 2010-03-09:

#9

And again on EXT3, following a fresh install, with all packages updated to latest available. This is running the 2.6.32-16 kernel.

[ 329.714992] EXT3-fs error (device sda1): htree_dirblock_to_tree: bad entry in directory #7192587: rec_len % 4 != 0 - offset=0, inode=539167267, rec_len=28483, name_len=112
[ 329.715006] Aborting journal on device sda1.
[ 329.716290] ext3_abort called.
[ 329.716298] EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
[ 329.716303] Remounting filesystem read-only
[ 329.826604] Remounting filesystem read-only

Scott Testerman (scott-testerman) on 2010-03-17

tags:

added: karmic

Scott Testerman (scott-testerman) wrote on 2010-03-27:

#10

Still happens under 2.6.32-17-generic #26-Ubuntu. It's now much less pronounced, but the system more frequently hard locks when the filesystem goes read-only, and so requires a power cycle to restart. This situation also leads to the requirement to run e2fsck from a CD.

Scott Testerman (scott-testerman) wrote on 2010-03-29:

#11

Have now been running kernel 2.6.33-020633-generic for over 24 hours with no read-only events. Even better, after the system locks up due to various i855 xorg problems, on reboot the machine is experiencing no data loss at all. Seems clear the data loss issue has been resolved upstream.

Scott Testerman (scott-testerman) wrote on 2010-04-01:

#12

Kernel 2.6.32-18-generic no longer has spontaneous corruption at all. When the system hard locks due to other problems (an xserver lockup, for instance), the system still requires running e2fsck from a CD, and can still experience significant corruption. This is distinctly different from 2.6.33, which has only very minor, understandable, errors under the same conditions.

Scott Testerman (scott-testerman) wrote on 2010-04-18:

#13

Debian Squeeze kernel 2.6.32-3-686 and kernel 2.6.32-trunk-686 do not exhibit any corruption at all. This leads me to believe the problem is a configuration problem, and not a problem inherent to the 2.6.32 kernel itself.

Scott Testerman (scott-testerman) wrote on 2010-04-22:

#14

Still running Debian Squeeze with kernel 2.6.32-3-686, and still no corruption. The Debian configuration appears to be using CONFIG_IDE instead of CONFIG_ATA, since the hard drive is reported as hda rather than sda.

I've found an upstream bug report / flame war that may or may not provide any useful information, indicating that some users may have experienced this problem as far back as kernel 2.6.28 (although that Ubuntu kernel still works with no corruption on my system). Read at risk of your own sanity: https://bugzilla.kernel.org/show_bug.cgi?id=13365
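
The CONFIG_IDE versus CONFIG_ATA guess in this comment could be checked directly, since Debian and Ubuntu ship each kernel's build configuration as /boot/config-<version>. A minimal sketch, assuming that file is present; the function name kernel_config_value is invented for this example:

```shell
# Print the value of a kernel build option ("y", "m", ...) or "unset".
# By default reads the config of the running kernel; a different file
# can be passed as the second argument.
# (kernel_config_value is a hypothetical helper name.)
kernel_config_value() {
    option=$1
    config_file=${2:-/boot/config-$(uname -r)}
    value=$(sed -n "s/^${option}=//p" "$config_file" | head -n 1)
    echo "${value:-unset}"
}

# Usage (not run here):
#   kernel_config_value CONFIG_ATA
#   kernel_config_value CONFIG_IDE
```

Comparing the two values on the Debian and Ubuntu kernels would show whether the hda/sda difference really comes from the driver stack chosen at build time.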

Scott Testerman (scott-testerman) on 2010-04-25

tags:

removed: needs-upstream-testing

Jeremy Foshee (jeremyfoshee) wrote on 2010-04-26:

#15

Hi Scott,

If you could also please test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags:	added: needs-upstream-testing
tags:	added: kj-triage
Changed in linux (Ubuntu):
status:	New → Incomplete

Scott Testerman (scott-testerman) wrote on 2010-04-26:

#16

Thanks Jeremy!

By "latest upstream," are you referring to 2.6.33.2-lucid (the series I've already tested with success) or 2.6.34-rc5-lucid (which I haven't tested at all)?

Please allow a few days for me to perform the testing and I'll report back here.

Stuart (stuartneilson) wrote on 2010-04-27:

#17

I am still having this issue in a fresh install of the release candidate of Ubuntu 10.04, using the default kernel. Both the root partition and separate home partition are remounted readonly, so pretty much everything is disabled (including the ability to save any log data).
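
For anyone wanting to confirm which filesystems have gone read-only, the mount options in /proc/mounts can be inspected. The following is only a sketch; the function name readonly_mounts is made up, and it assumes the usual /proc/mounts layout (device, mount point, type, options).

```shell
# List mount points whose option list contains "ro".
# Reads /proc/mounts by default; another file in the same format can be
# given as the first argument.
# (readonly_mounts is a hypothetical helper name.)
readonly_mounts() {
    mounts_file=${1:-/proc/mounts}
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "$mounts_file"
}

# Usage (not run here): readonly_mounts
```

The comparison is against the exact option "ro", so an entry such as errors=remount-ro on a filesystem that is still writable is not reported.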

Scott Testerman (scott-testerman) wrote on 2010-04-28:

#18

Thanks for your update, Stuart. In trying to save log files when the filesystem goes read-only, I've found it helpful to use either a USB key or SD Card, which can be mounted even when the boot filesystem has gone read-only. In most cases when testing for this problem, I even mount an SD Card as soon as my system boots so I can immediately capture the data when the (inevitable) corruption happens. You can then do something like "dmesg > /mnt/dmesg.txt" (assuming the card is mounted at /mnt; the output must go to the mount point, not to the /dev/sdc1 device node) to capture data about the event, unmount the USB key, reboot and submit an update to this bug report.
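
The capture step described here can be wrapped in a small helper. This is a sketch only: it assumes the removable media is already mounted, and both the function name save_kernel_log and the LOG_CMD override (useful for trying the function without access to dmesg) are inventions for this example.

```shell
# Save the kernel log to a timestamped file in a directory on removable
# media that is ALREADY mounted, then flush it to the device.
# Prints the path of the file written.
# LOG_CMD is a hypothetical override; by default dmesg is used.
save_kernel_log() {
    dest_dir=$1
    [ -d "$dest_dir" ] && [ -w "$dest_dir" ] || {
        echo "cannot write to $dest_dir" >&2
        return 1
    }
    out="$dest_dir/dmesg-$(date +%Y%m%d-%H%M%S).txt"
    ${LOG_CMD:-dmesg} > "$out" && sync && echo "$out"
}

# Usage (not run here), with the card mounted at /mnt:
#   save_kernel_log /mnt
```

The sync call is there so the data reaches the card before it is unmounted or the machine is power-cycled.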

You mentioned the RC; I'm guessing you're using kernel 2.6.32-21. We've been asked to also try using the "latest" upstream kernel. I've already tried the 2.6.33 series, which seems to solve the problem, but 2.6.34 is also available. Can you please help by getting an upstream kernel (either 2.6.33.3-lucid or 2.6.34-rc5-lucid) from here: http://kernel.ubuntu.com/~kernel-ppa/mainline . Full instructions and complete information about the process are at https://wiki.ubuntu.com/KernelMainlineBuilds .

You have my personal thanks for continuing to help on this problem, since this corruption problem is extremely frustrating!

Stuart (stuartneilson) wrote on 2010-04-29:

#19

I have tried inserting a USB memory stick, but it seems that the system is unable to mount it once the root partition has been remounted readonly. Is there any command to remount the filesystem rw, at my own risk?

I also collected some possibly duplicate launchpad bugs here http://www.iol.ie/~stuartneilson/Bootup_fsck.html

and I will try the upstream kernel later on today.

Scott Testerman (scott-testerman) wrote on 2010-04-30:

#20

I would strongly advise against trying to force your primary filesystem to remount writeable, especially since it may introduce additional corruption that will make this bug more difficult to troubleshoot. Filesystem bugs like this one are quite difficult to troubleshoot because, for instance, this bug is appearing on a very common hard drive controller and chipset, but it appears on a very small subset of the total number of devices with that chipset.

You can manually mount your USB stick using something like "sudo mount -t vfat /dev/sdc1 /mnt" (replacing /dev/sdc1 with the correct location for your particular setup). You can find out where your system sees your USB filesystem by first using "sudo fdisk -l". There's a more comprehensive wiki page available in Ubuntu Community Help: https://help.ubuntu.com/community/Mount/USB .

Also, thanks for the work on the possible duplicates. That list may help the developers, although I note there are problems reported with Ubuntu 9.04, which never caused problems for me.

Ben Regenspan (bregenspan) wrote on 2010-05-02:

#21

I'm experiencing what appears to be the same issue, on a Dell XPS M1330. So far the problem appears to be fixed running http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.34-rc6-lucid/

Stuart (stuartneilson) wrote on 2010-05-05:

#22

This message clearly does not apply directly to all subscribers to this bug, because some have NVidia graphics. However I noticed that I had an error in dmesg "[drm:rs400_gart_adjust_size] *ERROR* Forcing to 32M GART size (because of ASIC bug ?)" and that this is solved in Bug #562843 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/562843

The solution to Bug #562843 is to add "radeon.modeset=0" to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then run sudo update-grub2.
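
The edit to /etc/default/grub can also be scripted. The sketch below assumes GNU sed (for -i) and a GRUB_CMDLINE_LINUX_DEFAULT line written with double quotes, as in the stock file; the function name add_kernel_param is made up for this example, and it does not run update-grub2 for you.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT if it is not
# already there. Edits /etc/default/grub unless another file is given.
# Assumes the parameter contains no "|" or "&" characters.
# (add_kernel_param is a hypothetical helper name.)
add_kernel_param() {
    param=$1
    grub_file=${2:-/etc/default/grub}
    line=$(grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$grub_file") || return 1
    case "$line" in
        *"$param"*) return 0 ;;   # already present, nothing to do
    esac
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$grub_file"
}

# Usage (not run here):
#   sudo sh -c '. ./this-file.sh; add_kernel_param radeon.modeset=0'
#   sudo update-grub2
```

Making the function a no-op when the parameter is already present means it can be re-run safely after a reinstall.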

I have repeatedly suspended and resumed, by closing the lid and from the menu, without a single failure to resume since applying this boot option. (That may be just luck - I will post when my system freezes or fails to resume).

Scott Testerman (scott-testerman) wrote on 2010-05-08:

#23

Running kernel 2.6.34-020634rc6-generic (mainline 2.6.34-rc6-lucid) I am experiencing exactly the same behavior as I previously reported for 2.6.33 in comment #7: no filesystem corruption. It seems clear that the problem was fixed in 2.6.33.

tags:

removed: needs-upstream-testing

Stuart (stuartneilson) wrote on 2010-05-10:

#24

The grub option "radeon.modeset=0" does not prevent X freezes. My system appears to last much longer between freezes / failures to resume, but they still occur.

I can repeatably cause the machine to freeze by mounting cifs locations - the computer will freeze shortly into use after the next resume following the mount operation.

High system load, e.g. generating md5 checksums on a CD image while copying large files, will also cause an X freeze.

In all cases the keyboard and mouse appear unresponsive and Caps Lock does not alter the LED state, but Alt+SysRq+REISUB will reboot.

Stuart (stuartneilson) wrote on 2010-05-20:

#25

Kernel Bugzilla bug #14543 describes a very similar issue involving ATA cache flushing. It is marked RESOLVED, and the fix is the patch "libata: retry failed FLUSH if device didn't fail it", which has been applied since kernel 2.6.33-rc1:
http://mirror.celinuxforum.org/gitstat//commit-detail.php?commit=6013efd8860bf15c1f86f365332642cfe557152f

The bug report https://bugzilla.kernel.org/show_bug.cgi?id=14543 also has a script that repeatably creates the read-only effect, but I cannot get this script to create the effect here. Can anyone create a script that reliably replicates this fault?

Comment #7 From Andrey Vihrov 2009-11-05 09:43:18 -------

Created an attachment (id=23658) [details]
dmesg with ext4 failure

Yes. I was able to trigger it by running two scripts in parallel:

# Repeatedly create, flush, remove and flush a small file;
# stop as soon as any step fails.
while true; do
    echo > test &&
    sync &&
    rm test &&
    sync ||
    break
done

and

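# Toggle SATA link power management on every host once per second
# (needs root to write to sysfs).
while true; do
    for FILE in /sys/class/scsi_host/*/link_power_management_policy; do
        echo "min_power" > "${FILE}" && echo "max_performance" > "${FILE}"
    done &&
    sleep 1
done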

Scott Testerman (scott-testerman) wrote on 2010-05-20:

#26

No further patches are needed regarding the reported bug. Kernel 2.6.33 has been released and does not experience the bug described in this report. Kernel 2.6.34 is also released (and queued for inclusion in Maverick) and does not experience the bug described in this report.

You can use the mainline 2.6.33 kernel (which is now in its fourth point-revision) by downloading the appropriate files here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.33.4-lucid/

You can use the mainline 2.6.34 kernel (which has had no point-revisions) by downloading the appropriate files here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.34-lucid/

For instructions on how to determine which files you need and how to install them, visit:
https://wiki.ubuntu.com/KernelTeam/MainlineBuilds

These are current as of the time I'm entering this comment, but you may wish to check the Mainline Kernel PPA for newer point-revisions so you always have the latest mainline kernel.
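
To check whether a machine is already running 2.6.33 or later, the running kernel version can be compared with the first unaffected release. A rough sketch, assuming GNU sort with -V; the function name kernel_at_least is invented here, and the version number alone says nothing about fixes backported into distribution kernels.

```shell
# Succeed if the kernel version is at least the wanted version.
# The second argument defaults to the running kernel (uname -r); anything
# after the first "-" (e.g. "-22-generic") is ignored for the comparison.
# (kernel_at_least is a hypothetical helper name.)
kernel_at_least() {
    wanted=$1
    current=${2:-$(uname -r)}
    current=${current%%-*}
    lowest=$(printf '%s\n' "$wanted" "$current" | sort -V | head -n 1)
    [ "$lowest" = "$wanted" ]
}

# Usage (not run here):
#   kernel_at_least 2.6.33 && echo "running 2.6.33 or newer"
```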

chastell (chastell) wrote on 2010-05-21:

#27

Thanks a lot, Scott, for your thorough summary.

Is there a good way to install Lucid with the mainline kernels without ever booting with kernels affected by this issue, or would this require a custom install image (with one of the mainline kernels on the installation medium, and one being able to install the mainline kernel before rebooting into the installed system)?

Scott Testerman (scott-testerman) wrote on 2010-05-21:

#28

Everyone is welcome to the fruits of my pain with this problem. I was hoping to get the problem solved before Lucid so I could use a completely standard Lucid install, but my hopes were dashed and I have to target Maverick instead.

The current stock Lucid kernel is just stable enough on my system that I am able to perform an installation as follows:

1) Download the mainline kernel files (you will need 3 for your system) and put them on some kind of media that you can access without a graphical system.

2) Use the Alternate Install CD to install a Command Line Only system.

3) Reboot after installation, log in to your new system, and IMMEDIATELY mount the media with your mainline kernel. The easiest way to do this is something like "sudo mount /dev/sdc1 /mnt" (and replace sdc1 with the location of your media).

4) Install the mainline kernel: "cd /mnt" and then "sudo dpkg -i *.deb"

5) Reboot the system.

5a) If you know how to do it, edit your /etc/apt/sources.list to enable the CD-ROM (or use apt-cdrom). Otherwise, you need a decently fast Internet connection from here on.

6) Now install the full system, using whatever method you prefer. Tasksel has been bombing out on me, so I would recommend either "sudo aptitude install kubuntu-desktop" or "sudo apt-get install kubuntu-desktop". Replace kubuntu-desktop with the variant of Ubuntu you prefer.

7) The mainline kernel will automatically be the first kernel that GRUB sees, so you will always boot into mainline unless you hold down the Shift key at boot time and manually select another kernel.
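
Steps 3 to 5 above can be collected into one small script so they run without delay after the first login. This is a sketch under assumptions: /dev/sdc1 is only the example device from step 3, the function names run and install_mainline_kernel are made up, and the DRY_RUN variable is an addition that prints the commands instead of executing them.

```shell
# Print the command when DRY_RUN is set, otherwise execute it.
# (run and DRY_RUN are hypothetical conveniences for this sketch.)
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount the media, install every .deb found on it, and reboot.
# First argument: device holding the mainline kernel packages.
# Second argument: mount point (defaults to /mnt).
install_mainline_kernel() {
    media_dev=$1
    mount_point=${2:-/mnt}
    run sudo mount "$media_dev" "$mount_point" &&
    run sudo dpkg -i "$mount_point"/*.deb &&
    run sudo reboot
}

# Usage (not run here):
#   DRY_RUN=1 install_mainline_kernel /dev/sdc1    # show what would happen
#   install_mainline_kernel /dev/sdc1              # really do it
```

Chaining the steps with && stops the sequence if the mount or the package installation fails, so the machine does not reboot into the stock kernel by surprise.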

This method is not perfect, but it DOES give me a working Lucid system. I had to skip Karmic completely because the mainline kernels at the time didn't work for me. The biggest drawback is that this method pretty much requires a fast Internet connection, so if you don't have one then this is not the method for you.

A piece of advice: if you get the read-only filesystem indicating corruption at any point before you boot into the mainline kernel, you should probably go ahead and reinstall rather than wasting time by trying to fix the broken installation.

You should be aware that the mainline kernel still has issues with Intel 852/855 video, so if you have this chipset you can still expect frequent hard lockups, but fortunately your entire filesystem will not be corrupted any more when that happens. Don't blame Ubuntu for these problems though, because Ubuntu appears to have about the only reasonably functioning solution of any major distro at the moment. The Intel video problem is an upstream headache, and they are beating their heads against brick walls trying to solve it. More information is in Bug 541511.

You should also be aware that the Lucid version of ndiswrapper does not work with 2.6.33 and later kernels, but this has been solved for Maverick. If you need ndiswrapper for any reason, then the mainline kernel may cause heartache for you. More information is in Bug 582555.

Dr. Burnett (cortezb3) wrote on 2010-05-29:

#29

I am having the same (read-only file system) problem after wake from sleep triggered by opening the lid on the laptop. I have included all the useful information I gathered from logs. I will also, as previously described, attempt to capture more relevant data if/when the phenomenon occurs again.

##mount ->
/dev/sda5 on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
none on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
none on /dev type devtmpfs (rw,mode=0755)
none on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
none on /dev/shm type tmpfs (rw,nosuid,nodev)
none on /var/run type tmpfs (rw,nosuid,mode=0755)
none on /var/lock type tmpfs (rw,noexec,nosuid,nodev)
none on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
none on /var/lib/ureadahead/debugfs type debugfs (rw,relatime)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
/home/burnett/.Private on /home/burnett type ecryptfs (ecryptfs_sig=848fa9a5a2c8207f,ecryptfs_fnek_sig=28ea3f06089bd9f2,ecryptfs_cipher=aes,ecryptfs_key_bytes=16)
gvfs-fuse-daemon on /home/burnett/.gvfs type fuse.gvfs-fuse-daemon (rw,nosuid,nodev,user=burnett)

2.6.32-22-generic (64-bit)
Ubuntu 10.04 LTS

##hdparm -i /dev/sda ->
Model=Hitachi, FwRev=PB4OC60F, SerialNo=091117PB4406Q7CRZLUL
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=DualPortCache, BuffSize=7208kB, MaxMultSect=16, MultSect=off
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=976773168
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
AdvancedPM=yes: mode=0xFE (254) WriteCache=enabled
Drive conforms to: unknown: ATA/ATAPI-2,3,4,5,6,7

* signifies the current active mode

Dr. Burnett (cortezb3) wrote on 2010-05-29:

#30

Just saw Scott's post and I am applying it now. Will report any further problems.
Thanks.

Scott Testerman (scott-testerman) wrote on 2010-05-29:

#31

Dr. Burnett:
Please note that this bug report does not involve a wake from sleep trigger. The corruption events reported here are spontaneous, with the system up and running.

If your filesystem is being corrupted on wake from sleep, please file a new bug report with the circumstances of the different bug you have found.

Dr. Burnett (cortezb3) wrote on 2010-06-01:

#32

Thanks Scott. I will check to see if a bug report has already been submitted on this issue. It appears that the kernel upgrade was successful in stabilizing my system, so thanks anyway.

Scott Testerman (scott-testerman) wrote on 2010-06-02:

#33

I would like to report that I've been using the Ubuntu kernel 2.6.34-52.1 from Brian Rogers's PPA located at

https://launchpad.net/~brian-rogers/+archive/graphics-fixes

with considerable success. For those who use an i852/855 chipset and are having problems with the Intel graphics, in addition to the filesystem corruption problem, this is worth a try.

Although the updated kernel is not 100% stable, the increase in stability over even the mainline 2.6.34 kernel is quite welcome. Now not only are video-related crashes much less frequent for me, but when the system does crash, I don't lose the entire filesystem.

Instructions for installing are at the above link, with the usual caveat that this method is completely and utterly unsupported by Ubuntu, Canonical, and probably even Brian Rogers, so beware that using the PPA kernel could result in your entire system turning into a block of semi-sentient Swiss cheese and walking away to take a Carnival cruise.

Jeremy Foshee (jeremyfoshee) on 2010-06-02

tags:	added: kernel-fs kernel-needs-review
Changed in linux (Ubuntu):
importance:	Undecided → High
status:	Incomplete → Triaged

Andy Whitcroft (apw) on 2010-06-03

tags:	added: kernel-candidate kernel-reviewed
	removed: kernel-needs-review

Andy Whitcroft (apw) wrote on 2010-06-03:

#34

OK, I have pulled back the patch which was suggested in comment #25 and built some test kernels. If anyone is able to test these and confirm whether this fix is indeed the solution, that would be helpful. Of course there is significant risk it is not, so don't do it with your favourite data. The test kernels are at the URL below; please report your results here:

http://people.canonical.com/~apw/lp528981-lucid/

Thanks!

Changed in linux (Ubuntu):
assignee:	nobody → Andy Whitcroft (apw)
status:	Triaged → Incomplete
Changed in linux (Ubuntu Lucid):
status:	New → Incomplete
importance:	Undecided → High
assignee:	nobody → Andy Whitcroft (apw)
Changed in linux (Ubuntu):
status:	Incomplete → Fix Released

Andy Whitcroft (apw) wrote on 2010-06-03:

#35

As this issue is not seen in 2.6.34, I am marking this Fix Released for Maverick. Lucid remains open.

Stuart (stuartneilson) wrote on 2010-06-03:

#36

Andy Whitcroft, if I understood message #34 correctly, then the kernel 2.6.32-22 contains the libata patch. I am running with this kernel now in Lucid on a laptop previously affected by this bug. I will let you know after 24 hours if it is still running.

I was running kernel 2.6.34 previously and that has been stable since May 19th, without a single freeze, resume failure, corrupted file or fsck on reboot. A version of 2.6.33 has been running the same duration on an identical laptop with Karmic, also without any failure.

Stuart (stuartneilson) wrote on 2010-06-03:

#37

Just 50 minutes in, I suspended (to go to bed). When I then tested resume, various parts of the desktop failed (Network Manager for one) and the filesystem was read-only. Attempting to open a terminal locked the machine completely - the Caps Lock light did not function, but Alt+SysRq REISUB rebooted it, producing the same old fsck message (with no files in /lost+found):

[ 3.133565] EXT4-fs (sda4): INFO: recovery required on readonly filesystem
[ 3.133573] EXT4-fs (sda4): write access will be enabled during recovery
[ 3.909630] EXT4-fs (sda4): orphan cleanup on readonly fs
...
[ 3.909731] EXT4-fs (sda4): 6 orphan inodes deleted
[ 3.909735] EXT4-fs (sda4): recovery complete
[ 4.515465] EXT4-fs (sda4): mounted filesystem with ordered data mode
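
For anyone collecting logs like the one above from several boots, here is a minimal sketch of pulling the EXT4-fs lines out of saved dmesg text. It assumes the line layout seen in this report (`[ time] EXT4-fs (dev): message`, with an optional `warning` or `error` keyword); the function name `parse_ext4_messages` and the "info" label for lines without a keyword are my own choices for illustration, not anything the kernel defines.

```python
import re

# Matches kernel log lines shaped like the ones quoted in this report, e.g.
#   [ 3.133565] EXT4-fs (sda4): recovery complete
#   [ 672.286366] EXT4-fs warning (device sda4): dx_probe: ...
EXT4_LINE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+EXT4-fs"
    r"(?:\s+(?P<level>warning|error))?"
    r"\s*\((?:device\s+)?(?P<dev>[^)\s]+)\):\s*(?P<msg>.*)$"
)


def parse_ext4_messages(log_text):
    """Return one dict per EXT4-fs line found in log_text.

    Lines with no 'warning'/'error' keyword are labelled 'info'
    (an assumption made for this sketch).
    """
    events = []
    for line in log_text.splitlines():
        match = EXT4_LINE.match(line.strip())
        if match:
            events.append({
                "time": float(match.group("ts")),
                "level": match.group("level") or "info",
                "device": match.group("dev"),
                "message": match.group("msg"),
            })
    return events


if __name__ == "__main__":
    import sys
    # Example: dmesg > boot.log; python this_script.py < boot.log
    for event in parse_ext4_messages(sys.stdin.read()):
        print(event["time"], event["level"], event["device"], event["message"])
```

Filtering on `level` or `device` afterwards makes it easier to compare what each kernel version reports.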

Stuart (stuartneilson) wrote on 2010-06-04:

#38

... and the updated kernel 2.6.32-22-generic pushed out by update today caused a freeze, after which I had to manually correct the filesystem (twice). The first run reported empty inodes in /tmp/stuart-orbit/ and now, for the first time, I have some recovered inodes:

ls -al /lost+found/
total 20
drwx------ 2 root root 16384 2010-04-29 23:25 .
drwxr-xr-x 22 root root 4096 2010-05-20 07:50 ..
srwxr-xr-x 1 stuart stuart 0 2010-06-04 12:00 #519190
srwxr-xr-x 1 stuart stuart 0 2010-06-04 12:00 #519199
srwxr-xr-x 1 stuart stuart 0 2010-06-04 12:00 #519211

So no, neither the kernel posted here last night nor the 2.6.32-22 pushed by Update functioned correctly on my system. Kernel 2.6.34 (Lucid) and 2.6.33 (Karmic) have both run from 19 May to 3 June without an error.

Grant Likely (glikely) wrote on 2010-06-07:

#39

Hi Andy,

I'm also seeing this issue on my MacBook 2,1 with an Intel SSD. Typically it runs for long periods (days) without a problem before the failure appears. The failure often occurs when I switch to a different commit in my Linux-2.6 git tree. I've now switched to Brian Rogers's linux-2.6.34-v9patch-generic kernel to see if that helps.

I would like to run the test case shown in comment #25, but the /sys/class/scsi_host/*/link_power_management_policy control file is not present on my system. Does anyone know if there are any other known test cases to reproduce this problem?
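
For others wanting to check whether that control file exists on their own machine, here is a minimal sketch that lists every SCSI host exposing it, along with the current value. The only path taken from this report is /sys/class/scsi_host/*/link_power_management_policy; the function name `read_lpm_policies` and the `sysfs_root` parameter are hypothetical and exist only so the lookup can be pointed at a different directory.

```python
import glob
import os


def read_lpm_policies(sysfs_root="/sys/class/scsi_host"):
    """Return {host_name: policy} for each host that has the control file.

    Hosts without a link_power_management_policy file are simply absent
    from the result; an unreadable file is recorded as None.
    """
    policies = {}
    pattern = os.path.join(sysfs_root, "*", "link_power_management_policy")
    for path in sorted(glob.glob(pattern)):
        host = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as handle:
                policies[host] = handle.read().strip()
        except OSError:
            policies[host] = None
    return policies


if __name__ == "__main__":
    found = read_lpm_policies()
    if not found:
        print("no link_power_management_policy control files found")
    for host, policy in found.items():
        print(host, policy)
```

An empty result matches the situation described above, where the test case from comment #25 cannot be run as written.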

g.

Stuart (stuartneilson) wrote on 2010-06-16:

#40

I hope that this is helpful. My computer is running well with kernel 2.6.34 using Ubuntu Lucid 10.04. My computer freezes if I boot using kernel 2.6.32-22.

I am able to keep my computer functioning after the events that cause a freeze by mounting my root filesystem with the option data=writeback instead of the default data=ordered. To mount the root filesystem with non-default options, it is necessary to modify /etc/fstab and /etc/default/grub and to run tune2fs (according to http://www.goitexpert.com/general/ubuntuguide/).
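
To confirm which journalling mode actually took effect after such a change, one option is to look at the options the kernel lists for the mount. The sketch below parses text in /proc/mounts format; it assumes the data= option is shown there, which can depend on the kernel version and filesystem, and the function name `data_mode_for` is made up for this example.

```python
def data_mode_for(mount_point, mounts_text):
    """Return the data= mode listed for mount_point, or None if not shown.

    mounts_text is expected in /proc/mounts format:
        device mount_point fstype options dump pass
    If a mount point appears more than once, the last entry wins, since
    later mounts sit on top of earlier ones.
    """
    mode = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        mode = None  # reset for each matching entry so the last one wins
        for option in fields[3].split(","):
            if option.startswith("data="):
                mode = option[len("data="):]
    return mode


if __name__ == "__main__":
    with open("/proc/mounts") as handle:
        print("root data mode:", data_mode_for("/", handle.read()))
```

A result of None only means the option was not listed; it does not by itself say which mode is in use.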

I am currently posting from a computer that appears to be "half-frozen" so I can post any further information that is of interest. Thunderbird, several terminals and some inodes are locked.

Once I boot using data=writeback, a process freeze produces the following messages, and I suspect that without data=writeback the system would be frozen:

[ 672.286366] EXT4-fs warning (device sda4): dx_probe: dx entry: limit != root limit
[ 672.286375] EXT4-fs warning (device sda4): dx_probe: Corrupt dir inode 308015, running e2fsck is recommended.
[ 672.286405] BUG: unable to handle kernel paging request at f76f5006
[ 672.286411] IP: [<c029113d>] ext4_find_entry+0x14d/0x410
[ 672.286424] *pde = 00007067 *pte = 77520002
[ 672.286429] Oops: 0000 [#1] SMP
[ 672.286433] last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0A:00/power_supply/BAT1/charge_full
[ 672.286439] Modules linked in: aes_i586 aes_generic binfmt_misc ppdev snd_hda_codec_idt joydev fbcon tileblit font bitblit softcursor vga16fb vgastate snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss arc4 snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer radeon snd_seq_device b43 ttm mac80211 drm_kms_helper snd cfg80211 drm i2c_algo_bit dell_wmi ati_agp soundcore sdhci_pci sdhci dell_laptop dcdbas led_class ricoh_mmc i2c_piix4 snd_page_alloc shpchp agpgart k8temp psmouse serio_raw video output lp parport b44 mii ssb pata_atiixp ahci
[ 672.286499]
[ 672.286504] Pid: 2781, comm: thunderbird-bin Not tainted (2.6.32-22-generic #36-Ubuntu) Inspiron 1501
[ 672.286509] EIP: 0060:[<c029113d>] EFLAGS: 00010216 CPU: 0
[ 672.286515] EIP is at ext4_find_entry+0x14d/0x410
[ 672.286519] EAX: f76f6000 EBX: f76f5000 ECX: 0000000c EDX: 00003000
[ 672.286523] ESI: f76f500d EDI: f6f0d580 EBP: f1a15dc4 ESP: f1a15d3c
[ 672.286527] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 672.286532] Process thunderbird-bin (pid: 2781, ti=f1a14000 task=f2ede680 task.ti=f1a14000)
[ 672.286536] Stack:
[ 672.286538] 00000000 f1a15db0 00000202 c01cc223 00000000 00003000 f6f70be0 f1a15ddc
[ 672.286546] <0> f23b78dc f68e5800 f23b78a0 00000000 00000003 f6f0d340 00000004 f6f70c68
[ 672.286554] <0> 00001000 00000005 00000005 00000003 0000000d f478f080 f6f0d580 f6f0d340
[ 672.286563] Call Trace:
[ 672.286570] [<c01cc223>] ? mempool_free_slab+0x13/0x20
[ 672.286577] [<c0291445>] ? ext4_lookup+0x45/0x100
[ 672.286582] [<c058b6fd>] ? _spin_lock+0xd/0x10
[ 672.286588] [<c021aeeb>] ? d_alloc+0x13b/0x190
[ 672.286594] [<c0210817>] ? real_lookup+0xb7/0x110
[ 672.286599] [<c0212265>] ? do_lookup+0x95/0xc0
[ 672.286605] [<c016ed2...

Affects		Status	Importance	Assigned to	Milestone
	linux (Ubuntu)	Fix Released	High	Andy Whitcroft
	Lucid	Won't Fix	High	Unassigned

Changed in linux (Ubuntu Lucid):
assignee:	Andy Whitcroft (apw) → nobody

Changed in linux (Ubuntu Lucid):
status:	Incomplete → Confirmed

Changed in linux (Ubuntu Lucid):
status:	Confirmed → Won't Fix
