in-place corruption of large files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4

Bug #453579 reported by Steve Langasek
This bug affects 25 people
Affects                     Status        Importance  Assigned to     Milestone
Linux                       Invalid       Undecided   Unassigned
Release Notes for Ubuntu    Fix Released  Undecided   Unassigned
linux (Ubuntu)              Invalid       Critical    Surbhi Palande
    Nominated for Jaunty by r12056
    Nominated for Lucid by r12056
  Karmic                    Invalid       Critical    Unassigned

Bug Description

There are worrying reports of filesystem corruption on ext4 in karmic. Scott says:

12:36 < Keybuk> this whole ext4 thing is worrying me
12:36 < Keybuk> I just downloaded an iso image, md5sum didn't match
12:36 < Keybuk> downloaded it into an ext3 partition, matched just fine
12:59 < Keybuk> and I know mvo has seen bugs with corrupted .debs in /var/cache/apt/archives
12:59 < Keybuk> which seems to imply it's any file large enough to use lots of extents

I'm opening this bug report so that this bug gets tracked & triaged for karmic. If we're unable to isolate the issue, we should consider rolling back to ext3 as the default filesystem in the installer.

ProblemType: Bug
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: AD198x Analog [AD198x Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: vorlon 3350 F.... pulseaudio
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xee240000 irq 17'
   Mixer name : 'Analog Devices AD1981'
   Components : 'HDA:11d41981,17aa2025,00100200'
   Controls : 20
   Simple ctrls : 11
Date: Fri Oct 16 16:01:26 2009
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=f108133c-6b9d-4d28-9058-0b3a0c5549b4
MachineType: LENOVO 6371CTO
Package: linux-image-2.6.31-14-generic 2.6.31-14.46
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: root=/dev/mapper/hostname-root ro
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-13.44-generic
RelatedPackageVersions: linux-firmware 1.22
SourcePackage: linux
Uname: Linux 2.6.31-13-generic x86_64
WpaSupplicantLog:

dmi.bios.date: 12/27/2006
dmi.bios.vendor: LENOVO
dmi.bios.version: 7IET23WW (1.04 )
dmi.board.name: 6371CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr7IET23WW(1.04):bd12/27/2006:svnLENOVO:pn6371CTO:pvrThinkPadT60:rvnLENOVO:rn6371CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 6371CTO
dmi.product.version: ThinkPad T60
dmi.sys.vendor: LENOVO


Steve Langasek (vorlon) wrote :
Changed in linux (Ubuntu):
importance: Undecided → Critical
milestone: none → ubuntu-9.10
Steve Langasek (vorlon) wrote :

There are several open bug reports upstream regarding ext4 corruption, but it's not clear which, if any, are related to the problems being observed.

http://bugzilla.kernel.org/show_bug.cgi?id=14354 is one bug that appears to be linked to the use of the DM layer - if you're following up to this bug report, please indicate whether your ext4 fs is sitting on top of a dm-crypt, LVM, or RAID device.

That bug also mentions using auto_da_alloc=0 as a boot option to work around; we should check whether that boot option makes a difference for users seeing this bug.

Changed in linux (Ubuntu Karmic):
status: New → Triaged
Steve Beattie (sbeattie) wrote : apport-collect data

AplayDevices: aplay: device_list:223: no soundcards found...
Architecture: amd64
ArecordDevices: arecord: device_list:223: no soundcards found...
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/dsp', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D0c', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0p', '/dev/snd/seq', '/dev/snd/timer', '/dev/sequencer2', '/dev/sequencer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
CheckboxSubmission: 138b721e3738d95476954739cfd660dd
CheckboxSystem: 558fbfb2a1258711a37bb7e23c5d4e6e
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=15325e81-9f2d-4102-9742-a1a76b888317
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
MachineType: Shuttle Inc SA76
Package: linux (not installed)
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-14-generic root=/dev/mapper/alyosha1-karmic_test ro quiet splash
ProcEnviron:
 SHELL=bash
 PATH=(custom, user)
 LANG=en_US.UTF-8
ProcVersionSignature: Ubuntu 2.6.31-14.48-generic
RelatedPackageVersions:
 linux-backports-modules-2.6.31-14-generic N/A
 linux-firmware 1.23
RfKill:

Uname: Linux 2.6.31-14-generic x86_64
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare
dmi.bios.date: 05/04/2009
dmi.bios.vendor: Phoenix Technologies, LTD
dmi.bios.version: 6.00 PG
dmi.board.name: FA76
dmi.board.vendor: Shuttle Inc
dmi.board.version: V10
dmi.chassis.type: 3
dmi.chassis.vendor: Shuttle Inc
dmi.chassis.version: G2
dmi.modalias: dmi:bvnPhoenixTechnologies,LTD:bvr6.00PG:bd05/04/2009:svnShuttleInc:pnSA76:pvrV10:rvnShuttleInc:rnFA76:rvrV10:cvnShuttleInc:ct3:cvrG2:
dmi.product.name: SA76
dmi.product.version: V10
dmi.sys.vendor: Shuttle Inc

Steve Beattie (sbeattie) wrote : AlsaDevices.txt, BootDmesg.txt, Card0.Amixer.info.txt, Card0.Amixer.values.txt, Card0.Codecs.codec.0.txt, CurrentDmesg.txt, Lspci.txt, Lsusb.txt, PciMultimedia.txt, ProcCpuinfo.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, UdevLog.txt, WifiSyslog.txt, XsessionErrors.txt
tags: added: apport-collected
Steve Beattie (sbeattie) wrote : Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

I did a fresh install from the karmic alt amd64 cd build 20091016 onto ext4 on LVM. Post install update, and installation of a limited amount of additional software, I ran a debsums -a on the system, and noticed the following things:

- debsums claims that the following packages don't have an md5sums file at all: bogofilter, g++, binutils, installation-report, libgdbm3, liblockfile1, lockfile-progs, mawk, netbase, update-inetd, xorg, xserver-xorg-input-all, and xserver-xorg-video-all. All of these are supposed to have one (though 5 are 0 length), but they're all missing from the ext4/LVM install.

- the following files were reported as failing their debsums check:

/var/lib/gdm/.gconf.defaults/%gconf-tree.xml FAILED
/usr/share/applications/gpilotd-control-applet.desktop FAILED
/var/lib/openoffice/basis3.1/share/config/javasettingsunopkginstall.xml FAILED

the last is expected (I believe) but not the first two.

Steve Langasek (vorlon) wrote : Re: [Bug 453579] Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

On Sat, Oct 17, 2009 at 05:08:49PM -0000, Steve Beattie wrote:
> - the following files were reported as failing their debsums check:

> /var/lib/gdm/.gconf.defaults/%gconf-tree.xml FAILED
> /usr/share/applications/gpilotd-control-applet.desktop FAILED
> /var/lib/openoffice/basis3.1/share/config/javasettingsunopkginstall.xml FAILED

> the last is expected (I believe) but not the first two.

The second is a bug in gnome-pilot; I guess it hasn't been rebuilt since we
fixed the translations-stripped-after-debsums problem.

The first could have any number of other explanations besides filesystem
corruption.

The missing .md5sums files are interesting/worrying, though.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Colin Watson (cjwatson) wrote : Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

I don't think the missing .md5sums files are intrinsically worrying. I've looked at several of them and they're genuinely missing. installation-report, for instance (for which I have the source to hand), just doesn't call dh_md5sums, and the same is true for a number of the other packages in the list.

Since md5sums files are created in debian/rules rather than by dpkg-deb, they're merely very widespread rather than actually universal ...

Scott James Remnant (Canonical) (canonical-scott) wrote :

I'm just using plain old ext4 on SSD

Scott James Remnant (Canonical) (canonical-scott) wrote :

Here's an example of what I mean:

warcraft scott% wget -q http://cdimages.ubuntu.com/ubuntu-moblin-remix/daily-live/current/karmic-moblin-remix-i386.iso
warcraft scott% md5sum karmic-moblin-remix-i386.iso
91e4f415767a45617f0cbfc5b0abd19c karmic-moblin-remix-i386.iso
warcraft root# sync
warcraft scott% md5sum karmic-moblin-remix-i386.iso
26c3177ae594a3713b0e318e12e91e1b karmic-moblin-remix-i386.iso

I assume the change is that the file is no longer in the page cache
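
The before/after comparison above can be scripted. What follows is only a minimal sketch, not the procedure Scott actually used: it hashes a file, asks the kernel to write it back and drop its cached pages, then hashes it again so that the second read is more likely to come from the disk. The function names `md5_of` and `checksum_survives_eviction` are made up for illustration, and the `POSIX_FADV_DONTNEED` hint is advisory, so the kernel may keep the pages cached anyway.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_survives_eviction(path):
    """Hash, flush and (best effort) evict from the page cache, hash again.

    Returns (before, after); on a healthy system both digests are identical.
    """
    before = md5_of(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # write back any dirty pages for this file
        # Assumption: Linux with Python 3.3+. The hint is advisory only,
        # so the kernel is free to ignore it.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    after = md5_of(path)
    return before, after
```

A mismatch between the two returned digests would correspond to the changed md5sum shown in the transcript; matching digests do not prove the absence of the bug, since the cache may not have been dropped.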

Scott James Remnant (Canonical) (canonical-scott) wrote : apport-collect data

Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: scott 1791 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf6ffc000 irq 21'
   Mixer name : 'SigmaTel STAC9228'
   Components : 'HDA:83847616,10280209,00100201'
   Controls : 29
   Simple ctrls : 19
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=4e4e4aa8-4e55-432a-a36c-2a4d1cc71f49
MachineType: Dell Inc. XPS M1330
Package: linux (not installed)
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-14-generic root=UUID=bc91769a-258e-4e9a-89e1-e7fcb36520d7 ro quiet
ProcEnviron:
 LANG=en_GB.UTF-8
 PATH=(custom, user)
 SHELL=/bin/zsh
 LC_COLLATE=C
ProcVersionSignature: Ubuntu 2.6.31-14.48-generic
RelatedPackageVersions:
 linux-backports-modules-2.6.31-14-generic N/A
 linux-firmware 1.24
Uname: Linux 2.6.31-14-generic x86_64
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare
dmi.bios.date: 12/26/2008
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A15
dmi.board.name: 0U8042
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA15:bd12/26/2008:svnDellInc.:pnXPSM1330:pvr:rvnDellInc.:rn0U8042:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: XPS M1330
dmi.sys.vendor: Dell Inc.

Scott James Remnant (Canonical) (canonical-scott) wrote : AlsaDevices.txt, AplayDevices.txt, ArecordDevices.txt, BootDmesg.txt, Card0.Amixer.values.txt, Card0.Codecs.codec.0.txt, CurrentDmesg.txt, IwConfig.txt, Lspci.txt, Lsusb.txt, PciMultimedia.txt, ProcCpuinfo.txt, ProcInterrupts.txt, ProcModules.txt, RfKill.txt, UdevDb.txt, UdevLog.txt, WifiSyslog.txt, XsessionErrors.txt
Nick Lowe (nick-int-r) wrote : Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

http://bugzilla.kernel.org/attachment.cgi?id=23458

--- inode.c.orig 2009-10-05 18:18:51.000000000 +0200
+++ inode.c 2009-10-18 13:16:45.728112813 +0200
@@ -5164,6 +5164,9 @@
 	} else {
 		struct ext4_iloc iloc;

+		if (inode->i_sb->s_flags & MS_RDONLY)
+			return 0;
+
 		err = ext4_get_inode_loc(inode, &iloc);
 		if (err)
 			return err;

Scott James Remnant (Canonical) (canonical-scott) wrote :

Nick: what was the context of that Bugzilla reference? There's no bug number included.

Eamonn Sullivan (eamonn-sullivan) wrote :

(In case Nick doesn't respond quickly) The patch is referenced in http://bugzilla.kernel.org/show_bug.cgi?id=14354

Near the end.

Tomás Reyes (trcecilio) wrote :

The reference to that patch is in Comment #90

http://bugzilla.kernel.org/show_bug.cgi?id=14354#c90

John Johansen (jjohansen) wrote :

The code path that is being patched in the ext4_write_inode() function is new to 2.6.32 and does not exist in Karmic.

It may be possible (though unlikely) that the read-only, non-journaled case calling ext4_force_commit is causing the corruption, since in the 2.6.32 patch this case is short-circuited, returning without doing anything. However, in 2.6.32 this code path is short-circuiting on sync_dirty_buffers(), not ext4_force_commit.

I have attached the patch reworked for 2.6.31, short-circuiting the read-only, non-journaled case, but I need to evaluate the code more.

John Johansen (jjohansen) wrote :

Since I haven't reproduced this error yet, I would like to get a better handle on what people are seeing here. Is it file system corruption (errors that show up in fsck), or file corruption where fsck does not report any errors? Also, for files that are corrupted, do they have the correct size, and is it possible to run a compare between a corrupt file and a good file so we can get a handle on the location where the corruption starts?
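
The comparison asked for here could be done along the following lines. This is a hedged sketch rather than a tool anyone in this report used: `first_difference` is a hypothetical helper that walks two files in chunks and returns the byte offset at which they first differ.

```python
def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Return the byte offset of the first difference, or None if identical.

    If one file is a prefix of the other, the offset is the shorter length.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the exact differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: one file ended early.
                return offset + min(len(a), len(b))
            if not a:  # both files ended without a difference
                return None
            offset += len(a)
```

Called as `first_difference("good.iso", "corrupt.iso")` (both names are placeholders), it gives the starting offset of the damage. The standard `cmp` utility reports the first differing byte as well.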

John Johansen (jjohansen) wrote :

I have placed a test kernel with the above patch at

http://kernel.ubuntu.com/~jj/linux-image-2.6.31-14-generic_2.6.31-14.48~ext4test1_amd64.deb

It would be good to know if this clears up the corruption problems, and/or whether the warning and stack trace show up in the logs, whether or not the corruption problem is fixed.

papukaija (papukaija) wrote :

Just to confirm, is this just an issue with the 2.6.31-14.46 kernel, or does it also occur with the 2.6.31-14.48 kernel?

Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 453579] Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

On Wed, 2009-10-21 at 20:07 +0000, John Johansen wrote:

> Since I haven't reproduced this error yet I would like to get a better
> handle on what people are seeing here. Is it file system corruption
> (errors that show up in fsck), or file corruption where fsck does not
> report any errors. Also for files that are corrupted do they have the
> correct size and is possible to run a compare between a corrupt file and
> a good file so we can get a handle on the location the corruption is
> starts.
>
The corruption is not detected by fsck.

In my testing, the files maintained the same size, but the data changed
in them. The data started being different at around 512MB into the
file.

Scott
--
Scott James Remnant
<email address hidden>

Carl Englund (englundc) wrote : Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

I'm a little worried about this one too, so I gave testing it a shot. Created a ~20GiB ext4 filesystem and copied a 1.2GiB file there. Compared with md5sum and the checksum was the same. Running Karmic RC with 2.6.31-14.

mabovo (mabovo) wrote :

Sorry if I'm hijacking this bug, but this seems related, if I am not totally wrong:
I am using 9.10 on a MacBook2,1, with ext4 on sda3.
When trying to copy a DVD image like Snow or Leopard.iso (approx. 7.5 GB) to an external HD (FAT32), Nautilus stops the process in the middle, displaying the following error message: "Error writing the file: File too big"

The same iso can be copied into another partition of my internal hd like sda4 (NTFS/Windows7) without errors.

mabovo (mabovo) wrote : apport-collect data

Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: mabovo 2942 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0x90440000 irq 22'
   Mixer name : 'SigmaTel STAC9221 A1'
   Components : 'HDA:83847680,106b2200,00103401'
   Controls : 21
   Simple ctrls : 13
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=720ad8ab-7e45-4167-8bdf-93b289eb6e50
MachineType: Apple Inc. MacBook2,1
Package: linux (not installed)
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-14-generic root=UUID=65607e5f-473e-416d-b35b-4012db3c6f36 ro quiet splash
ProcEnviron:
 SHELL=/bin/bash
 LANG=pt_BR.UTF-8
 LANGUAGE=pt_BR.UTF-8
ProcVersionSignature: Ubuntu 2.6.31-14.48-generic
RelatedPackageVersions:
 linux-backports-modules-2.6.31-14-generic N/A
 linux-firmware 1.24
Uname: Linux 2.6.31-14-generic i686
UserGroups: adm admin cdrom dialout lpadmin netdev plugdev sambashare
dmi.bios.date: 06/27/07
dmi.bios.vendor: Apple Inc.
dmi.bios.version: MB21.88Z.00A5.B07.0706270922
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: Mac-F4208CAA
dmi.board.vendor: Apple Inc.
dmi.board.version: PVT
dmi.chassis.asset.tag: Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: Apple Inc.
dmi.chassis.version: Mac-F4208CAA
dmi.modalias: dmi:bvnAppleInc.:bvrMB21.88Z.00A5.B07.0706270922:bd06/27/07:svnAppleInc.:pnMacBook2,1:pvr1.0:rvnAppleInc.:rnMac-F4208CAA:rvrPVT:cvnAppleInc.:ct10:cvrMac-F4208CAA:
dmi.product.name: MacBook2,1
dmi.product.version: 1.0
dmi.sys.vendor: Apple Inc.

mabovo (mabovo) wrote : AlsaDevices.txt, AplayDevices.txt, ArecordDevices.txt, BootDmesg.txt, Card0.Amixer.values.txt, Card0.Codecs.codec.0.txt, CurrentDmesg.txt, IwConfig.txt, Lspci.txt, Lsusb.txt, PciMultimedia.txt, ProcCpuinfo.txt, ProcInterrupts.txt, ProcModules.txt, RfKill.txt, UdevDb.txt, UdevLog.txt, WifiSyslog.txt, XsessionErrors.txt
Steve Langasek (vorlon) wrote : Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

mabovo, your issue is completely unrelated.

Steve Langasek (vorlon)
Changed in linux (Ubuntu Karmic):
milestone: ubuntu-9.10 → karmic-updates
Manuel Bua (manuel-bua) wrote :

Since it seems the fix has been planned for karmic-updates, should we expect ext3 to be used as the default fs when installing Karmic?
I'm quite worried about the impact this bug could have on new users migrating to Ubuntu.

Steve Langasek (vorlon) wrote :

Setting the milestone does not mean that a fix is planned for karmic-updates. So far, it doesn't appear that Scott's original problem is reproducible for anyone else. We will stay on this bug to try to confirm it and find a fix, but we aren't going to change the default fs for a bug that only one person is seeing.

Steven Post (redalert-commander) wrote :

@mabovo: a regular FAT32 filesystem only supports files up to about 4GB, which explains your problem.

I haven't experienced this on ext4 yet, but I did notice some corruption on ext3 a while back. It was nothing important, but it could have been corruption introduced during the transfer of the file. I don't know how you downloaded it, but that might be a clue (HTTP, FTP, BitTorrent, ...?); not every protocol has the same level of corruption checking.
Although I'm afraid it is in the filesystem: in my case with ext3 it was a torrent; the first hash check passed, but a month later it didn't.

bert (xbert) wrote :

You can add me as a second user seeing the problem. My original report is here:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/459839

I've seen the bug with two independent installations to ext4. In my case, a fsck does seem to repair the problem, making a non-bootable system bootable again. The occurrence of disk errors is sporadic.

You asked about RAID early in the thread. I have a raid controller on my mobo, which is currently not being used. The SATA drive is plugged directly into the main connections, and is reported as /dev/sda

I wouldn't mention this at all, but for the fact that some live CD versions of linux (gnuparted LiveCD, for example) gave me fits when they recognized the RAID controller, tried to associate the drive with the RAID device, and therefore prevented me from reformatting the drive. Made me wonder if there might be some quirky interplay deep in the device stack leading to false positive RAID detections.

As reported in the original bug, I am running Kubuntu 9.10 rc 64-bit on intel quad core machine and an intel x25-m ssd. (I don't think this is one of the infamous intel SSD bugs because an alternative OS ran w/o problems)

Steve Langasek (vorlon) wrote :

> I've seen the bug with two independent installations to ext4. In my
> case, a fsck does seem to repair the problem, making a non-bootable
> system bootable again. The occurrence of disk errors is sporadic.

That doesn't sound at all like the bug Scott has described.

Scott Kitterman (kitterman) wrote :

Proposed release note:

There have been some reports of data corruption with fresh (not upgraded) ext4 file systems with large files (over 512MB). The issue is under investigation. Users who routinely manipulate large files may want to consider using ext3 file systems until this issue is resolved.

Neumarke (nospam1-neumarke) wrote :

On the issue of how many people are seeing this problem, and I hope I'm not misunderstanding the relationships between bugs here:

This bug is "assigned to" linux-kernel-bugs #14354 in which Linus Torvalds himself claims to be seeing filesystem corruption, starting here:
http://bugzilla.kernel.org/show_bug.cgi?id=14354#c117

Are these bugs related or not?

Steve Langasek (vorlon) wrote :

Neumarke,

The relation to that upstream bug is tenuous at best. The upstream bug:
- is reported against a newer kernel than the one we're shipping
- is reported to only happen when ext4 is on top of the DM layer, whereas Scott's case was ext4 on a raw device
- is reported in connection with an unclean shutdown and subsequent fsck, whereas Scott reported corruption of files without an unclean shutdown (but no mention in this bug of whether the corruption requires an intervening reboot/fsck to appear - Scott, please clarify)

So that upstream bug link should be dropped; it really doesn't look like the same bug.

Changed in linux:
importance: Unknown → Undecided
status: Unknown → New
Steve Langasek (vorlon) wrote :

Documented at <https://wiki.ubuntu.com/KarmicKoala/ReleaseNotes#Possible%20corruption%20of%20large%20files%20with%20ext4%20filesystem>:

There have been some reports of data corruption with fresh (not upgraded) ext4 file systems using the Ubuntu 9.10 kernel when writing to large files (over 512MB). The issue is under investigation, and if confirmed will be resolved in a post-release update. Users who routinely manipulate large files may want to consider using ext3 file systems until this issue is resolved. (453579)

Changed in ubuntu-release-notes:
status: New → Fix Released
Kai Blin (kai.blin) wrote :

Steve, I can confirm that in my setup.

Test is easy, as described by Scott.

I've copied over the first iso I found on my PC to my fileserver running an ext4 /data partition. Then I had some fun with md5sum:

kai@woodstock:/data/iso$ md5sum en_win_xp_pro_n.iso
138468d380b84e6b9e9a8648efb97143 en_win_xp_pro_n.iso
kai@woodstock:/data/iso$ md5sum en_win_xp_pro_n.iso
d6b2bc09fc4df1876005f20b62364f6e en_win_xp_pro_n.iso
kai@woodstock:/data/iso$ sync
kai@woodstock:/data/iso$ md5sum en_win_xp_pro_n.iso
91cf62eee1e159774c8acee03cf88f39 en_win_xp_pro_n.iso
kai@woodstock:/data/iso$ md5sum en_win_xp_pro_n.iso
d11a45c61466f2b22757e0e449e2fe90 en_win_xp_pro_n.iso
kai@woodstock:/data/iso$ md5sum en_win_xp_pro_n.iso
f8e682be3590d48e0c266d15a804e3af en_win_xp_pro_n.iso
kai@woodstock:/data/iso$ md5sum en_win_xp_pro_n.iso
d11a45c61466f2b22757e0e449e2fe90 en_win_xp_pro_n.iso
kai@woodstock:/data/iso$ md5sum en_win_xp_pro_n.iso
d11a45c61466f2b22757e0e449e2fe90 en_win_xp_pro_n.iso

Note that it seems to have stabilized on d11a45c61466f2b22757e0e449e2fe90 after bouncing off from there once. Also note that calling sync seems to have no effect on the bug.

That's certainly a fun one.

Kai Blin (kai.blin) wrote :

Oh, I forgot to mention that d11a45c61466f2b22757e0e449e2fe90 is not the correct checksum the file is supposed to have.

Bob McElrath (bob+ubuntu) wrote :

I have seen problems like this with large files on multiple fs's and ultimately it was a RAM problem. Scott, can you run memtester and/or memtest86 at bootup to verify that you don't have bad RAM? Is your CPU overclocked? CPU errors can also be detected with burn* programs (cpuburn package). A rare RAM problem can cause bitflips that you wouldn't notice except in large files.

Lemmiwinks (lemmiwinks) wrote :

A few days ago, a video file in my home folder, which was over 300MB in size, became unusable. Nautilus says the file has 0 bytes. When I try to open it, every player reports that the stream does not contain any data.
Unfortunately I cannot tell exactly when the file corruption happened or what caused it.
If there are more reports like mine, I would suggest withdrawing karmic completely...

Lemmiwinks (lemmiwinks) wrote :

Forgot to mention that I actually have an ext3 file system, which I updated to ext4 soon after Jaunty was released, with no problems at all.

Martin Pitt (pitti) wrote : Re: [Bug 453579] Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

Lemmiwinks [2009-10-29 18:02 -0000]:
> Forgot to mention, that I've got actually an Ext3 file system, which I
> updated to Ext4 soon after Jaunty was released, with no problems at all.

Scott, did you also upgrade yours to ext4, or was that a clean
mkfs.ext4?

Martin Jackson (mhjacks) wrote : Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

I have an ext4 fs that I created in jaunty as a fresh ext4 fs (during the jaunty beta cycle).

The fs is on lvm and is close to 1 TB in size...it's 92% full with mp4 files in frequent use, and I have not yet seen this issue.

I upgraded this machine to karmic just over a week ago.

Mackenzie Morgan (maco.m) wrote :

Lemmiwinks: Sounds more like the old 0-byte bug that was in Jaunty's ext4. Scott's bug keeps the files the same size.

I'm getting this too with the .isos I downloaded today. Mine is not an ext3 --> ext4 conversion. It was formatted as ext4 by the Karmic alpha 3 or 4 installer. Unlike Scott and Ian, I am not using an SSD.

Zsync'd iso:
098824768ee3d46dcb60e8cd1fe37f61 kubuntu-9.10-desktop-amd64.iso

Torrented iso:
290ef766fdef0bda13df5c3d7ba7c163 kubuntu-9.10-desktop-amd64.ipv6.iso

Should be:
5a996e0d794e35509d0275d411a3e737 *kubuntu-9.10-desktop-amd64.iso
according to http://nl.releases.ubuntu.com/releases/kubuntu/karmic/MD5SUMS
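
Checking a download against a published MD5SUMS listing, as done by hand above, can be sketched as follows. This is an illustration under stated assumptions, not part of the original report: `parse_md5sums` and `verify` are hypothetical helpers, and the parser handles only the plain `<digest> *<name>` and `<digest>  <name>` line shapes, ignoring the escaped-filename form that md5sum can also emit.

```python
import hashlib
import string


def parse_md5sums(text):
    """Map file name -> expected digest from md5sum-style lines."""
    expected = {}
    for line in text.splitlines():
        # Expected shape: 32 hex digits, a space, a mode flag
        # (' ' for text, '*' for binary), then the file name.
        if len(line) < 35 or line[32] != " " or line[33] not in " *":
            continue  # skip blank or unrecognised lines
        if not all(c in string.hexdigits for c in line[:32]):
            continue
        expected[line[34:]] = line[:32].lower()
    return expected


def verify(path, expected_digest, chunk_size=1 << 20):
    """Return True if the file's MD5 matches expected_digest."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest.lower()
```

In practice `md5sum -c MD5SUMS` performs the same check directly; the sketch is only meant to make the comparison explicit.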

Axos (sancroff) wrote :

I just finished a clean install of 9.10 (new default partitions...nothing retained from previous install) on a Toshiba notebook with an old-school 120 GB parallel ATA drive (whatever you call the drives that came before SATA) and 2 GB of RAM.

I ran the following commands:

openssl rand -out foo 629145600
md5sum foo
sync
md5sum foo
cp foo bar
md5sum foo bar
sync
md5sum foo bar
openssl rand -out foo2 1073741824
md5sum foo2
sync
md5sum foo2
cp foo2 bar2
md5sum foo2 bar2
md5sum foo bar foo2 bar2 > sums
# rebooted the system
md5sum foo bar foo2 bar2

All the sums were consistent. No variation. Either my system doesn't have the problem -or- there is something else which triggers it. For instance, maybe the files need to be some odd size rather than a clean multiple of 1 MB. The sizes I used above were 600 * 1024 * 1024 and 1024 * 1024 * 1024. I'll retry the test with an additional 17 bytes added to the file sizes to see if that makes any difference. I'll post again if it does.
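
The retry with odd file sizes could be automated roughly like this. It is a sketch under assumptions, not the commands Axos ran: the helper names are invented, the file names `foo` and `bar` simply mirror the ones above, and the digest is computed from the data in memory while it is written, so a later read from disk can be compared against it.

```python
import hashlib
import os
import shutil


def write_random_file(path, size, chunk_size=1 << 20):
    """Write `size` random bytes to path; return the MD5 of what was written."""
    digest = hashlib.md5()
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            block = os.urandom(min(chunk_size, remaining))
            f.write(block)
            digest.update(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # push the data to the device before returning
    return digest.hexdigest()


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file as read back from path."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def roundtrip_ok(directory, size):
    """Write a random file, copy it, and check both against the in-memory digest."""
    src = os.path.join(directory, "foo")
    dst = os.path.join(directory, "bar")
    expected = write_random_file(src, size)
    shutil.copyfile(src, dst)
    return file_md5(src) == expected and file_md5(dst) == expected
```

For the sizes mentioned above, a call would look like `roundtrip_ok("/some/ext4/dir", 600 * 1024 * 1024 + 17)`, where the directory is a placeholder. The read-back may still be served from the page cache, so a reboot or cache drop between writing and checking remains part of the test.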

David Warde-Farley (david-warde-farley) wrote :

@Kai Blin: Can you please confirm the kernel version this was happening with?

Kai Blin (kai.blin) wrote :

I'm seeing this on 2.6.31.4 of the beagleboard armel kernel from Launchpad. However, this might be a false alarm on my side, pointing at a hardware issue instead. I've reformatted the partition to ext3 and I'm still seeing similar effects. This is an external USB drive, which might be one part of the issue.
Sorry about the noise, this bug looked like a perfect match.

Changed in linux:
importance: Undecided → Unknown
status: New → Unknown
Changed in linux:
importance: Unknown → Undecided
status: Unknown → New
importance: Undecided → Unknown
status: New → Unknown
Revision history for this message
nutznboltz (nutznboltz-deactivatedaccount) wrote :
Revision history for this message
J. Antonio Romero (nsdragon) wrote :

I am confused about this bug. All comments speak about freshly-created ext4 filesystems, as well as the Karmic Release Notes. But what about already-present filesystems? Right now my / is ext3 and /home is ext4 on Jaunty. If I do a dist-upgrade to Karmic, will I be affected? What about converting / later from ext3 to ext4? And what if I install Karmic from scratch, but leaving /home untouched?

Revision history for this message
Mackenzie Morgan (maco.m) wrote : Re: [Bug 453579] Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

We don't know the cause, so that's hard to answer. So far it seems that ext3
--> ext4 conversions are safe. Kind of makes sense, since the on-disk system
is a bit different. As to whether created-by-Karmic or in-use-by-Karmic is
the trouble here, we don't know yet. I think only 3 people so far have hit
this, and we all were running unstable for development reasons, so we had
created-by-Karmic filesystems. It's going to take more people reproducing it
to find out if created-by-Jaunty-and-used-in-Karmic is problematic as well.

Revision history for this message
fimbulvetr (fimbulvetr) wrote : Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

I tried to reproduce this on my Latitude D630 with an intel x25-m, 9.10 fresh format/install mounted raw, and was unable to.

Immediately after grabbing ubuntu-9.10-desktop-amd64.iso, the md5sum was dc51c1d7e3e173dcab4e0b9ad2be2bbf, and did not change even after a reboot.

Revision history for this message
Starcraftmazter (starcraftmazter) wrote :

fimbulvetr - did you try editing the file?

On that note, I'm doing a fresh install of 9.10 with ext4 on my laptop around the start of next week, and I'm wondering if anyone can suggest some methods to try and reproduce the bug. So far I'm thinking about obtaining a very large file, copying it around the HD, and modifying it.

I'm wondering if changing the file around the 536870912th byte would be a useful thing to do?
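
For reference, 536870912 is 2**29 bytes, i.e. exactly 512 MiB. If a known-good copy of a file is available (for example one stored on another filesystem), one way to see whether damage really clusters around that offset is to compare the two copies block by block. A minimal Python sketch, with an invented function name and an arbitrary 4096-byte block size:

```python
MARK_512_MIB = 1 << 29  # 536870912 bytes


def differing_blocks(path_a, path_b, block_size=4096):
    """Return the starting byte offsets of blocks that differ between two files.

    If one file is shorter, the blocks past its end count as different.
    """
    offsets = []
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:
                break
            if block_a != block_b:
                offsets.append(offset)
            offset += block_size
    return offsets
```

Offsets close to MARK_512_MIB would support the 512 MB theory; offsets scattered across the file would point elsewhere.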

Revision history for this message
tellapu (tellapu) wrote :

Thanks so much for working on this critical issue. I'll wait to install Karmic until it is fixed, so please hurry up :-) I often have large files (around 1 GB).

Revision history for this message
GonzO (gonzo) wrote :

I think Steve was right at post #84: the link to the linux kernel bug should be dropped, as all of the circumstances of this bug are different from the one in the link. How did this upstream link get re-established?

Changed in linux:
status: Unknown → Confirmed
Revision history for this message
Axos (sancroff) wrote :

OK, the bug is in kernel 2.6.32. Kosmic, er, Karmic Koala is 2.6.31. No wonder I wasn't able to reproduce it.

http://bugzilla.kernel.org/show_bug.cgi?id=14354

Steve Langasek (vorlon)
Changed in linux:
importance: Unknown → Undecided
status: Confirmed → New
Revision history for this message
aldebx (aldebx) wrote :

@ Kai Blin,
it should be made clear that testing with an external USB drive is not at all a reliable test. I've gone through _several_ USB drives that systematically corrupted large files regardless of the hard disk, filesystem and host computer used. This happens especially with large-capacity hard disks plugged into cheap USB controllers (although it also happened to me once with an average one).

@Starcraftmazter
since MD5 sums, like all other hashes, are designed to verify that files have not been tampered with or corrupted, the sum would _definitely_ change if you edit the file! The hash (MD5, SHA, etc.) HAS TO change after you edit the file! Otherwise you would have been lucky enough to find a weakness in the hash algorithm.

Revision history for this message
Kai Blin (kai.blin) wrote :

@aldebx
Dunno, connecting all of my drives to all of my other boxes, I don't see any issues like that. However, I think I've already identified the system used as the real cause of my particular issue.

Revision history for this message
Bryan Quigley (bryanquigley) wrote :

Did everyone affected do a memtest like the suggestion earlier (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/453579/comments/88)

I have had the EXACT same symptoms ('cept it was a 4.7 GB ISO), on ext3 and it was a SINGLE bad line in memtest. So please run memtest Scott. Or anyone else affected by the changing md5sums. Thanks!

(Why would it work on ext3 and not ext4? maybe because ext4 reads faster and would be more likely to trip the bad memory)

Revision history for this message
Starcraftmazter (starcraftmazter) wrote :

@aldebx

Of course I realise this; perhaps I need to elaborate on my idea. Since the error apparently occurs when large files are edited, a test should be devised whereby changes are made to a large file and saved, then undone and saved again, and the before and after checksums compared to see whether there is in fact a problem with writing large files.

Furthermore, since the problem allegedly happens around the 512MB mark, my idea is to write a program that takes X blocks before and X blocks after this point and swaps them. X must be even to ensure every block is swapped with another. I am thinking of swapping 1000 blocks before the point with 1000 blocks after the point. Using fsync and running the program twice should ensure that both changes are written and that the second undoes the first. Thus, if two hashes of the file are taken, one before and one after the experiment, they will be identical if no problems occurred and different if there is in fact a problem.

So my question is, would this be a good test to do? I will probably have time to do it tomorrow.
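
A rough Python sketch of the swap-and-undo idea, to make the proposal concrete. This is only an illustration of the technique, not an actual test program from this thread: the function name is invented, and the default block size (4096 bytes) and block count are assumptions. It reverses the order of the blocks in a window centred on a given offset, so running it a second time puts every block back where it was.

```python
import os


def swap_blocks_around(path, center, count=1000, block_size=4096):
    """Reverse the order of 2*count blocks centred on byte offset `center`.

    Block i of the window is exchanged with block (2*count - 1 - i),
    so calling the function twice restores the original contents.
    """
    start = center - count * block_size
    total = 2 * count
    if start < 0 or start + total * block_size > os.path.getsize(path):
        raise ValueError("file too small for the requested window")
    with open(path, "r+b") as handle:
        for i in range(count):
            low = start + i * block_size
            high = start + (total - 1 - i) * block_size
            handle.seek(low)
            low_data = handle.read(block_size)
            handle.seek(high)
            high_data = handle.read(block_size)
            handle.seek(low)
            handle.write(high_data)
            handle.seek(high)
            handle.write(low_data)
        # Push the changes to disk before the caller hashes the file.
        handle.flush()
        os.fsync(handle.fileno())
```

The intended use is: hash the file, call swap_blocks_around(path, 512 * 1024 * 1024) twice, hash again, and compare the two digests. A mismatch would mean that at least one of the writes or reads went wrong.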

Revision history for this message
unggnu (unggnu) wrote :

I also had this issue but I can't really nail it down. The explanation that it only has something to do with kernel 2.6.32-rc* makes sense. I used it several times on my two systems. I got different md5sums for the same file, and when I played an affected video the player stopped, the hard disk ran continuously and the system hung for a minute or more. It looked as if a part was missing and the driver was searching for it.

I even shot down my whole testing system. I have a small testing partition to test Linux outside the VM. After some xorg edgers installation and a restart I got an fsck prompt asking for confirmation about some inode problem. After I confirmed, it asked again several times, so I ran it with -y, which deleted a lot of files. Afterwards the system didn't boot anymore.
I have installed grub on the partition instead of the MBR; maybe that caused this problem under some special circumstances, but I have done this for a long time and never had a problem with it.

So blaming 2.6.32 actually makes more sense, since I have used ext4 for quite some time. I haven't had a problem since sticking with the default kernel, but this kind of problem doesn't just pop up. You don't realize that a huge file has changed until you actually check or use the whole thing.

Revision history for this message
Matt (twister-vertex-cc) wrote :

This might be a stupid question, but Karmic does ship with the Kernel 2.6.31-14.48 and not 14.46 right? Can anybody elaborate?

I did a fresh install with newly created ext4 partitions and have not yet encountered anything. Well, I didn't really try to produce an error since this is my production machine. I have md5summed entire folders with gigabytes of data after moving them ... no errors.

Revision history for this message
Ramon (ram130-gmail) wrote :

Well, I am not sure, but here's a copy of what mine said after a clean install: "Linux 2.6.31-14-generic-pae". Also, I have not really noticed any corruption, except that a couple of hours ago when I tried to reinstall grub2 I got a "segmentation fault" error.

Revision history for this message
Starcraftmazter (starcraftmazter) wrote :

Hello.

I have written a C program to implement the test I described above. Currently, it checks 100x 8K blocks around the 512MB mark by swapping them with each other, back to front. Running the program twice should thus result in an identical file. Using this program, you can check the read/write ability of your filesystem, in particular around the 512MB mark in a file (before, at, and after).
http://codepad.org/msApnHFY
If you don't have gcc installed (it isn't by default):
sudo apt-get install build-essential
To compile:
gcc -o tester tester.c

Further, I wrote a perl script to simplify testing:
http://codepad.org/djmLaJ0l
To run:
chmod +x tester.pl
./tester.pl

If those links disappear at some stage, the programs can be found here:
http://starcraftmazter.net/launchpad/453579/

Both my friend and I have run the test on the Ubuntu iso itself. I am using a 64bit install of 9.10 and he is using a 32bit install of 9.10. The kernel used for our tests is the default 2.6.31-14-generic. We are both on ext4.

Both of our tests came up fine: read/write works perfectly and the before/after hashes are the same, hence we could not observe any problem.

I would encourage anyone experiencing this problem to run the above tests and see what happens, in an effort to isolate the problem.

Cheers

Revision history for this message
black (blackborn) wrote :

I have a freshly installed ubuntu 9.10 with 2 newly created ext4 partitions (45GiB for / and 870GiB for /home). I did not encounter any problem so far. (The /home drive contains ~500GiB of films) Also the tester program of comment #112 doesn't reveal any problems. So I'm lucky for now and will report if the problem emerges.

Revision history for this message
Starcraftmazter (starcraftmazter) wrote :

Furthermore, I should state the full kernel version that the final version of Ubuntu (which we did our testing on) is 2.6.31-14.48 and not 14-46. Is there a fix from 46 to 48?

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 453579] Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

On Tue, Nov 03, 2009 at 11:01:04AM -0000, Starcraftmazter wrote:
> Furthermore, I should state the full kernel version that the final
> version of Ubuntu (which we did our testing on) is 2.6.31-14.48 and not
> 14-46. Is there a fix from 46 to 48?

No. The bug title reflects the version of the kernel on which the error was
first seen.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Revision history for this message
Starcraftmazter (starcraftmazter) wrote : Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

@Steve Langasek
I understand that, though what I'm wondering is whether there was any change from 46 to 48, which could have fixed this issue.

Revision history for this message
Mackenzie Morgan (maco.m) wrote : Re: [Bug 453579] Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

I had -14.48 when I hit it.

Revision history for this message
grof (grofardel) wrote : Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

I have a fresh installation of Karmic Koala, and I've already had a corrupted fs twice.
Ubuntu does not boot and complains about an fs it cannot mount.
I have to run fsck in order to repair things.

But the perl script above (by Starcraftmazter) said that the hashes are equal.

Revision history for this message
muadnem (brownj23-deactivatedaccount) wrote :

Is this not fixed via https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781 ?

If so, will the iso images for Karmic be updated anytime soon, or will this only be available post-install?

Maybe off base, sorry if so.

Revision history for this message
muadnem (brownj23-deactivatedaccount) wrote :

I guess it would help if I pasted the right link... Please ignore the previous link.

http://bugzilla.kernel.org/show_bug.cgi?id=14354

Revision history for this message
muadnem (brownj23-deactivatedaccount) wrote :

Here is my reasoning..

"One change that we did make between 2.6.31 and 2.6.32 is that we enable journal checksums by default."

"by default" suggests that a 2.6.31 could be built with journal checksums enabled?

And maybe I'm reading wrong but it doesn't look DM specific..

Revision history for this message
Starcraftmazter (starcraftmazter) wrote :

Hmmm are the two issues (corrupted fs and corrupted large files) related?

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 453579] Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

On Tue, Nov 03, 2009 at 11:29:09PM -0000, Starcraftmazter wrote:
> Hmmm are the two issues (corrupted fs and corrupted large files)
> related?

No.

Revision history for this message
muadnem (brownj23-deactivatedaccount) wrote : Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

Woops. I missed the 'corruption is not detected by fsck', part. Seems like, with the elusive nature of this bug, everyone should be reporting their memtest and fsck status.

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 453579] Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

On Wed, Nov 04, 2009 at 01:06:08AM -0000, muadnem wrote:
> Woops. I missed the 'corruption is not detected by fsck', part. Seems
> like, with the elusive nature of this bug, everyone should be reporting
> their memtest and fsck status.

In general, people who aren't actually seeing the bug described here should
not be reporting anything. All that does is make the bug log harder to
extract information from.

Revision history for this message
Leonardo Montecchi (lmontecchi) wrote : Re: corruption of large files reported with linux 2.6.31-14.46 on ext4
Revision history for this message
pdecat (pdecat) wrote :

I too would recommend a full memtest to anyone encountering data corruption.

Revision history for this message
unggnu (unggnu) wrote :

I could confirm it again with the default Ubuntu kernel. I was downloading a compilation of files with BitTorrent when the battery ran out. There was no apparent problem afterwards and the files seemed to download fine later on, but I got errors. So I started a rehash of the compilation and it found around 40 defective chunks which needed to be redownloaded. Afterwards the file check passed, so I guess the problem might have something to do with crashes/blackouts in combination with ext4.
It is no problem if some recently saved data is gone after a crash, but at least this should be recognized through the journal and marked as such.

Revision history for this message
Brian Rogers (brian-rogers) wrote :

The BitTorrent crash scenario doesn't indicate a bug. The only way for an application to know about uncommitted writes is to scan the file (for example by rehashing in this instance). To avoid doing this every time it's started, it saves a record of what parts have been downloaded. In a crash, this record may be more up to date than what's actually saved to the disk.
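
To illustrate what a rehash does: the client splits the file into fixed-size pieces, hashes each piece, and compares the result with the hashes it expects (BitTorrent v1 uses SHA-1 per piece). The Python below is a simplified sketch of that idea, not code from any real client; the function names and the way the expected hashes are passed in are assumptions.

```python
import hashlib


def piece_hashes(path, piece_size):
    """Return the SHA-1 hex digest of each fixed-size piece of a file."""
    digests = []
    with open(path, "rb") as handle:
        for piece in iter(lambda: handle.read(piece_size), b""):
            digests.append(hashlib.sha1(piece).hexdigest())
    return digests


def bad_pieces(path, expected, piece_size):
    """Return the indices of pieces whose on-disk hash differs from `expected`.

    Pieces that are missing from the file entirely are reported as bad too.
    """
    actual = piece_hashes(path, piece_size)
    return [
        index
        for index, digest in enumerate(expected)
        if index >= len(actual) or actual[index] != digest
    ]
```

After a crash, the client's saved record may claim a piece is complete while bad_pieces reports it, which is exactly the situation described above: the pieces simply need to be downloaded again.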

Revision history for this message
Bryan Quigley (bryanquigley) wrote :

For BitTorrent (I'm assuming you're using Transmission), check out this bug:
https://bugs.launchpad.net/ubuntu/+source/transmission/+bug/445592

Revision history for this message
Ramon (ram130-gmail) wrote : Re: [Bug 453579] Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

I restarted my system the hard way and it would not boot. So i checked it
from another install with fsck and it found some errors. I'm beginning to
worry about a possible future corruption. What is the status of this
problem?

Revision history for this message
Mackenzie Morgan (maco.m) wrote :

Ramon:
That's not what this bug is about. Improperly rebooting runs the risk
of breaking your system on *any* filesystem. This bug is about
*individual files* which are very large becoming corrupt and NOT
having any effect on fsck.

Revision history for this message
Ramon (ram130-gmail) wrote :

Yeah, that's true. Have you experienced any more corruption since?

Revision history for this message
Mackenzie Morgan (maco.m) wrote : Re: [Bug 453579] Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

I haven't downloaded any more large files since then, on the basis that it'd be a
waste of bandwidth.

Revision history for this message
Pete Graner (pgraner) wrote : Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

Looks like this is mostly solved upstream by reverting patch d0646f7b636d067d715fab52a2ba9c6f0f46b0d7 and adding the patch 487caeef9fc08c0565e082c40a8aaf58dad92bbb.

@apw could you or csurbhi build a test kernel and post here so folks can test?

Thanks

~pete

Changed in linux:
importance: Undecided → Unknown
status: New → Unknown
Revision history for this message
Steve Langasek (vorlon) wrote :

No, this is not upstream bug #14354. There is no overlap between the described problems.

Changed in linux:
importance: Unknown → Undecided
status: Unknown → New
summary: - corruption of large files reported with linux 2.6.31-14.46 on ext4
+ in-place corruption of large files *without fsck or reboot* reported
+ with linux 2.6.31-14.46 on ext4
Revision history for this message
Piscium (piscium) wrote :

I am running Karmic 9.10 on an old Pentium 4 computer with PATA drives. I copied a 2.6 GByte file from an ext4 partition to another ext4 partition, then to an ext2 partition, then back to the original ext4. No problem. All files have the same md5.

Revision history for this message
unggnu (unggnu) wrote :

"Improperly rebooting runs the risk of breaking your system on *any* filesystem."
Sorry, but this is not true IMHO. I have never had a similar problem with ext3. Yes, there are some file systems like XFS which just delete the data of a whole file if the computer crashes or is restarted, but I guess the goal, especially with a journaling file system, should be to prevent errors like this. Not to mention that with XFS the size of the file is zero, so you don't assume that everything is fine.
Like I said, it is no problem if some recently changed data is lost after a hard reboot, but this has to be detected and dealt with by the file system so that corrupt files are not saved without anyone even realizing it.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 453579] Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

On Thu, 2009-10-29 at 14:57 +0000, Bob McElrath wrote:

> I have seen problems like this with large files on multiple fs's and
> ultimately it was a RAM problem. Scott, can you run memtester and/or
> memtest86 at bootup to verify that you don't have bad RAM? Is your CPU
> overclocked? CPU errors can also be detected with burn* programs
> (cpuburn package). A rare RAM problem can cause bitflips that you
> wouldn't notice except in large files.
>
Running memtest was one of the first things I did ;-) Likewise I
performed a read/write test of the drive and it was fine

Scott
--
Scott James Remnant
<email address hidden>

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

On Thu, 2009-10-29 at 18:33 +0000, Martin Pitt wrote:

> Lemmiwinks [2009-10-29 18:02 -0000]:
> > Forgot to mention, that I've got actually an Ext3 file system, which I
> > updated to Ext4 soon after Jaunty was released, with no problems at all.
>
> Scott, did you also upgrade yours to ext4, or was that a clean
> mkfs.ext4?
>
Clean ext4

Scott
--
Scott James Remnant
<email address hidden>

Revision history for this message
The Loeki (the-loeki) wrote :

I see there's a lot of discussion going on about this bug, I'll just drop in my 5c:

I cleanly installed/updated an x64 Karmic w/ext4 filesystems on my MacBook 5,1 (no other OSes installed).
Due to the nature of my work I downloaded over a dozen ISOs from different HTTP/FTP servers, none of which failed their MD5 sums.

Yesterday, I cleanly installed/updated an x32 Karmic on some measly Centrino/Pentium M 1.6 GHz laptop and copied, amongst others, 7 DVD ISOs to it from an external NTFS hard drive, with no apparent issue or corruption (though I'd have to check the MD5s to make absolutely sure).

So for me, this issue doesn't seem to exist. Then again, both filesystems are clean-and-simple 3 primary partition layouts (10GB /, RAM*1~1.5 swap, remainder /home, handmade during install), no hw/sw RAID/LVM or whatever.

Revision history for this message
Andrew M. (ender-neo) wrote :

I think I actually *have* seen this...

My setup:
Fileserver, 1.5TB array on JFS bunch of big files (videos mostly, some ISO's and the like - but none of these files *ever* change)
one client with 400GB internal SATA HDD on ext4, running Karmic with 2.6.31-15-generic AMD64 kernel
one client with 1.5TB external USB HDD on ext4, running Karmic with 2.6.31-15-generic i386 kernel

I invoke rsync from the clients as
rsync -axvc root@server:/home/some/dir /home/backup-dir

On *both* clients, I had to run rsync about 3 times until I no longer saw changes.

That is to say, differences in the files still existed during the 2nd rsync, which simply shouldn't be.

Also worth noting that another Karmic machine with ext3 and a ppc kernel doesn't see this problem. Ran rsync once, and then the second pass didn't change any files.

There doesn't seem to be any rhyme or reason to what files were corrupt or wrong on the clients after the first sync, they weren't the same between the two clients and there didn't seem to be any correlation between increased size and frequency or anything. The "giantest" files (8 GB's or so) transferred correctly the first time.

I've still got this elaborate test set up in place, and I'm *very*VERY* keen to get this worked out so that I can move to Karmic on the server (no way in hell I'm upgrading until this gets sorted out!!!)

Please let me know if any more specific tests would help or anything.
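
For anyone who wants to check a finished copy without a second rsync pass, the same per-file checksum comparison can be sketched in Python. This assumes both trees are reachable as local paths (for example with the server directory mounted), which is not the setup described above, and the function names are invented for this illustration.

```python
import hashlib
import os


def checksum_tree(root, chunk_size=1 << 20):
    """Map each file's path relative to `root` to its MD5 hex digest."""
    digests = {}
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            full = os.path.join(directory, name)
            digest = hashlib.md5()
            with open(full, "rb") as handle:
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    digest.update(chunk)
            digests[os.path.relpath(full, root)] = digest.hexdigest()
    return digests


def mismatched_files(source_root, backup_root):
    """Return relative paths that are missing from the backup or hash differently."""
    source = checksum_tree(source_root)
    backup = checksum_tree(backup_root)
    return sorted(
        name for name, digest in source.items() if backup.get(name) != digest
    )
```

An empty list from mismatched_files means every source file has an identical copy in the backup. Running it twice on an untouched backup and getting different answers would be the in-place corruption this bug is about.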

Revision history for this message
Andrew M. (ender-neo) wrote :

more info from my setup:
I have done memtest on all boxes, everything is fine.
The network is a wired network (not thinking it should matter, SSH would barf if packets were coming in with errors).
The smallest file for which I saw corruption was about 120MB.
The incidence seemed to be about 6 corrupt files every {delete everything on client, reboot client, run rsync, run rsync again to see what checksums were wrong} iteration, during which minimally 400GB and about 10,000 files were transferred.

Revision history for this message
Andrew M. (ender-neo) wrote :

oh yes, one more thing,
all fs's were created & formatted by the Karmic installer, using the release media

Revision history for this message
Ramon (ram130-gmail) wrote : Re: [Bug 453579] Re: in-place corruption of large files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4

Wow, now that's a test!!! I think Karmic corrupted my Windows 7 and two data
partitions. I installed Karmic on a brand new, 21-day-old 500GB hard drive.
I had been transferring files for 2 weeks from my failing 320GB drive. After that
was done I tried booting back into Windows 7; it failed. Karmic crashes
occasionally for no reason!! I decided to run startup repair, which found no
problems, then I ran chkdsk: corrupted files on each partition! To top it off,
disk utility is now reporting that my hard drive has a bad sector!! Someone help
me before I go insane

Revision history for this message
Mackenzie Morgan (maco.m) wrote : Re: [Bug 453579] Re: in-place corruption of large files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4

Er...this is only for ext4. Win7 does not run on ext4. Sounds like that bad
sector is to blame. Just because it's new doesn't mean it's not broken.

Revision history for this message
Ramon (ram130-gmail) wrote : Re: [Bug 453579] Re: in-place corruption of large files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4

I used Karmic to create the partitions: 2 NTFS, swap and an ext4. The bad
sector didn't show up until I was copying files... so far it says one
bad sector. It just seems ironic that this corruption problem is here and then
this happens.

Revision history for this message
unggnu (unggnu) wrote :

@Ramon
If you use Karmic you can check your whole hard disk with SMART. Check the bad sector count after an extended test. If there are any, the disk is a more likely cause than Karmic.

I guess this should be done by everyone who has problems and has already run memtest.

Revision history for this message
unggnu (unggnu) wrote :

The tool to check S.M.A.R.T. in Karmic is called "Disk Utility" (gnome-disk-utility); on earlier releases the same check is possible with smartmontools.

Revision history for this message
Ramon (ram130-gmail) wrote :

That's what I used to find out I had a bad sector in the first place. I tried
running the short test and the other one that wasn't extended. All failed
at about 90%, before they could complete. I tried again a while ago and it
stopped after 10 seconds, saying it cannot read. I just ran a memtest, and
it's clean. Windows 7 finally booted, but it says it cannot access my desktop
and shows a lot of errors now, something about read errors and corrupted
files. It's just problem after problem. I guess everyone else here is better
off than me right now?

Revision history for this message
unggnu (unggnu) wrote :

So it looks like your hard disk is defective if even the SMART tests fail. Only the extended test checks every sector, so it should be preferred. If the hardware is failing, this has nothing to do with this bug.

Revision history for this message
Nicky (nickygillette) wrote :

The checksum mismatch affects me too. I'm using the final release version of 9.10 on a laptop with an Intel Celeron.

I did RAM tests with a live copy of 8.04, for many hours with no errors, so I don't think it's bad memory.
8.04 works (with default ext3), 8.04 alt w/ full disk encryption works, also with normal checksums.

9.10 gives errors (with default ext4), w/ home directory encryption, I also get bad checksums.

9.10 live CD also gives me errors when I haven't even installed it when I use:
sudo shred -xvfz -n0 /dev/sdb # A USB flash disk I wanted to wipe

It failed at 4.7GB and started saying something about sync errors.

This same file operation works fine with the 8.04 Live CD.

I wonder if it's not just an ext4 problem, but a problem in the way 9.10 handles large files.

Revision history for this message
Nicky (nickygillette) wrote :

I meant that it happens with or without encryption on 9.10 in the comment above, if it was unclear.

Revision history for this message
Ramon (ram130-gmail) wrote :

I just installed Karmic on a 1TB SATA drive... and on my 16GB flash drive...
let's see how it goes! Both are ext4... if it lets me down, then y'all have
some serious problems.

Revision history for this message
Ramon (ram130-gmail) wrote :

And yes, I am doing large transfers of files over 16GB... how do you check
the checksums?

Revision history for this message
Starcraftmazter (starcraftmazter) wrote :

@Ramon

md5sum filename
or
sha256sum filename

Revision history for this message
Jens Janssen (jayjay) wrote :

For checking a whole drive and a lot of files I use md5deep:

md5deep -lrk ./* > data.md5

sort -k 2 data.md5 > data.md5.sort

diff data.md5.sort data.md5.sort.old
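
The sort-and-diff step above can also be sketched in Python. This is only an illustration, not part of md5deep: the function names `parse_manifest` and `compare_manifests` are made up here, and the sketch assumes every manifest line is a hash, whitespace, then a path that may be prefixed with an asterisk.

```python
def parse_manifest(text):
    """Map path -> hash for lines shaped like '<hash>  <path>' or '<hash> *<path>'."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, _, path = line.partition(" ")
        path = path.lstrip(" ")
        if path.startswith("*"):
            path = path[1:]
        entries[path] = digest
    return entries


def compare_manifests(old_text, new_text):
    """Return (changed, missing, added) path lists between two manifests."""
    old = parse_manifest(old_text)
    new = parse_manifest(new_text)
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    missing = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    return changed, missing, added
```

Feeding it the contents of data.md5.sort.old and data.md5.sort lists the files whose hash changed between the two runs, which are the candidates for in-place corruption.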

Revision history for this message
Phương Võ (clarious) wrote :

I don't know if this is related or not: from a conversation I had before, ext4 divides files into chunks with power-of-two sizes to prevent long-term free space fragmentation, so an 800 MB file would be written as a 512 MB chunk, then a 256 MB chunk, and so on...
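
To make the "and so on" concrete, here is a minimal sketch of the arithmetic only. It is not how the ext4 allocator is actually implemented, and `power_of_two_chunks` is a hypothetical helper name; it simply splits a size into descending powers of two as described above.

```python
def power_of_two_chunks(size_mb):
    """Split a size into descending power-of-two pieces (illustrative arithmetic only)."""
    chunks = []
    remaining = size_mb
    while remaining > 0:
        # Largest power of two that still fits into what is left.
        chunk = 1 << (remaining.bit_length() - 1)
        chunks.append(chunk)
        remaining -= chunk
    return chunks


print(power_of_two_chunks(800))  # [512, 256, 32]
```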

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

I've not been able to reproduce this with the most recent kernel packages

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

well, and andy kernel

Revision history for this message
Donald Ray Crocker Jr. (dcrockerjr) wrote :

I wonder if some of the problems people are experiencing are due to a documentation bug.
http://www.ubuntu.com/getubuntu/releasenotes/910#Switching%20to%20ext4%20requires%20manually%20updating%20grub
makes reference to the ext4 wiki
http://ext4.wiki.kernel.org/index.php/Ext4_Howto#Converting_an_ext3_filesystem_to_ext4
the ext4 wiki under "For people who are running Ubuntu " recommends modified util-linux packages from
ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/ubuntu-fixed-util-linux/
or
ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/ubuntu-fixed-util-linux/util-linux-patch
Since those referenced files date back to 07/17/2008, it seems like some of the problems could be from people installing old packages.

Revision history for this message
Mackenzie Morgan (maco.m) wrote : Re: [Bug 453579] Re: in-place corruption of large files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4

I have not installed those packages, and I really doubt Scott has either since
I think he maintains them.

Revision history for this message
Oliver Seemann (os-oebs) wrote :

I believe I might have hit this bug. I copied a 3GB iso file from NFS to a local ext4 partition and noticed that the sha1sum is off (I only checked because the burned DVD behaved strangely). I copied the file again and this time it got the correct sum.

I still have both files and will keep them for a while in case they can be of help in analyzing the issue.

I just finished a full memtest86 run and it passed fine.

Some more info:

- Upgraded from jaunty, the fs was created as ext4 by jaunty
- Kernel: 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:05:01 UTC 2009 x86_64 GNU/Linux
- SATA hdd, no SSD
- Nothing related in dmesg
- Booted from karmic live disc and ran e2fsck /dev/sda1, no errors found.
- ('e2fsck -n /dev/sda1' on mounted fs does report errors, but I assume that is because it is mounted?)

Later I updated the kernel and I have 2.6.31-15-generic #50 running now and copied a number of 3gb isos again. Now again one of the 4 files has an incorrect hash. So this update did not fix the bug, but I did not see anything related in the change log anyway.

Scott, what kernel versions are you referring to, that you cannot reproduce this anymore?

Let me know if I can provide any further information.

Revision history for this message
cybernet (cybernet2u) wrote :

Was the bug solved?

Revision history for this message
Desh Danz (nicoluno) wrote :

I'd like to know it too.....

Revision history for this message
Ramon (ram130-gmail) wrote : Re: [Bug 453579] Re: in-place corruption of large files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4

I believe not... some recent updates must have corrected the problem. Are
you guys experiencing any corruption lately?

tags: added: 2.6.31.8
Revision history for this message
Leonardo Montecchi (lmontecchi) wrote :

I have not encountered this bug so far, with the latest Karmic updates.
I have tried to reproduce it, making about 20 copies of a file which is ~2 GB, but all the md5sums matched perfectly. I have also tried with some copies of a ~3.5 GB file, with same results.

Revision history for this message
Leonardo Montecchi (lmontecchi) wrote :

I forgot to mention that I'm using 64-bit Ubuntu and that my ext4 partitions were created under Jaunty.

Revision history for this message
Goffi (goffi) wrote :

I experience the same problem. When I copy files on the local disk (I made a quick script that copies a file 100 times and checks the md5sum) everything is fine, but I have been getting corrupted files (bad md5) very often for a while now, and the memory check is OK.
I run karmic with the 2.6.31-15-generic kernel; my data partition is an encrypted ext4 one.
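
The script itself was not posted, so the following is only a sketch of what such a copy-and-verify loop might look like. The names `md5_of` and `copy_and_verify` and the `.copyN` file naming are assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, work_dir, rounds=100):
    """Copy source into work_dir `rounds` times; return the copies whose MD5 differs."""
    source = Path(source)
    work_dir = Path(work_dir)
    expected = md5_of(source)
    mismatches = []
    for i in range(rounds):
        target = work_dir / f"{source.name}.copy{i}"
        shutil.copyfile(source, target)
        if md5_of(target) != expected:
            mismatches.append(target)  # keep the bad copy for later inspection
        else:
            target.unlink()
    return mismatches
```

One caveat with any test of this shape: a freshly written copy may be read back from the page cache rather than from the disk, so a clean run does not prove the on-disk data is intact.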

Revision history for this message
Goffi (goffi) wrote :

I forgot: fsck is OK, and my corrupted files come from the network.

Revision history for this message
Ernst (ernst-blaauw) wrote :

Goffi, I think your problem is related to the network part. I have a bug
report about file corruption using samba:

File corruption after copying files via samba from Karmic to Karmic
https://bugs.launchpad.net/bugs/491288

Maybe your problem is similar?

Revision history for this message
Goffi (goffi) wrote :

Ernst> Yes, probably, but I don't use Samba at all (I only have my GNU/Linux netbook here), and as the problem described here is really similar to mine, I was wondering whether ext4 might be involved (maybe something different happens when files come from the network?).

Revision history for this message
Michael Lazarev (milaz) wrote :

@Goffi: so, all files in which you noticed corruption, come from
network? If not Samba, how do you actually get them? Rsync? Torrent? I
believe these details could help investigate the problem.

Revision history for this message
Goffi (goffi) wrote :

I got them by downloading through wget, Firefox and Chromium (no error during download).
I had issues, for example, with qtmoko (~90 MB; I had to download it 3 times from 3 different servers before getting the good MD5) and with the navit Australian map from cloudmade.com (~45 MB, downloaded 3 times from the same server, each time with a different md5 and a corrupted zip checksum; I finally wrote a small Python script to build a clean file from the 3 corrupted ones).
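
That script was not posted. One plausible way to rebuild a clean file from several differently corrupted copies is a per-byte majority vote, sketched below. This is an assumption about the approach, `majority_merge` is a hypothetical name, and the sketch requires all copies to have the same length.

```python
from collections import Counter


def majority_merge(copies):
    """Rebuild a byte string by keeping, at each offset, the value most copies agree on."""
    if len({len(c) for c in copies}) != 1:
        raise ValueError("all copies must have the same length")
    merged = bytearray()
    for offset, column in enumerate(zip(*copies)):
        value, votes = Counter(column).most_common(1)[0]
        if votes <= len(copies) // 2:
            raise ValueError(f"no majority at offset {offset}")
        merged.append(value)
    return bytes(merged)
```

With three copies this works as long as no two copies are damaged at the same offset. The result should still be checked against a known-good MD5 or the zip CRC. A pure-Python loop like this is slow on a 45 MB file, but it is enough to show the idea.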

Revision history for this message
Goffi (goffi) wrote :

By the way, I made a cmp of the qtmoko files, I have 2 bytes which differ:

% md5sum qtmoko-debian-v15*
5381503d377dc27b7bb669aa8f0cb43e qtmoko-debian-v15.jffs2.clean
80d61c5c70f982f6b531d2eb5d536476 qtmoko-debian-v15.jffs2.old
b35355cccc3e93ab08d1e22e05d888de qtmoko-debian-v15.jffs2.old2
% cmp -l qtmoko-debian-v15.jffs2.clean qtmoko-debian-v15.jffs2.old
16070033 64 164
16070039 257 255
% cmp -l qtmoko-debian-v15.jffs2.clean qtmoko-debian-v15.jffs2.old2
12701898 0 100
12701904 47 45

and for the navit map (the first one is the file I got with my script):

% md5sum australia.navit.bin.zip*
bb13d0594e67cbb4010d6f57de01d91f australia.navit.bin.zip
1d365eecbb2d7dfbd1e92354b338b169 australia.navit.bin.zip.old
73d37d697826735417041fcd80c2ee3d australia.navit.bin.zip.old2
f5987b008cc9b51f0dc488a6fd0d8a4f australia.navit.bin.zip.old3
% cmp -l australia.navit.bin.zip australia.navit.bin.zip.old
 4113279 316 336
 4113281 320 331
 4113282 5 7
 4113287 220 260
 4113291 206 205
 4113292 174 14
 8754187 154 54
 8754193 371 373
% cmp -l australia.navit.bin.zip australia.navit.bin.zip.old2
11878799 121 21
11878801 312 172
11878803 340 200
% cmp -l australia.navit.bin.zip australia.navit.bin.zip.old3
29183695 341 101
29183699 327 365

Revision history for this message
Michael Lazarev (milaz) wrote :

I tried to reproduce this bug with australia.navit.bin.zip, but I couldn't.

# I got the first copy with firefox, and the second with wget
> wget http://downloads.cloudmade.com/oceania/australia/australia.navit.bin.zip -O ./australia.navit.bin2.zip
> diff -bq australia.navit.bin.zip australia.navit.bin2.zip
> md5sum australia.navit.bin*
77fe45bf71779e9263d45b7f31145bbb australia.navit.bin2.zip
77fe45bf71779e9263d45b7f31145bbb australia.navit.bin.zip

By the way, if you downloaded that file after 15 December 2009, md5sum
should be like above.

Revision history for this message
Jakob Unterwurzacher (jakobunt) wrote : Re: [Bug 453579] Re: in-place corruption of large files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4

On 2009-12-24 01:17, Goffi wrote:
> By the way, I made a cmp of the qtmoko files, I have 2 bytes which
> differ:

Excellent! Scott, could you also post a cmp -l of a corrupted vs a good
file?

Now, let's have some binary... (Note that cmp -l output is octal)

% cmp -l qtmoko-debian-v15.jffs2.clean qtmoko-debian-v15.jffs2.old
16070033 64 164
16070039 257 255

064 = 00110100
164 = 01110100

257 = 10101111
255 = 10101101

% cmp -l qtmoko-debian-v15.jffs2.clean qtmoko-debian-v15.jffs2.old2
12701898 0 100
12701904 47 45

000 = 00000000
100 = 01000000

047 = 00100111
045 = 00100101

You have single bit flips on the second and on the seventh bit.
This looks so much like broken memory it virtually has to be broken memory.

OTOH, if everybody sees the second and the seventh bit flip then
probably ext4 is doing something very stupid. Again, please post your
cmp -l results!
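
For anyone who wants to run the same check on their own cmp -l output, here is a small sketch of the conversion done by hand above. The helper name `flipped_bits` is made up for this example; bit positions are counted from 1 at the most significant bit, to match the "second" and "seventh" bit wording.

```python
def flipped_bits(good_octal, bad_octal):
    """Return the bit positions (1 = most significant bit of the byte) that differ."""
    diff = int(good_octal, 8) ^ int(bad_octal, 8)
    return [pos for pos in range(1, 9) if diff & (1 << (8 - pos))]


# (offset, good byte, bad byte) copied from the cmp -l output quoted above;
# cmp -l prints byte values in octal.
pairs = [
    (16070033, "64", "164"),
    (16070039, "257", "255"),
    (12701898, "0", "100"),
    (12701904, "47", "45"),
]

for offset, good, bad in pairs:
    print(offset,
          format(int(good, 8), "08b"),
          format(int(bad, 8), "08b"),
          flipped_bits(good, bad))
```

For these four offsets it reports bit 2, bit 7, bit 2 and bit 7, matching the analysis above.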

Revision history for this message
Goffi (goffi) wrote :

Well, in my case I can exonerate ext4: I made two new downloads of the Australia map, one on my data partition (encrypted ext4) and one on my root partition (ext3), and both were corrupted :(.

Michael Lazarev> I downloaded my first map before the 15th, and I think the version I obtained with my script is OK as I get no CRC errors when unzipping, and it works fine with navit on my Freerunner.

Jakob Unterwurzacher> Yeah, that sounds like memory corruption, but I ran memtest (it was my first reaction) for 3 hours and it told me that my memory is OK. I will try to run it again during the night. But I don't understand why I have this issue when I download, but (apparently) not on my local disk. Wouldn't memory corruption affect my whole system?

Revision history for this message
Goffi (goffi) wrote :

I ran memtest86+ for 9 hours, 10 pass, and still no error...

Revision history for this message
Goffi (goffi) wrote :

My issue seems to be related not to ext4 or my memory but to my swap. I tried downloading the same 30 MB file twice without swap, and this time both copies had the same md5.

In addition, I tried to fill my swap partition with zeros, and I have an error:

% sudo dd if=/dev/zero of=/dev/sda5 bs=1024
22293+0 records in
22293+0 records out
22828032 bytes (23 MB) copied, 16.7581 s, 1.4 MB/s
1331109+0 records in
1331109+0 records out
1363055616 bytes (1.4 GB) copied, 254.542 s, 5.4 MB/s
dd: writing `/dev/sda5': Input/output error
2931829+0 records in
2931828+0 records out
3002191872 bytes (3.0 GB) copied, 471.766 s, 6.4 MB/s
zsh: exit 1 sudo dd if=/dev/zero of=/dev/sda5 bs=1024

Is there any check done on swap partitions? Can the kernel detect errors on them? Is there a way to avoid bad clusters with swap partitions?

I also had an issue (scrambled screen when booting) which has disappeared, but I can't be sure it was solved by deactivating swap, as I tried several things at the same time (replacing kdm with gdm, removing the splash at boot, and maybe an upgrade solved the problem).

Revision history for this message
Goffi (goffi) wrote :

The tests on my partitions don't seem to find any problem:

% sudo smartctl -l selftest /dev/sda5
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 1790

% sudo badblocks -sv /dev/sda5
Checking blocks 0 to 2931830
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.

Revision history for this message
Z149 (graphics149) wrote :

I am also having problems that are possibly ext4 related.
After clearing out some big files, the free space reported for my ext4 / filesystem did not increase.
Emptying the wastebasket and other sensible checks have found no cause.
New files that I added, then deleted and emptied from the wastebasket, used up some of my last precious free space, and I've not got that space back.

'big' files includes a couple of zipped backups of 0.1 and 0.2 GB, and a monstrous 3.1GB archive.
Nothing ordinary cleared the space. fsck did not help.
could it be ext4?

============
Desktop Ubuntu 9 with linux 2.6.31-17
40GB IDE sata drive

Revision history for this message
Michael Lazarev (milaz) wrote : Re: [Bug 453579] Re: in-place corruption of large files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4

@Z149: try "Applications->Accessories->Disk Usage Analyzer", which
also can be run from command line as "baobab". Push "Scan Filesystem"
button to see where the space goes to.

Revision history for this message
TragicWarrior (bryan-christ) wrote :

I have experienced data corruption on 2 different systems using ext4 on flash media. One of the drives was an Intel SSD and the other was a SanDisk Cruzer USB flash drive. I reproduced the problem several times with both of these drives on two different hardware systems. Here's how I reproduced the problem:

1. Install Ubuntu 9.10 32-bit on USB flash drive...
   - /boot ext2 500MB (primary)
   - swap 500MB (primary)
   - / ext4 rest-of-drive (primary)

2. Install latest updates with Update Manager.

3. Reboot and observe corruption.

I have repeated a similar experiment on Fedora 12 with no file-system corruption.

Changed in ubuntu-release-notes:
status: Fix Released → Fix Committed
status: Fix Committed → Fix Released
description: updated
Revision history for this message
Øyvind Stegard (oyvindstegard) wrote :

There's a fair amount of ext4-fixes in the latest 2.6.31-18.55-kernel in karmic-proposed, according to the changelog. I suppose it would be worth testing with that kernel for the people who experience this bug.

Changed in ubuntu-release-notes:
status: Fix Released → Incomplete
Revision history for this message
Steve Langasek (vorlon) wrote :

This is not incomplete. The issue is documented in the release notes.

Changed in ubuntu-release-notes:
status: Incomplete → Fix Released
Revision history for this message
TragicWarrior (bryan-christ) wrote :

This is an unfortunate chicken-and-egg scenario. Assuming the latest kernel in karmic-proposed does fix the problem, how does one safely upgrade their system, since there is a likelihood that the very update itself will get corrupted? The only certain solution would be to (gasp) re-master the Karmic ISO images with a point release so that fresh installs are guaranteed usable.

Revision history for this message
Ryan C. Underwood (nemesis-icequake) wrote : Re: [Bug 453579] Re: in-place corruption of large files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4

Why would the kernel update get corrupted unless the archive or any of
the files it contains are several hundred megabytes in size?

--
Ryan C. Underwood, <email address hidden>

Revision history for this message
TragicWarrior (bryan-christ) wrote :

Ryan,

I believe the large file aspect of the bug is an incorrect characterization. If you take a look at comment #184, you will see that I have reproduced the bug on much smaller files.

Revision history for this message
Ryan C. Underwood (nemesis-icequake) wrote :

You did not say anything about reproducing the bug on smaller files. To my knowledge this would be the first report of a file smaller than 100MB being corrupted by this bug.

Revision history for this message
Steve Langasek (vorlon) wrote :

On Wed, Jan 27, 2010 at 07:49:06PM -0000, TragicWarrior wrote:
> I believe the large file aspect of the bug is an incorrect
> characterization. If you take a look at comment #184, you will see that
> I have reproduced the bug on much smaller files.

No, you have reproduced some *other* corruption problem that doesn't fit the
profile of the original bug report. Please file a separate bug.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Revision history for this message
TragicWarrior (bryan-christ) wrote :

Steve,

The original posting mentions files that are 512MB (comment #53). Later it is assessed at 300MB (comment #89). Then it was whittled down to 120MB (comment #143). Then it went to 45MB (comment #174). I don't think it would be ideal to open a new bug since so much data has been captured here. Why not just re-characterize the bug to match the collected data? In my case, the largest file that I think would have come down in Update Manager would be OpenOffice ~100MB.

Revision history for this message
Steve Langasek (vorlon) wrote :

On Wed, Jan 27, 2010 at 08:29:43PM -0000, TragicWarrior wrote:

> Why not just re-characterize the bug to match the collected data?

Because the data is not related to the bug that was reported, and it's not
appropriate to hijack bug reports for unrelated issues.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Revision history for this message
TragicWarrior (bryan-christ) wrote :

Steve, I would hardly call changing the description

from:
"in-place corruption of large files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4"

to:
"in-place corruption of files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4"

hijacking. It's not as if we are talking about night and day here. In this case, the original reporter simply didn't know the problem also manifests on files smaller than 512MB. Perhaps it is easier to reproduce on larger files, but the evidence now shows that it is a problem on files of 45MB and up.

Revision history for this message
Roland (roland1979) wrote :

I can confirm this bug with current karmic kernel:
 2.6.31-17-generic #54-Ubuntu SMP Thu Dec 10 17:01:44 UTC 2009 x86_64 GNU/Linux

Steps to reproduce:

Download the same file with two clients in parallel. I used Opera and wget.

wget http://ubuntu.intergenia.de/releases/karmic/ubuntu-9.10-desktop-i386.iso
Opera saved to http://ubuntu.intergenia.de/releases/karmic/ubuntu-9.10-desktop-i386.iso-opera

Results:

roland@pdbxe100:~$ md5sum ubuntu-9.10-desktop-i386.iso
8790491bfa9d00f283ed9dd2d77b3906 ubuntu-9.10-desktop-i386.iso
roland@pdbxe100:~$ md5sum ubuntu-9.10-desktop-i386.iso-opera
3f979c279665cc7d6ead2c11b1060188 ubuntu-9.10-desktop-i386.iso-opera
roland@pdbxe100:~$ ls -l ubuntu-9.10-desktop-i386.iso*
-rw-r--r-- 1 roland roland 723488768 2009-10-28 22:14 ubuntu-9.10-desktop-i386.iso
-rw-r--r-- 1 roland roland 723488768 2010-01-28 15:35 ubuntu-9.10-desktop-i386.iso-opera
roland@pdbxe100:~$

Using cmp I found that there were NO differences?!
roland@pdbxe100:~$ cmp ubuntu-9.10-desktop-i386.iso ubuntu-9.10-desktop-i386.iso-opera

I wondered, and compared again via md5sum:
roland@pdbxe100:~$ md5sum ubuntu-9.10-desktop-i386.iso
8790491bfa9d00f283ed9dd2d77b3906 ubuntu-9.10-desktop-i386.iso
roland@pdbxe100:~$ md5sum ubuntu-9.10-desktop-i386.iso-opera
8790491bfa9d00f283ed9dd2d77b3906 ubuntu-9.10-desktop-i386.iso-opera
roland@pdbxe100:~$

So after accessing the files a second time, they seemed to have synced, flushed after delay .. or whatever.

These are my ext4 flags:
has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize

I created the filesystem manually via mkfs.ext4 /dev/sda4.
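
The symptom described in this comment, where md5sum gives a different result on a later read of an unchanged file, can be checked with a short script. What follows is only a sketch and not part of the original report: the function names are made up for illustration, and it relies on nothing beyond Python's standard hashlib module.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_digests(path, rounds=3):
    """Hash the same file `rounds` times and return the digests in order."""
    return [md5_of_file(path) for _ in range(rounds)]


def is_stable(digests):
    """True when every digest in the list is identical."""
    return len(set(digests)) == 1
```

Calling is_stable(repeated_digests("ubuntu-9.10-desktop-i386.iso")) would return False in a situation like the one shown above. Note that repeated reads may be served from the page cache, so identical digests do not by themselves prove that the data on disk is intact.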

Revision history for this message
Øyvind Stegard (oyvindstegard) wrote :

I've yet to see any feedback about the 2.6.31-18 kernel
(karmic-proposed) in this critical bug report, and I find that rather
strange. The proposed -18-kernel has been out for a while now and I count
80+ ext4-fixes in the changelog, including a fix for a data corruption
scenario.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/496816
http://kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.31.8 (upstream
stable release from which 2.6.31-18 has patches)
https://wiki.ubuntu.com/Testing/EnableProposed

Revision history for this message
Oliver Seemann (os-oebs) wrote :

I believe I must withdraw my bug report. I have a test case that reproduces the problem, but it does not seem to be related to ext4, as it turned out today.

I read about 2.6.31-18 here and updated yesterday. But also with the new kernel I could reproduce the problem. Wondering about that I created an XFS partition and repeated the test on it ... also positive. So my problem is somewhere else.

The test copies 4 big files totaling 11GB via NFS from an old Dapper box to a local partition. One of the files always ended up with a mismatching sha1 sum. The "cmp -l" output is always a single contiguous 128-byte block at some random offset. The values don't seem to be the result of single bit flips (110010 -> 101010, 11111101 -> 11010101, 10100 -> 101010, 1110 -> 11010111, 11110001 -> 11010010). Memtest86 also ran fine, at least on this box; I have not yet tested the Dapper one.
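
The kind of analysis described here (finding where two copies differ, how long each differing block is, and whether a byte pair differs by a single bit) can be sketched in a few lines. This is an illustrative reimplementation, not the tooling actually used in the comment; the function names are hypothetical, and the comparison stops at the end of the shorter file.

```python
def differing_runs(path_a, path_b, chunk_size=1 << 20):
    """Return a list of (offset, length) for each contiguous run of
    differing bytes, comparing up to the length of the shorter file."""
    runs = []
    run_start = None  # offset where the current differing run began
    offset = 0        # absolute offset of the current chunk
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            n = min(len(a), len(b))
            if n == 0:
                break
            if a[:n] == b[:n]:
                # Identical chunk: any open run ended at the chunk boundary.
                if run_start is not None:
                    runs.append((run_start, offset - run_start))
                    run_start = None
            else:
                for i in range(n):
                    if a[i] != b[i]:
                        if run_start is None:
                            run_start = offset + i
                    elif run_start is not None:
                        runs.append((run_start, offset + i - run_start))
                        run_start = None
            offset += n
            if len(a) != len(b):
                break
    if run_start is not None:
        runs.append((run_start, offset - run_start))
    return runs


def bits_flipped(old, new):
    """Number of bit positions in which two byte values differ."""
    return bin(old ^ new).count("1")
```

For the first pair quoted above, bits_flipped(0b110010, 0b101010) gives 2, which is consistent with the observation that the changes are not single bit flips.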

Surbhi Palande (csurbhi)
Changed in linux (Ubuntu):
assignee: nobody → Surbhi Palande (csurbhi)
Revision history for this message
Surbhi Palande (csurbhi) wrote :

@scott, do you still see this bug ? I tested this by doing both an upgrade and a fresh install + updates and did not seem to run into it. The md5sum works just fine. If this is still a problem, then I will post a debug kernel if you are willing to try ?

Revision history for this message
Surbhi Palande (csurbhi) wrote :

Can anyone else confirm that this is still a bug in Karmic which is reproducible by the following steps mentioned in the original report:

1) download an iso
2) compare the md5sum

Thanks !

Revision history for this message
Surbhi Palande (csurbhi) wrote :

Also, the result of this quick test from anyone who sees this bug would be appreciated. If you have an ext3 fs/any other fs on some partition (or a sufficiently large file which is formatted as a fs other than ext4) then please do the following:

A) ensure that your blocksize is 4096 bytes by looking at the output of dumpe2fs -h <partition which has ext4>
B) from the same output see if you can find "extent" in the line which has "Filesystem features"
C) post the output dumpe2fs -h <partition which has ext4>

if blocksize is 4096 bytes then:

1) download the iso on this ext3/other filesystem
2) dd if=<iso name> of=/dev/<ext*4* partition>/<some file name> bs=512MB count=1
3) the md5sum should be: faf49ac5a653e339f84a8dd0b7c047dc

(Note that bs=512MB writes 512000000 bytes... if you write 536870912 bytes (i.e 4096 * 131072) then the md5sum should be this:
bcbc14f5bfc9229995afaf786bbb2445) Please report if the md5sum matches or not. Thanks for your help :)
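
The two byte counts mentioned above, and the checksum of just the first part of the iso, can be reproduced without writing anything to disk. The sketch below is an illustration under stated assumptions: the constant and function names are invented here, and the size constants follow GNU dd's convention that the "MB" suffix means 1000*1000 bytes while "M" means 1024*1024 bytes.

```python
import hashlib

DD_MB = 1000 * 1000   # GNU dd "MB" suffix: 10^6 bytes
DD_M = 1024 * 1024    # GNU dd "M" suffix: 2^20 bytes


def md5_of_prefix(path, nbytes, chunk_size=1 << 20):
    """Return the hex MD5 digest of the first `nbytes` bytes of `path`.
    If the file is shorter than `nbytes`, the whole file is hashed."""
    digest = hashlib.md5()
    remaining = nbytes
    with open(path, "rb") as handle:
        while remaining > 0:
            chunk = handle.read(min(chunk_size, remaining))
            if not chunk:
                break  # reached end of file early
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

Here 512 * DD_MB is 512000000 and 512 * DD_M is 536870912 (4096 * 131072), matching the two cases above. Running md5_of_prefix on the iso stored on the ext3/other filesystem with either byte count gives a reference value to compare against the md5sum of the file written to ext4.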

Revision history for this message
Surbhi Palande (csurbhi) wrote :

Also forgot to mention that the above comments apply for the following iso image:
http://releases.ubuntu.com/karmic/ubuntu-9.10-desktop-i386.iso which originally has the md5sum as follows:
8790491bfa9d00f283ed9dd2d77b3906 (http://releases.ubuntu.com/karmic/MD5SUMS)

Revision history for this message
Jordan (jordanu) wrote :

**WARNING** Do not run the dd command in comment #200 **WARNING**

The command should read "dd if=<iso name> of=/mountpoint/for/ext4/partition/filename bs=512MB count=1"

Pointing of= to anything in /dev is wrong, and you should always be very careful when using dd. Though unlikely, trying to follow the instructions in comment #200 as currently written could lead you to accidentally overwrite the beginning of your ext4 partition with the contents of the iso, making all of the files on that partition difficult to recover, and overwriting many of them permanently.

Revision history for this message
Surbhi Palande (csurbhi) wrote :

Please use the following safer command to dd:

dd if=<iso name> of=/<mount-point-of-ext4-fs>/<some file name> bs=512MB count=1
Do avoid using the dev partition, as pointed out in #200.

Thanks Jordan :)
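
For anyone who prefers not to run dd by hand, the same copy can be done with a guard against the mistake discussed above. This is a hedged sketch rather than an official test procedure: the function names are hypothetical, and the check simply refuses any destination that is an existing device node or that lives under /dev.

```python
import os
import stat


def is_device_node(path):
    """True if `path` exists and is a block or character device."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return False
    return stat.S_ISBLK(mode) or stat.S_ISCHR(mode)


def copy_prefix(src, dst, nbytes, chunk_size=1 << 20):
    """Copy the first `nbytes` bytes of `src` into the regular file `dst`.
    Raises ValueError instead of writing to anything that looks like a device."""
    if is_device_node(dst) or os.path.abspath(dst).startswith("/dev/"):
        raise ValueError("refusing to write to device path: %s" % dst)
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while copied < nbytes:
            chunk = fin.read(min(chunk_size, nbytes - copied))
            if not chunk:
                break  # source is shorter than nbytes
            fout.write(chunk)
            copied += len(chunk)
        fout.flush()
        os.fsync(fout.fileno())  # push the data out to the filesystem
    return copied
```

With the iso on another filesystem, copy_prefix("<iso name>", "/<mount-point-of-ext4-fs>/<some file name>", 512000000) would mirror the bs=512MB count=1 case, after which md5sum can be run on the destination file.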

Revision history for this message
Surbhi Palande (csurbhi) wrote :

@TragicWarrior, can you let me know if you encounter the bug with an iso image? Also, are you still encountering the bug of a corrupted update on a (I assume safe) reboot?

Revision history for this message
Miklos Juhasz (mjuhasz) wrote :

I have downloaded the iso and calculated the checksums with the current (2.6.31-19) and the proposed kernel (2.6.31-20) as well. Both of them matched.

$ wget http://ubuntu.intergenia.de/releases/karmic/ubuntu-9.10-desktop-i386.iso
$ md5sum ubuntu-9.10-desktop-i386.iso
8790491bfa9d00f283ed9dd2d77b3906 ubuntu-9.10-desktop-i386.iso

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

I'm going to mark this bug as Invalid (I'm the original reporter)

I've not been able to replicate it on production hardware, and not been able to replicate it on the hardware where I was originally able to replicate it with karmic as it existed at release time.

Therefore I can only conclude that the problem was with faulty hardware, exacerbated by a kernel issue that was fixed before karmic was released.

If you are a user still experiencing problems with the ext4 (or any other) filesystem, including those resulting in fsck errors, then you don't have the same bug that I reported so should report a new bug. Don't open this one unless you've snuck into my house and stolen my laptop <g>

Changed in linux (Ubuntu):
status: Triaged → Invalid
Changed in linux (Ubuntu Karmic):
status: Triaged → Invalid
Changed in linux:
status: New → Invalid
Revision history for this message
MillenniumBug (millenniumbug) wrote :

So the warning should be removed from the Release Notes...?
http://www.ubuntu.com/getubuntu/releasenotes/910

Revision history for this message
MillenniumBug (millenniumbug) wrote :

Seems to have been removed. Thank you, someone.

Changed in ubuntu-release-notes:
status: Fix Released → Incomplete
Changed in linux (Ubuntu Karmic):
status: Invalid → Incomplete
Changed in linux (Ubuntu):
status: Invalid → Confirmed
Changed in linux (Ubuntu Karmic):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Steve Langasek (vorlon) wrote :

do not change the status of this bug.

Changed in ubuntu-release-notes:
status: Incomplete → Fix Released
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in linux (Ubuntu Karmic):
status: Confirmed → Invalid
Revision history for this message
DjznBR (djzn-br) wrote :

* * * I JUST HIT THIS BUG * * *

Yes, I just did it...

I have bought a new SEAGATE HDD, part number ST3500418AS. Formatted as ext4, with / (40GB), swap (5GB), /home (220GB) and ntfs (220GB).

I installed ubuntu 10.04 and installed all updates.

Then I downloaded the ISO for 10.04.1 via Transmission bit-torrent client.
I burned the CD with Brasero.
Upon installation, it stuck in "Ubuntu" screen. Told it to check the CD and there were errors.

To my surprise, the ISO file MD5SUM was mismatching in the ext4 partition.
Then I turned on Transmission again and made it "RECHECK" the file, the file got the correct MD5SUM.

I believe I have hit this bug just now, because I was using ext3 for my home partition on my previous HDD and ext4 only for the root partition. Now, problems have arisen a couple of minutes JUST AFTER an ISO download onto a freshly formatted ext4.

I would consider not marking this bug as invalid.

Revision history for this message
papukaija (papukaija) wrote :

@DjznBR: No, you haven't reproduced this bug. Lucid is using 2.6.32 while this bug is about kernel 2.6.31. In addition, did you read comment 206, especially this: "Therefore I can only conclude that the problem was with faulty hardware, exacerbated by a kernel issue that was fixed before karmic was released."? Please open a new bug for your issue.

Revision history for this message
DjznBR (djzn-br) wrote :

I believe the bug title has been changed once or twice, but let me re-quote here what Scott reported:

"There are worrying reports of filesystem corruption on ext4 in karmic. Scott says:

12:36 < Keybuk> this whole ext4 thing is worrying me
12:36 < Keybuk> I just downloaded an iso image, md5sum didn't match
12:36 < Keybuk> downloaded it into an ext3 partition, matched just fine
12:59 < Keybuk> and I know mvo has seen bugs with corrupted .debs in /var/cache/apt/archives
12:59 < Keybuk> which seems to imply its any file large enough to use lots of extents"

Well, that's exactly what happened on a fresh Lucid install, using ext4 partition.

It may be neither an issue with ext4 itself, nor an issue with kernel version or patch.

I think this is related to the "Transmission" application, because reports are that the corruption takes place when torrents are downloaded, and this is exactly what happened. In some way it may be that Transmission is not handling ext4 well. And it's very subtle, since a "file recheck" on finished torrents may just reconstruct the proper MD5SUM.

Revision history for this message
Mackenzie Morgan (maco.m) wrote :

It's not Transmission's fault. I'm a KDE user (so, I use KTorrent),
and I was affected back when this bug was filed (no problems since
though).

Revision history for this message
Ben Lau (benlau) wrote :

The bug should also affect 10.04. I have a fresh install of 10.04 AMD64 (with data copied from the old harddisk, stored on /home). The result of md5sum on ubuntu-9.10-desktop-i386.iso changes every time I run the command:

$ md5sum ubuntu-9.10-desktop-i386.iso
adbe2aa291535c9bfb12f207d25659b5 ubuntu-9.10-desktop-i386.iso
$ md5sum ubuntu-9.10-desktop-i386.iso
735b22e87a77e5cb1b2a885264685280 ubuntu-9.10-desktop-i386.iso
$ md5sum ubuntu-9.10-desktop-i386.iso
9fb810608e96ba3642b1d19085164f33 ubuntu-9.10-desktop-i386.iso

$ uname -a
Linux benlau-desktop 2.6.32-24-generic #41-Ubuntu SMP Thu Aug 19 01:38:40 UTC 2010 x86_64 GNU/Linux

Revision history for this message
papukaija (papukaija) wrote :

@Ben (and everyone else who thinks that he/she has reproduced this bug): No, you haven't reproduced this bug. Lucid is using 2.6.32 while this bug is about kernel 2.6.31. In addition, did you read comment 206, especially this: "Therefore I can only conclude that the problem was with faulty hardware, exacerbated by a kernel issue that was fixed before karmic was released."? Please open a new bug for your issue.

Revision history for this message
era (era) wrote :

papukaija: could you please update the bug description to point to pertinent bugs for other kernel versions? I'm seeing what I suspect to be ext4 corruption on multi-CPU systems (I think all amd64) on various kernels, on both small and large files. Where and how should I report this? So far, this bug seems the closest match.

Other ext4 bugs I have looked at: bug #438379 (pretty exclusively about suspend/resume problem), bug #317781 (seems to focus on 0-byte files; certainly seems closer to what I am looking at in bug #582341).

Revision history for this message
papukaija (papukaija) wrote :

@era: Unfortunately I won't edit this bug's title nor reopen it due to the reasons mentioned in comment 215, which refers to comment 206. You should report your issue to Launchpad against the linux package. You can do so by running the following command from a Terminal (Applications->Accessories->Terminal) and it will automatically gather and attach debug information to that report:

ubuntu-bug linux

Please try to provide as much information as possible in the bug description:

    1) The majority of kernel bugs are hardware specific so be sure to note what hardware/device is being used.
    2) Document any known steps to reproduce the bug.
    3) Also note whether the bug exists in previous kernel versions of Ubuntu or if it's a regression from previous kernel versions.
    4) Finally, it will help if you can test the latest development Ubuntu kernel version as well as the latest upstream mainline kernel[1].

More detailed instructions to file a bug are available at: https://help.ubuntu.com/community/ReportingBugs#How%20to%20report%20bugs

[1]: https://wiki.ubuntu.com/Kernel/MainlineBuilds

Thanks in advance.

Revision history for this message
Emanem (em4n3m) wrote :

I think I'm suffering from the same issue.
I was archiving my home in 1 tar file from 1 disk to another (all EXT4) and then it got stuck. I had to restart the computer and eventually it proceeded, but I have to say, after I copy large files, the chances I can't read/open other large files are high.

Basically as long as I don't copy/manipulate large files I don't have particular issues; as soon as I try to do such operations I have to restart my pc.

I'm using:
2.6.38-11-generic #50-Ubuntu SMP Mon Sep 12 21:17:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

I have to say I have a pretty vanilla Ubuntu, no customization. I suspect an Ext4 issue because 6 months ago all my disks were Ext2 and I never had an issue, but now it looks like one issue after another.
Did a memcheck and it seems definitely ok.

Unfortunately it's very hard to reproduce systematically.

Cheers
