ext4: panic working with large files

Bug #348836 reported by suecom
This bug affects 3 people
Affects          Status        Importance  Assigned to  Milestone
linux (Ubuntu)   Fix Released  High        Tim Gardner
Jaunty           Fix Released  High        Tim Gardner

Bug Description

When working on large files (> ~10GB), the file system can become fatally corrupted. The system crashes (freezes) and is then unable to reboot (Grub reports 'Error 2'). Booting from a live/recovery disk and running fsck on the corrupted filesystem yields multiple errors.

I have trashed two systems running Jaunty (Alpha 3 and Alpha 6) on an Ext4 root file system. Both times I was manipulating/using large files. The first occurred when I simply removed a 48GB file (the system froze), and the second when VMWare was writing to a virtual disk (a large file). Both systems had all updates installed (2.6.27-11 kernel).

I've attached a screenshot of part of the ensuing fsck. This is after all(?) the master (global?) blocks have been declared invalid. If you can't see it in the picture, at this stage fsck is reporting multiply-claimed blocks (claimed by the large files in use at the time and by random smaller files).

The system was a new dual processor (Core Duo X9100) Thinkpad W500 running on a 2.5" SATA drive, 4GB core, Intel GPU.

suecom (allister-nowatt) wrote :
affects: ubuntu → linux (Ubuntu)
Eric Shattow (eshattow) wrote :

This could be related to https://bugzilla.redhat.com/show_bug.cgi?id=490026 "EXT4 panic, list corruption in ext4_mb_new_inode_pa".

I'm occasionally experiencing a fatal panic when working with large amounts of data. The system hard-locks, and since I'm usually working in X11, I don't have access to the panic message to confirm. It does sound similar to the reported issue.

Please cherrypick http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d33a1976fbee1ee321d6f014333d8f03a39d526c to Ubuntu 2.6.28

summary: - Ext4 file system fatel corruption
+ ext4: panic working with large files
Changed in linux (Ubuntu):
importance: Undecided → High
status: New → Triaged
Leann Ogasawara (leannogasawara) wrote :

Hi Guys,

Just wanted to also add a note that the kernel is expected to be frozen tomorrow for Jaunty's release. I've pinged the kernel team to see if they can get this pulled in time. If not, I suspect it should qualify for a Stable Release Update for Jaunty. Thanks.

Leann Ogasawara (leannogasawara) wrote :

@suecom, I also notice that in your description you say you ran Jaunty Alpha 3 and Alpha 6 with all updates installed, yet you mention a 2.6.27-11 kernel. I assume that was a typo? Jaunty has a 2.6.28-based kernel.

Leann Ogasawara (leannogasawara) wrote :

Hi Guys,

One of our kernel devs threw together a test kernel with this patch applied and uploaded it to his PPA:

https://edge.launchpad.net/~timg-tpi/+archive/ppa

The package is "linux - 2.6.28-11.42~lp348836". It is still building, but once it has finished, it would be great if you could test it and report back your results. For information on how to test from a PPA, refer to https://wiki.ubuntu.com/Testing/KernelPPA, specifically the Testing Developer PPA section. Thanks.

Eric Shattow (eshattow) wrote :

I will build and test, but I have no use case that reliably reproduces this. I've hit (what I think might be) this bug maybe 5 times in 2-3 months of heavy ext4 filesystem usage. There is usually file corruption afterward. My own use case is BitTorrent, so files are checksummed and lost data is thrown out. I don't know whether there is a user behavior that would reproduce the bug described by the original poster more quickly.

Tim Gardner (timg-tpi) wrote :

Bah - the PPA is having problems so I built locally and stashed test kernels at http://kernel.ubuntu.com/~rtg/2.6.28-lp348836

Changed in linux (Ubuntu):
assignee: nobody → Tim Gardner (timg-tpi)
status: Triaged → In Progress
Eric Shattow (eshattow) wrote :

No noticeable ext4-related problems with 2.6.28-11-generic #42~lp348836 SMP. I do not know if the OP's bug is fixed, only that ~lp348836 is working okay.

Tim Gardner (timg-tpi) wrote :

@Eric - thanks for your response. I'll add this as an SRU request for the first upload after release.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.28-11.42

---------------
linux (2.6.28-11.42) jaunty; urgency=low

  [ Tim Gardner ]

  * Enabled LPIA CONFIG_PACKET=y
    - LP: #362071

  [ Upstream Kernel Changes ]

  * ext4: fix bb_prealloc_list corruption due to wrong group locking
    - LP: #348836

 -- Stefan Bader <email address hidden> Thu, 16 Apr 2009 08:10:55 +0200

Changed in linux (Ubuntu Jaunty):
status: In Progress → Fix Released
ArbitraryConstant (anthony-spamtrap) wrote :

I am running kernel 2.6.28-11.42 generic amd64. I'm still able to crash my system with large files on ext4.

I used the following script to reproduce this:

while true; do dd if=/dev/zero of=zero bs=1M count=102400; dd if=zero of=/dev/null bs=1M; rm zero; done

The underlying device is an LVM on a VG that spans two disks.
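
For anyone who wants to retry this without an endless loop, the one-liner above could be wrapped roughly as follows. This is only a sketch: the function name stress_ext4 and its three arguments (directory, file size in MiB, number of rounds) are made up for illustration and do not come from the original report. It assumes GNU dd, which accepts bs=1M.

```shell
# stress_ext4 DIR SIZE_MB ROUNDS
# Hypothetical helper (not from the report): repeats the same
# write / read back / delete cycle as the one-liner, but a fixed
# number of times and with a configurable file size.
stress_ext4() {
    se_file="$1/zero"
    se_size_mb="$2"
    se_rounds="$3"
    se_i=1
    while [ "$se_i" -le "$se_rounds" ]; do
        # write SIZE_MB MiB of zeros to the test file
        dd if=/dev/zero of="$se_file" bs=1M count="$se_size_mb" 2>/dev/null || return 1
        # read the whole file back
        dd if="$se_file" of=/dev/null bs=1M 2>/dev/null || return 1
        # delete it, as in the original loop
        rm -f "$se_file" || return 1
        se_i=$((se_i + 1))
    done
    echo "completed $se_rounds rounds"
}
```

Calling stress_ext4 /mnt/test 102400 5 would match the 100 GiB file size of the original script for five rounds (the mount point /mnt/test is a placeholder). Only run this on a scratch filesystem, since the bug being discussed can corrupt data.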

Changed in linux (Ubuntu Jaunty):
status: Fix Released → New
ArbitraryConstant (anthony-spamtrap) wrote :

I noticed some other stuff:

[ 371.568931] EXT4-fs: barriers enabled
[ 371.569257] kjournald2 starting. Commit interval 5 seconds
[ 371.569824] EXT4 FS on dm-1, internal journal on dm-1:8
[ 371.569828] EXT4-fs: delayed allocation enabled
[ 371.569831] EXT4-fs: file extents enabled
[ 371.571151] EXT4-fs: mballoc enabled
[ 371.571157] EXT4-fs: mounted filesystem with ordered data mode.
[ 379.816940] JBD: barrier-based sync failed on dm-1:8 - disabling barriers

Barriers seem to be disabled.
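
A quick way to check for the same condition on another machine might be to filter the kernel log for the JBD message shown above. The helper name barriers_disabled is hypothetical; the pattern is copied from the log line in this comment, so it will only match kernels that print exactly that wording.

```shell
# barriers_disabled: reads kernel log text on stdin and prints the
# journal device from every "barrier-based sync failed" line.
# The function name is made up for this sketch; the message text
# is taken from the dmesg excerpt above.
barriers_disabled() {
    sed -n 's/.*JBD: barrier-based sync failed on \([^ ]*\) - disabling barriers.*/\1/p'
}
```

Typical use would be: dmesg | barriers_disabled. No output means no such message was found in the log that was fed in.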

$ sudo lvdisplay --maps bulk/testvol
  --- Logical volume ---
  LV Name /dev/bulk/testvol
  VG Name bulk
  LV UUID q3e1GQ-zqDS-c30Z-jneb-IbEu-GijR-zOwdL5
  LV Write Access read/write
  LV Status available
  # open 0
  LV Size 125.00 GB
  Current LE 32000
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 252:1

  --- Segments ---
  Logical extent 0 to 31999:
    Type linear
    Physical volume /dev/sda1
    Physical extents 102333 to 134332

The volume isn't spread across both disks.
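
To confirm that without reading the segment list by eye, the lvdisplay output can be reduced to the distinct physical volumes backing the LV. This is a sketch that assumes the "Physical volume" line format shown above; lv_physical_volumes is a made-up name.

```shell
# lv_physical_volumes: reads `lvdisplay --maps` output on stdin and
# prints each distinct physical volume once.
# Hypothetical helper; relies on lines of the form
# "Physical volume /dev/...", as in the output above.
lv_physical_volumes() {
    awk '$1 == "Physical" && $2 == "volume" { print $3 }' | sort -u
}
```

For example: sudo lvdisplay --maps bulk/testvol | lv_physical_volumes. A single line of output means the logical volume sits on one physical volume.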

The same script running on a comparable ext3 filesystem, on the same disk, on the same machine, has had no problems.

Leann Ogasawara (leannogasawara) wrote :

@ArbitraryConstant, it would be better if you opened a new bug for the issue you are seeing - https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies . The patch that was applied and uploaded here apparently didn't fix the issue you are seeing, so it will likely require a different patch and thus warrants a new bug report. Thanks in advance.

Changed in linux (Ubuntu Jaunty):
status: New → Fix Released