-D introduces corruption in directories

Bug #525114 reported by Marc D.
This bug affects 3 people
Affects: e2fsprogs (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Nominated for Lucid by Martin Pool

Bug Description

Binary package hint: e2fsprogs

Package: e2fsprogs
Version: 1.41.10-1ubuntu1

Running "e2fsck -D" introduces corruption in one of my filesystems (ext4).

This Friday, after deleting/moving/renaming lots of files on my generic data partition (ext4, ~450GB) and hot-removing the drive (while the filesystem was not mounted), I decided to perform a filesystem check, just to be on the safe side. So I ran "e2fsck -v -f -D" [FSCK 1]. It fixed some minor inconsistencies (I cannot remember the details; I believe it was "incorrect i_size").

Today, for no particular reason, I ran the filesystem check again [FSCK 2]. By then I had used the filesystem and added the directory "/kosh/.cache/kosh/classified/heap/" (see below). The check reported errors in one directory. For reasons of privacy I have replaced characters [A-Za-z-] with the letters X and Y in several places; you may request unaltered logs if you need them. This is only an excerpt:

e2fsck 1.41.10 (10-Feb-2009)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
First entry ' YYY YYYYYYY YYYY YYY' (inode=13369382) in directory inode 13369351 (/kosh/.cache/kosh/classified/Porn/unsorted/pics/XXX XXXXXXXXXX XXXXXXXX XXXX XXX) should be '.'
Fix<y>? no

[note the whitespace (0x20) as the first character of YYY*]

Invalid inode number for '.' in directory inode 13369351.
Fix<y>? no

Directory entry for '.' in /kosh/.cache/kosh/classified/Porn/unsorted/pics/XXX XXXXXXXXXX XXXXXXXX XXXX XXX (13369351) is big.
Split<y>? no

Second entry '.' (inode=13369351) in directory inode 13369351 should be '..'
Fix<y>? no

Entry '..' in /kosh/.cache/kosh/classified/Porn/unsorted/pics/XXX XXXXXXXXXX XXXXXXXX XXXX XXX (13369351) is duplicate '..' entry.
Fix<y>? no

Entry '..' in /kosh/.cache/kosh/classified/Porn/unsorted/pics/XXX XXXXXXXXXX XXXXXXXX XXXX XXX (13369351) is duplicate '..' entry.
Fix<y>? no

Entry '..' in /kosh/.cache/kosh/classified/Porn/unsorted/pics/XXX XXXXXXXXXX XXXXXXXX XXXX XXX (13369351) is a link to directory /kosh/.cache/kosh/classified/Porn/unsorted/pics (7864326).
Clear<y>? no

[This is correct, .. is a link to the parent directory. There was a second pair of directory+subdirectory with the same problem, first character of the subdirectory was also 0x20]

Pass 3: Checking directory connectivity
'..' in /kosh/.cache/kosh/classified/Porn/unsorted/pics/XXX XXXXXXXXXX XXXXXXXX XXXX XXX (13369351) is <The NULL inode> (0), should be /kosh/.cache/kosh/classified/Porn/unsorted/pics (7864326).
Fix<y>? no

Unconnected directory inode 13369382 (.../ YYY YYYYYYY YYYY YYY)
Connect to /lost+found<y>? no

'..' in ... (13369382) is /kosh/.cache/kosh/classified/Porn/unsorted/pics/XXX XXXXXXXXXX XXXXXXXX XXXX XXX (13369351), should be <The NULL inode> (0).
Fix<y>? no

Pass 3A: Optimising directories
^C^CDATA: e2fsck cancelled.

DATA: ***** FILE SYSTEM WAS MODIFIED *****

DATA: ********** WARNING: Filesystem still has errors **********

Oh my, so it modified the filesystem (the directory optimisation was done).

I mounted it and looked at the files, everything seemed to be okay. I unmounted and performed the same check again [FSCK 3], with the intention of just hitting 'y', this time. But I didn’t, because apparently the optimisation had introduced a new error:

e2fsck 1.41.10 (10-Feb-2009)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
First entry '#spam&eggs-2006.log' (inode=2884760) in directory inode 2884755 (/kosh/.cache/kosh/classified/heap/home/marc/nces/nce7/home/marc/.irssi/logs/QuakeNet) should be '.'
Fix<y>? Quit

This is quite interesting, because I put that directory tree there between FSCK 1 and FSCK 2 and didn’t do much between FSCK 2 and FSCK 3, so the error must have been introduced by FSCK 2.

This time, the problematic entry has "#" as first character and is not a directory. In a directory listing, both " " and "#" sort before "." and "..", so maybe that’s the problem.
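The ordering guess above is easy to check. This is a minimal sketch, assuming names are compared byte by byte (the comparison e2fsck actually uses is not shown in this report). The two problem names are taken from the logs above; `notes.txt` is a made-up control entry.

```python
# Byte-wise ordering of directory entry names (assumed comparison, not e2fsck code).
names = [".", "..", " YYY YYYYYYY YYYY YYY", "#spam&eggs-2006.log", "notes.txt"]


def sorts_before_dot(name: str) -> bool:
    """True if `name` orders before "." when compared byte by byte."""
    return name.encode("utf-8") < b"."


# Sort the way a plain byte-wise comparison would.
ordered = sorted(names, key=lambda n: n.encode("utf-8"))
for n in ordered:
    print(f"{n!r:28} first byte 0x{n.encode('utf-8')[0]:02x}")

# Names that would end up ahead of "." in such an ordering.
at_risk = [n for n in names if sorts_before_dot(n)]
```

Space is 0x20 and "#" is 0x23, both below "." at 0x2e, so both problem names land ahead of "." and "..".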

Will “fixing” a first entry that is not "." remove that entry or put it somewhere else?

kosh@isis:~$ sudo lsb_release -rd
Description: Ubuntu lucid (development branch)
Release: 10.04
kosh@isis:~$ uname -a
Linux isis 2.6.33-020633rc8-generic #020633rc8 SMP Sat Feb 13 10:09:50 UTC 2010 x86_64 GNU/Linux

Tags: glucid
Marc D. (koshy) wrote :

“Fixing” it causes the directory entry to be lost (it turns up in lost+found, but its name is gone).

Theodore Ts'o (tytso) wrote :

I can replicate this; it happens if you have a non-indexed directory containing one or more file names that sort lexicographically before ".". A file name starting with a space will easily meet this criterion.

I have a patch in the e2fsprogs maint branch already, and I will likely put out a new minor release to fix this.
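To illustrate the failure mode described in the comment above, here is a simplified model. It is not the e2fsck code and not the actual patch. It assumes a directory is a flat list of (name, inode) pairs and that the optimisation step sorts them by name bytes. The function names are hypothetical; the first three inode numbers come from the report, while `readme` and its inode number are made up.

```python
from typing import List, Tuple

Entry = Tuple[str, int]  # (name, inode number); a simplified, hypothetical model


def naive_sort(entries: List[Entry]) -> List[Entry]:
    """Sort every entry by name bytes, including "." and ".."."""
    return sorted(entries, key=lambda e: e[0].encode("utf-8"))


def pinned_sort(entries: List[Entry]) -> List[Entry]:
    """Keep "." and ".." in the first two slots and sort only the rest."""
    special = {".": 0, "..": 1}
    head = sorted((e for e in entries if e[0] in special),
                  key=lambda e: special[e[0]])
    tail = sorted((e for e in entries if e[0] not in special),
                  key=lambda e: e[0].encode("utf-8"))
    return head + tail


def first_two_ok(entries: List[Entry]) -> bool:
    """The invariant pass 2 complains about in the logs: slots 0 and 1 are "." and ".."."""
    return [name for name, _ in entries[:2]] == [".", ".."]


directory = [(".", 13369351), ("..", 7864326),
             (" YYY YYYYYYY YYYY YYY", 13369382),
             ("readme", 13369400)]  # "readme" and its inode are made up

print(first_two_ok(naive_sort(directory)))   # the space-prefixed name takes slot 0
print(first_two_ok(pinned_sort(directory)))  # "." and ".." stay in place
```

In this model the naive sort reproduces the shape of the pass 2 complaints ("First entry ... should be '.'", "Second entry '.' ... should be '..'"). Pinning the two special entries before sorting the rest avoids them.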

Changed in e2fsprogs (Ubuntu):
status: New → Confirmed
Theodore Ts'o (tytso) wrote :
Martin Pool (mbp) wrote :

I think this should be considered critical for Lucid, because
1- when you hit this, it can leave the filesystem unusable
2- it can presumably happen even if you don't use -D, because the fsck manpage says this only forces an operation that may sometimes happen anyhow

Theodore Ts'o (tytso) wrote : Re: [Bug 525114] Re: -D introduces corruption in directories

On Sun, Mar 14, 2010 at 10:05:01PM -0000, Martin Pool wrote:
> I think this should be considered critical for Lucid, because
> 1- when you hit this, it can leave the filesystem unusable
> 2- it can presumably happen even if you don't use -D, because the fsck manpage says this only forces an operation that may sometimes happen anyhow

I'm working on the airplane flight from Boston to San Francisco to get
e2fsprogs 1.41.11 out. It's going to be a bug-fix-only release, and
it'd be nice to get a freeze exception for it....

                              - Ted

Theodore Ts'o (tytso) wrote :
Francesco Potortì (pot) wrote :

Thank you Theodore!

As far as who is affected, I see the same symptoms on ext3 (running an amd64) without ever having used -D.

Scott James Remnant (Canonical) (canonical-scott) wrote :

On Sun, 2010-03-14 at 22:44 +0000, Theodore Ts'o wrote:

> On Sun, Mar 14, 2010 at 10:05:01PM -0000, Martin Pool wrote:
> > I think this should be considered critical for Lucid, because
> > 1- when you hit this, it can leave the filesystem unusable
> > 2- it can presumably happen even if you don't use -D, because the fsck manpage says this only forces an operation that may sometimes happen anyhow
>
> I'm working on the airplane flight from Boston to San Francisco to get
> e2fsprogs 1.41.11 out. It's going to be a bug-fix-only release, and
> it'd be nice to get a freeze exception for it....
>
I'm sure we can do that; let me know when it's out and I'll chat to the
RMs

Scott
--
Scott James Remnant
<email address hidden>

Theodore Ts'o (tytso) wrote :

On Mon, Mar 15, 2010 at 12:58:31PM -0000, Scott James Remnant wrote:
> On Sun, 2010-03-14 at 22:44 +0000, Theodore Ts'o wrote:
>
> > On Sun, Mar 14, 2010 at 10:05:01PM -0000, Martin Pool wrote:
> > > I think this should be considered critical for Lucid, because
> > > 1- when you hit this, it can leave the filesystem unusable
> > > 2- it can presumably happen even if you don't use -D, because the fsck manpage says this only forces an operation that may sometimes happen anyhow
> >
> > I'm working on the airplane flight from Boston to San Francisco to get
> > e2fsprogs 1.41.11 out. It's going to be a bug-fix-only release, and
> > it'd be nice to get a freeze exception for it....
> >
> I'm sure we can do that; let me know when it's out and I'll chat to the
> RMs

It's out now, and a bunch of the bug fixes address Launchpad-reported
issues:

E2fsck will no longer give a fatal error and abort if the physical
device has been resized beyond 2**32 blocks. (Addresses Launchpad
Bug: #521648)

Debugfs has a bug fixed so that "logdump -b <blk>" now properly shows
the allocation status of the block <blk>. (Addresses Debian Bug:
#564084)

E2fsck now prints a much more emphatic and hopefully scary message
when a file system is detected as mounted while doing a read/write
check of the filesystem. Hopefully this will dissuade users from
thinking, "surely that message doesn't apply to *me*" :-(
(Addresses Launchpad Bug: #537483)

E2fsck -n will now always open the file system read-only. We now
disallow certain combinations of options which previously were manual
exceptions; this is bad because it causes users to think they are
smarter than they really are. So "-n -c", "-n -l", "-n -L", and "-n
-D" are no longer supported. (Addresses Launchpad Bug: #537483)

In e2fsprogs 1.41.10, mke2fs would ask for confirmation to proceed if
it detected a badly aligned partition. Unfortunately, this broke some
distribution installation scripts, so it now just prints the warning
message and proceeds. (Addresses Red Hat Bug: #569021. Addresses
Launchpad Bug: #530071)

Mke2fs would take a long time to create very large journal files for
ext4. This was caused by a bug in ext2fs_block_iterate2(), which is
now fixed.

E2fsck now understands the EOFBLOCKS_FL flag which will be used in
2.6.34 kernels to make e2fsck not complain about blocks deliberately
fallocated() beyond an inode's i_size.

E2fsprogs 1.41.10 introduced a regression (in commit b71e018) where
e2fsck -fD can corrupt non-indexed directories when there exist one or
more file names which alphabetically sort before ".". This can happen
with ext2 filesystems or for small directories (taking less than a block)
which contain filenames that begin with a space or some other
punctuation mark. (Addresses Debian Bug: #573923, Addresses Launchpad
Bug: #525114)

      - Ted
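For anyone wanting to gauge exposure before running "e2fsck -fD" with 1.41.10, the condition in the last release note can be approximated from user space. This is a rough sketch: it assumes byte-wise name ordering, and it cannot tell whether a directory is indexed or fits in one block, so it lists every directory holding a name that sorts before ".". The function names are hypothetical.

```python
import os
from typing import Iterator, List, Tuple


def names_before_dot(names: List[bytes]) -> List[bytes]:
    """Names that compare lower than b"." byte by byte (assumed ordering)."""
    return sorted(n for n in names if n < b".")


def scan(root: str) -> Iterator[Tuple[bytes, List[bytes]]]:
    """Yield (directory path, offending names) for each directory under root."""
    # Walking with a bytes path keeps names as raw bytes, as stored on disk.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        hits = names_before_dot(dirnames + filenames)
        if hits:
            yield dirpath, hits


# Example use on a mounted filesystem (the path is a placeholder):
#   for path, hits in scan("/mnt/data"):
#       print(os.fsdecode(path), [os.fsdecode(h) for h in hits])
```

Names starting with a space, "#", "-" or other bytes below 0x2e are reported. Whether a listed directory is actually at risk still depends on it being non-indexed.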

Scott James Remnant (Canonical) (canonical-scott) wrote :

On Mon, 2010-03-15 at 11:06 -0400, <email address hidden> wrote:

> It's out now, and a bunch of the bug fixes address Launchpad-reported
> issues:
>
Thanks Ted!

We're in final freeze for Beta 1 right now - will get this update in on
Friday once that's out!

Scott
--
Scott James Remnant
<email address hidden>

Theodore Ts'o (tytso) wrote :

On Mon, Mar 15, 2010 at 03:52:40PM +0000, Scott James Remnant wrote:
> On Mon, 2010-03-15 at 11:06 -0400, <email address hidden> wrote:
>
> > It's out now, and a bunch of the bug fixes address Launchpad-reported
> > issues:
> >
> Thanks Ted!
>
> We're in final freeze for Beta 1 right now - will get this update in on
> Friday once that's out!

OK, great.

BTW, you sent me an IM asking about O_PONIES issues? Do you still
have some questions about that?

     - Ted

Theodore Ts'o (tytso) wrote :

On Mon, Mar 15, 2010 at 09:26:44AM -0000, Francesco Potortì wrote:
> Thank you Theodore!
>
> As far as who is affected, I see the same symptoms on ext3 (running an
> amd64) without ever having used -D.

That's a different problem then. Were you able to fix it using
e2fsck, possibly from a rescue CD? Is it a repeatable problem?

If you're sure it's a software bug (as opposed to something caused by
bad/flakey hardware), I'd suggest opening another bug....

                       - Ted

Joel Ebel (jbebel)
tags: added: glucid
Theodore Ts'o (tytso) wrote :

This was fixed in e2fsprogs 1.41.11, which is in Lucid, and the bug doesn't exist in any other Ubuntu release, so I think we can close out this bug....

Changed in e2fsprogs (Ubuntu):
status: Confirmed → Fix Released
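Since the thread pins the regression to upstream 1.41.10 and the fix to 1.41.11, a quick check of the banner line e2fsck prints (as seen in the logs above) can flag the affected release. This is a sketch only: the helper names are hypothetical, and a distribution package could carry a backported fix under the same version number, which this check cannot detect.

```python
import re
from typing import Optional, Tuple

# Per the thread: regression introduced in 1.41.10, fixed in 1.41.11.
AFFECTED = (1, 41, 10)


def parse_version(banner: str) -> Optional[Tuple[int, ...]]:
    """Extract the numeric version from a banner like 'e2fsck 1.41.10 (10-Feb-2009)'."""
    m = re.search(r"e2fsck\s+(\d+(?:\.\d+)+)", banner)
    return tuple(int(p) for p in m.group(1).split(".")) if m else None


def is_affected(banner: str) -> bool:
    """True only for the upstream release named in the thread as affected."""
    return parse_version(banner) == AFFECTED


print(is_affected("e2fsck 1.41.10 (10-Feb-2009)"))  # banner copied from the logs above
print(is_affected("e2fsck 1.41.11"))                # fixed release, date omitted
```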