Comment 2 for bug 537483

Theodore Ts'o (tytso) wrote : Re: [Bug 537483] [NEW] fsck.ext4 -n wrote to & destroyed filesystem

On Thu, Mar 11, 2010 at 04:55:37PM -0000, Bela Lubkin wrote:
> The documentation for -D comments that it "will detect directory entries
> with duplicate names in a single directory, which e2fsck normally does
> not enforce". It was for this enhanced detection that I added this
> flag. I realize that it is a flag which directs fsck to write, but I
> believe that it -- as with all(*) other writing flags -- would be
> rendered inoperable by "-n". That is, I believed that the combination
> "-n -D" would cause additional checks (for directories needing
> optimization & for duplicate directory entries) without causing any
> writes. (*)I realize this isn't fully true, that the three bad-block-
> related flags -[clL] are effective even under -n. This is clearly
> documented; the clarity of _that_ documentation lends support to the
> supposition that no _other_ flags will override -n.

Yeah, sorry. The -D option was added later, and I forgot to update
the man page for the -n option. The -D option does indeed allow the
file system to be opened read/write, and will rewrite the directories,
which is dangerous when the file system is mounted.

Your assumption that -n would make the "-D will modify the
filesystem" part go away was a bad one.
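
To make that concrete, here is a minimal sketch of how 1.41.10 decides
whether -n implies a read-only open. The condition mirrors the check in
e2fsck/unix.c, but the function name, its parameters, and the bit
values are made up for illustration; the real definitions live in the
e2fsck headers and may differ.

```c
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define E2F_OPT_READONLY      0x0001
#define E2F_OPT_NO            0x0002
#define E2F_OPT_COMPRESS_DIRS 0x0004

/*
 * Model of the 1.41.10 behaviour: -n (E2F_OPT_NO) only turns on
 * read-only mode when none of -c, -l/-L, or -D was also given.
 * apply_n_option_old() is a hypothetical helper, not an e2fsck function.
 */
int apply_n_option_old(int options, bool cflag, bool have_bad_blocks_file)
{
    if ((options & E2F_OPT_NO) && !have_bad_blocks_file &&
        !cflag && !(options & E2F_OPT_COMPRESS_DIRS))
        options |= E2F_OPT_READONLY;
    return options;
}
```

With only -n set, the result has the read-only bit; with -n and -D set
together, it does not, so the file system is opened read/write.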

> In any case, I do not know if it was -D, the combination of -D -E
> fragcheck, or some other random issue which caused the problem. For all
> I know, `fsck -n` is fundamentally broken on ext4. I do not wish to
> conduct further experiments after this unwitting one, which will leave
> me reconstructing a system.

There is a bug with -D and small directories in e2fsprogs 1.41.10,
which I've since fixed, which may have affected you, but
fundamentally, it is dangerous to run e2fsck -D while the filesystem
is mounted.

> As the transcript shows, fsck responded with:
>
> /dev/sda5 is mounted.
>
> WARNING!!! Running e2fsck on a mounted filesystem may cause
> SEVERE filesystem damage.
>
> Do you really want to continue (y/n)?
>
> Perhaps foolishly, I assumed that this message is issued in all cases --
> whether or not fsck will actually be writing.

Nope, e2fsck is smart. It only issues this warning when the
filesystem is opened read/write. If you try to run "e2fsck -n
/dev/XXX" on a mounted filesystem, it won't ask that question.
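
Roughly speaking, the decision looks like the sketch below. This is a
simplified model, not the actual check_mount() code: the helper name
needs_mounted_warning() and the bit value are assumptions, and the real
function has more cases than this.

```c
#include <stdbool.h>

#define E2F_OPT_READONLY 0x0001  /* hypothetical value, for illustration */

/*
 * Returns true if the "filesystem is mounted" question should be asked.
 * A read-only check never writes, so it skips the question entirely.
 */
bool needs_mounted_warning(int options, bool is_mounted)
{
    if (options & E2F_OPT_READONLY)
        return false;
    return is_mounted;
}
```

So seeing the question at all means e2fsck is about to open the
filesystem read/write.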

So yeah, you made two bad assumptions, and that's what led to your
file system getting badly screwed up.

I'll change things so the message is made more explicit. In case
you're curious, the reason it was originally worded the way it was
is that if /etc/mtab hasn't been cleared by the init scripts when a
user boots into single user mode, it is possible for e2fsck to think
the filesystem is mounted when it really isn't.
But I'd much rather someone get scared off from running e2fsck if
their /etc/mtab hasn't been cleared after a system crash, if it avoids
the user who thinks, "surely this message doesn't apply to *me*".
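
If you ever want to check whether /etc/mtab is stale, one way is to
compare it against the kernel's own table in /proc/mounts. The sketch
below is an illustration only and is not what e2fsck does internally;
device_listed() and mtab_entry_is_stale() are hypothetical helpers. It
assumes the device is spelled the same way in both files, which is not
always true for the root filesystem, and that no line is longer than
the buffer.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns true if `device` is the first whitespace-separated field of
 * some line in the mount table at `table_path`. Returns false if it is
 * not listed or the file cannot be opened.
 */
bool device_listed(const char *table_path, const char *device)
{
    char line[1024], dev[256];
    bool found = false;
    FILE *f = fopen(table_path, "r");

    if (!f)
        return false;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "%255s", dev) == 1 && strcmp(dev, device) == 0) {
            found = true;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Stale entry: mtab claims the device is mounted, the kernel table does not. */
bool mtab_entry_is_stale(const char *mtab_path, const char *kernel_table_path,
                         const char *device)
{
    return device_listed(mtab_path, device) &&
           !device_listed(kernel_table_path, device);
}
```

A call such as mtab_entry_is_stale("/etc/mtab", "/proc/mounts",
"/dev/sda5") would then tell you whether the mtab entry is left over
from before the crash.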

> After that I ran `fdisk -l`, which failed with an I/O error (I assume
> due to the binary or shared objects not being accessible); and then
> `df`, which succeeded but showed the root filesystem (/dev/sda5) in bad
> shape.
>
> At that point I was sure the system was destroyed. Just in case, I
> switched power off without doing any software shutdown actions; but this
> did not help. Upon reboot I see:
>
> error: unknown filesystem.
> grub rescue> _

Now *that's* surprising. That makes it sound like the superblock was
destroyed, but that shouldn't have happened even when e2fsck is run
read/write on a mounted filesystem. I can't really explain that.

> POSSIBLE CAUSE: system was in-place upgraded from Ubuntu 9.10 Karmic
> Koala. Root filesystem was ext3, not ext4, before the upgrade. I don't
> believe I did anything to explicitly upgrade it to ext4. I probably
> should not have invoked fsck as `fsck.ext4` but rather just `e2fsck` or
> `fsck`, allowing the system to draw its own conclusion about filesystem
> type.

Nope, that's not it. Whether you invoke e2fsck as e2fsck, fsck.ext4,
or fsck.ext3, doesn't change anything at all about its behaviour. The
fatal mistake was -n -D, and then answering "yes" to the WARNING!!!
question.

> WARNING!!! Running e2fsck on a mounted filesystem may cause
> SEVERE filesystem damage.
>
> Do you really want to continue (y/n)? yes
>
> /dev/sda5: recovering journal

Ah.... recovering the journal while the file system is mounted might
very well have somehow wiped out the superblock.

Anyway, I've applied the following two patches to e2fsck, which will
be in e2fsprogs 1.41.11. Thanks for the feedback, and I'm sorry you
managed to corrupt your filesystem.

I think if you run e2fsck from a rescue CD, you should hopefully be
able to recover most of your data. Some of the files might end up in
/lost+found, but hopefully you'll be able to recover your home
directory files, even if you need to reinstall the system afterwards.

Best regards,

     - Ted

From: Theodore Ts'o <email address hidden>
Date: Fri, 12 Mar 2010 19:18:20 -0500
Subject: [PATCH 1/2] e2fsck: Make the -n always open the file system read-only

A user was surprised when -n -D caused the file system to be opened
read/write, and then outsmarted himself when e2fsck asked the question:

   WARNING!!! Running e2fsck on a mounted filesystem may cause
   SEVERE filesystem damage.

   Do you really want to continue (y/n)?

This is partially our fault for not documenting the fact that -D
overrode -n and opened the filesystem read-write. But the bottom line
is it is much safer if -n *always* opens the file system read-only, so
there can be no confusion. This means that we have to disable certain
combinations of options, such as "-n -c", "-n -l", "-n -L", and
"-n -D", but the utility of these combinations is pretty low, and
is more than offset by making e2fsck idiot-proof.

Addresses-Launchpad-Bug: #537483

Signed-off-by: "Theodore Ts'o" <email address hidden>
---
 e2fsck/e2fsck.8.in | 11 +----------
 e2fsck/unix.c | 19 +++++++++++++++++--
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
index a970173..3fb15e6 100644
--- a/e2fsck/e2fsck.8.in
+++ b/e2fsck/e2fsck.8.in
@@ -242,16 +242,7 @@ in the file are added to the bad blocks list.)
 Open the filesystem read-only, and assume an answer of `no' to all
 questions. Allows
 .B e2fsck
-to be used non-interactively. (Note: if the
-.BR \-c ,
-.BR \-l ,
-or
-.B \-L
-options are specified in addition to the
-.B \-n
-option, then the filesystem will be opened read-write, to permit the
-bad-blocks list to be updated. However, no other changes will be made
-to the filesystem.) This option
+to be used non-interactively. This option
 may not be specified at the same time as the
 .B \-p
 or
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index 6248958..49e9008 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -789,8 +789,23 @@ static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 		return 0;
 	if (optind != argc - 1)
 		usage(ctx);
-	if ((ctx->options & E2F_OPT_NO) && !bad_blocks_file &&
-	    !cflag && !(ctx->options & E2F_OPT_COMPRESS_DIRS))
+	if ((ctx->options & E2F_OPT_NO) &&
+	    (ctx->options & E2F_OPT_COMPRESS_DIRS)) {
+		com_err(ctx->program_name, 0,
+			_("The -n and -D options are incompatible."));
+		fatal_error(ctx, 0);
+	}
+	if ((ctx->options & E2F_OPT_NO) && cflag) {
+		com_err(ctx->program_name, 0,
+			_("The -n and -c options are incompatible."));
+		fatal_error(ctx, 0);
+	}
+	if ((ctx->options & E2F_OPT_NO) && bad_blocks_file) {
+		com_err(ctx->program_name, 0,
+			_("The -n and -l/-L options are incompatible."));
+		fatal_error(ctx, 0);
+	}
+	if (ctx->options & E2F_OPT_NO)
 		ctx->options |= E2F_OPT_READONLY;

 	ctx->io_options = strchr(argv[optind], '?');
------------------------------

From: Theodore Ts'o <email address hidden>
Date: Fri, 12 Mar 2010 19:25:33 -0500
Subject: [PATCH 2/2] e2fsck: Make the "filesystem is mounted" message more scary

I guess the message wasn't scary enough for users who are just smart
enough to really get themselves in deep doo-doo. Let's make it even
scarier.

Addresses-Launchpad-Bug: #537483

Signed-off-by: "Theodore Ts'o" <email address hidden>
---
 e2fsck/unix.c | 4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index 49e9008..124f7e6 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -230,8 +230,8 @@ static void check_mount(e2fsck_t ctx)
 	if (!ctx->interactive)
 		fatal_error(ctx, _("Cannot continue, aborting.\n\n"));
 	printf(_("\n\n\007\007\007\007WARNING!!! "
-	       "Running e2fsck on a mounted filesystem may cause\n"
-	       "SEVERE filesystem damage.\007\007\007\n\n"));
+	       "The filesystem is mounted. If you continue you ***WILL***\n"
+	       "cause ***SEVERE*** filesystem damage.\007\007\007\n\n"));
 	cont = ask_yn(_("Do you really want to continue"), -1);
 	if (!cont) {
 		printf (_("check aborted.\n"));
--
1.6.6.1.1.g974db.dirty