Comment 203 for bug 317781

Chow Loong Jin (hyperair) wrote : Re: [Bug 317781] Re: Ext4 data loss

On Fri, 2009-03-27 at 22:55 +0000, Daniel Colascione wrote:
> The risk isn't data loss; if you forgo fsync, you accept the risk of
> some data loss. The issue that started this whole debate is consistency.
>
> The risk here is of the system ending up in an invalid state with zero-
> length files *THAT NEVER APPEARED ON THE RUNNING SYSTEM* suddenly
> cropping up. A zero-length file in a spot that is supposed to be
> occupied by a valid configuration file can cause problems --- an absent
> file might indicate default values, but an empty file might mean
> something completely different, like a syntax error or (famously)
> "prevent all users from logging into this system."
A syntax error usually prevents the whole program from running, I should
think. And I'm not sure about the whole "prevent all users from logging
into this system" bit. I've never even heard of it, so I don't see how
you can call that famous.

> What applications *really* do is create a temporary file, write data to
> it, and rename that temporary file to its final name regardless of
> whether the original exists. If the filesystem doesn't guarantee
> consistency for a rename to a non-existing file, the application's
> expectations will be violated in unusual cases causing hard-to-discover
> bugs.
It is guaranteed, but only when you *rename onto an existing file*. If
you delete the original *before* renaming, then as I see it, you have
agreed to forgo your atomicity.
>
> Why should an application that atomically updates a file have to check
> whether the original exists to get data consistency?
Um, no, I don't think it needs to. See this:
Case 1: File already exists.
1. Application writes to file.tmp
2. Application closes file.tmp
3. Application renames file.tmp to file.
** If a crash happens, you either get the original, or the new.

Case 2: File doesn't already exist.
1-3 as above.
** If a crash happens, you either get the new file, or a zero-length
file.
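
For concreteness, steps 1-3 could be sketched roughly like this in
Python. This is only an illustration of the pattern under discussion:
the helper name replace_via_rename and the ".tmp" suffix are made up
for the example, and the optional fsync is there only to show where it
would go if you wanted it.

```python
import os


def replace_via_rename(path, data, sync=False):
    """Hypothetical helper: update `path` via a temporary file and a rename.

    `data` is bytes. With sync=False this is the fsync-less pattern the
    thread is arguing about; with sync=True the data is flushed to disk
    before the rename.
    """
    tmp = path + ".tmp"  # assumed naming convention for the temporary file

    # Step 1: application writes to file.tmp
    with open(tmp, "wb") as f:
        f.write(data)
        if sync:
            f.flush()
            os.fsync(f.fileno())
    # Step 2: file.tmp is closed when the with-block exits

    # Step 3: rename file.tmp to file (os.replace overwrites an existing
    # target; on POSIX it is a rename() call)
    os.replace(tmp, path)
```

The same code runs whether or not "file" already exists, which is the
point: the application does not check for the original in either case.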

Considering that in case 2 there wasn't a file to begin with, I don't
think getting a zero-length file is much of an issue. Unless your
program crashes when it gets a zero-length configuration file, in which
case I think your program sucks and you suck for writing it with that
assumption.
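
To show what I mean by not crashing on a zero-length configuration
file, here is a rough sketch. Everything in it is an assumption made
for illustration: the load_config name, the use of JSON as the config
format, and the choice to treat an empty file the same as a missing
one.

```python
import json
import os


def load_config(path, defaults):
    """Hypothetical loader: fall back to defaults if the file is absent or empty.

    Assumes the config file, when non-empty, holds a single JSON object.
    """
    try:
        if os.path.getsize(path) == 0:
            # Zero-length file (e.g. left behind by a crash): use defaults
            return dict(defaults)
        with open(path, "r", encoding="utf-8") as f:
            loaded = json.load(f)
    except FileNotFoundError:
        # No file at all: use defaults
        return dict(defaults)

    # Values from the file override the defaults
    merged = dict(defaults)
    merged.update(loaded)
    return merged
```

Whether "empty means defaults" is the right policy depends on the
program, of course; the sketch only shows that handling the case is
cheap.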

>
> Allocate blocks before *every* rename. It's a small change from the
> existing patch. The performance downsides are minimal, and making this
> change gives applications the consistency guarantees they expect.
I wholeheartedly agree with "Allocate blocks before renames over
existing files", but "Allocate blocks before *every* rename" is
overdoing it a little.
>
> Again: if you accept that you can give applications a consistency
> guarantee when using rename to update the contents of a file, it doesn't
> make sense to penalize them the first time that file is updated (i.e.,
> when it's created.) Unless, of course, you just want to punish users and
> application developers for not gratuitously calling fsync.
Again, I don't see exactly how an application is being penalized the
first time the file is updated.

--
Chow Loong Jin