Comment 56 for bug 317781

Theodore Ts'o (tytso) wrote:

>Is there no way to cache any file open call that uses O_TRUNC such that the truncation never
>happens until the file is committed? I'm sure there are considerations that are not immediately
>obvious to me.

In practice, not with 100% reliability. A program could truncate the file, then wait two hours, and only later write data and close the file. We can't hold a transaction open for hours. But in practice, what we can do is remember that the file descriptor was truncated, either because the file was originally opened with the O_TRUNC flag, or because it was truncated to zero via the ftruncate() system call, and if so, when the file is closed, we can force the blocks to be allocated right away. That way, when the journal commits, the file's data is forced out to disk as well. This causes ext4 to effectively have the same results as ext3 in the case where the application opens with O_TRUNC, writes the new file contents, and closes in quick succession. And in fact that is what the patches I referred to earlier do:

http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=commitdiff;h=3bf3342f394d72ed2ec7e77b5b39e1b50fad8284
http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=commitdiff;h=6645f8c3bc3cdaa7de4aaa3d34d40c2e8e5f09ae
http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=commitdiff;h=dbc85aa9f11d8c13c15527d43a3def8d7beffdc8

These patches cause ext4 to behave much the same way ext3 would in cases (1) and (2) that I described above: that is, where the file is opened with O_TRUNC and rewritten in place, and where the file is written as foo.new and then renamed from foo.new to foo, overwriting the old foo in the process.
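To make those two cases concrete, here is a minimal user-space sketch of the update patterns in question. The filenames and helper names are just placeholders, not anything taken from the patches, and neither function calls fsync(); that is exactly the situation the patches are meant to cover.

/* Sketch of the two update patterns described above.  Filenames and
 * helper names are placeholders; error handling is kept minimal. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Case (1): truncate the existing file and rewrite it in place. */
static int rewrite_in_place(const char *path, const char *data)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, data, strlen(data)) < 0) {
        close(fd);
        return -1;
    }
    /* With the patches, closing a truncated file forces its blocks
     * to be allocated, so the data goes out with the next commit. */
    return close(fd);
}

/* Case (2): write foo.new, then rename it over foo. */
static int replace_via_rename(const char *path, const char *data)
{
    char tmp[4096];
    int fd;

    snprintf(tmp, sizeof(tmp), "%s.new", path);
    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, data, strlen(data)) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }
    /* With the patches, a rename that replaces an existing file also
     * forces the new file's blocks to be allocated first. */
    return rename(tmp, path);
}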

It is not fool-proof, but then again ext3 was never fool-proof, either.

It also won't help you if you are writing a new file (instead of replacing an existing one). But the creation of new files isn't *that* common, and people tend to be most annoyed when an existing file disappears. Again, the 30 to 150 second delay before files are written out isn't really that bad. The chance that a newly written file will disappear, or appear to be "truncated to zero" (in reality, it was simply never allocated), isn't actually that common --- after all, how often do machines crash? It's not like they are constantly crashing *all* the time. (This is why I really object to the characterization of needing to check "thousands of servers in a data center for zero-length files"; if "thousands of servers" are crashing unexpectedly, something is really wrong, and the sysadmins would be screaming bloody murder. Unexpected crashes are ***not*** the common case.)

The reason people were noticing this on desktops is that crappy applications are constantly rewriting various config files, in some cases every minute or, worse yet, every few seconds. This causes a massive write load, which destroys battery life and consumes write cycles on SSDs. These are badly written applications, which hopefully are not the common case. If the application is only writing its state to dot files in the user's home directory every hour or two, what are the odds that you will get unlucky and have the crash happen within 30 seconds of the last update? The fact that a number of people are noticing this problem speaks to the fact that some applications are constantly rewriting their dot files, and that's just bad design. Application writers who are doing this should be ashamed; it's a really bad idea to be constantly rewriting files like that. If you must do it, then try to use a properly written registry database which is sync'ed using fdatasync() --- or better yet, avoid writing the files so frequently; even if it is done properly so it is safe, it will burn massive amounts of battery life on a laptop.
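For what it's worth, if an application really must rewrite a small state file, the properly durable version of the pattern looks roughly like this (a sketch only, with placeholder filenames, showing where the fdatasync() call belongs):

/* Sketch of a durable state-file update: write a temporary file, push
 * the data to disk with fdatasync(), then rename it over the old copy.
 * Filenames are placeholders; do this rarely, since every sync costs
 * power and SSD write cycles. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int save_state(const char *path, const char *data)
{
    char tmp[4096];
    int fd;

    snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, data, strlen(data)) < 0 || fdatasync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }
    return rename(tmp, path);
}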