Comment 74 for bug 317781

Volodymyr M. Lisivka (vlisivka-gmail) wrote :

> OK, so let me explain what's going on a bit more explicitly. There are application programmers who are rewriting application files like this:
>
> 1.a) open and read file ~/.kde/foo/bar/baz
> 1.b) fd = open("~/.kde/foo/bar/baz", O_WRONLY|O_TRUNC|O_CREAT) --- this truncates the file
> 1.c) write(fd, buf-of-new-contents-of-file, size-of-new-contents-of-file)
> 1.d) close(fd)
>
> Slightly more sophisticated application writers will do this:
>
> 2.a) open and read file ~/.kde/foo/bar/baz
> 2.b) fd = open("~/.kde/foo/bar/baz.new", O_WRONLY|O_TRUNC|O_CREAT)
> 2.c) write(fd, buf-of-new-contents-of-file, size-of-new-contents-of-file)
> 2.d) close(fd)
> 2.e) rename("~/.kde/foo/bar/baz.new", "~/.kde/foo/bar/baz")
>
> What emacs (and very sophisticated, careful application writers) will do is this:
>
> 3.a) open and read file ~/.kde/foo/bar/baz
> 3.b) fd = open("~/.kde/foo/bar/baz.new", O_WRONLY|O_TRUNC|O_CREAT)
> 3.c) write(fd, buf-of-new-contents-of-file, size-of-new-contents-of-file)
> 3.d) fsync(fd) --- and check the error return from the fsync
> 3.e) close(fd)
> 3.f) rename("~/.kde/foo/bar/baz", "~/.kde/foo/bar/baz~") --- this is optional
> 3.g) rename("~/.kde/foo/bar/baz.new", "~/.kde/foo/bar/baz")
>
> The fact that series (1) and (2) works at all is an accident. Ext3 in its default configuration happens to have the property that 5 seconds after (1) and (2) completes, the data is safely on disk. (3) is the ***only*** thing which is guaranteed not to lose data. For example, if you are using laptop mode, the 5 seconds is extended to 30 seconds.

Variant (1) is unsafe by design: the data can be lost through a software failure alone. But variant (2) is correct. Both the application developer and ext3 assume the following logic behind the scenes:

2.a) open and read file ~/.kde/foo/bar/baz
2.b) fd = open("~/.kde/foo/bar/baz.new", O_WRONLY|O_TRUNC|O_CREAT)

transaction_start(fd); // Hidden logic

2.c) write(fd, buf-of-new-contents-of-file, size-of-new-contents-of-file)
2.d) close(fd)
2.e) rename("~/.kde/foo/bar/baz.new", "~/.kde/foo/bar/baz")

transaction_finish(fd); // Hidden logic

ext4 and XFS, on the other hand, assume the following logic:

2.a) open and read file ~/.kde/foo/bar/baz
2.b) fd = open("~/.kde/foo/bar/baz.new", O_WRONLY|O_TRUNC|O_CREAT)
2.c) write(fd, buf-of-new-contents-of-file, size-of-new-contents-of-file)
2.d) close(fd)

transaction_start(); // Hidden logic

2.e) rename("~/.kde/foo/bar/baz.new", "~/.kde/foo/bar/baz")

transaction_finish(); // Hidden logic

Because of that, the same problem can show up in many other places. It cannot be fixed easily just by adding a call to fsync(fd) (which, BTW, is not available in every programming language).
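
For reference, here is a rough sketch in C of what sequence (3) from the quote asks every application to do. The helper name replace_file, the fixed 4096-byte path buffer, and the 0644 mode are my own assumptions for illustration, not something taken from emacs or KDE, and the caller is assumed to pass an already expanded path (open() does not expand "~").

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `len` bytes from `buf`,
 * following steps 3.b-3.e and 3.g. Returns 0 on success, -1 on error. */
int replace_file(const char *path, const void *buf, size_t len)
{
    char tmp[4096];                      /* assumed upper bound on path length */
    int n = snprintf(tmp, sizeof tmp, "%s.new", path);
    if (n < 0 || (size_t)n >= sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* 3.b: create (or truncate) the temporary file */
    int fd = open(tmp, O_WRONLY | O_TRUNC | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* 3.c: write() may write less than requested, so loop */
    const char *p = buf;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += w;
        len -= (size_t)w;
    }

    /* 3.d: force the data to disk and check the result */
    if (fsync(fd) != 0)
        goto fail;

    /* 3.e: close the temporary file */
    if (close(fd) != 0) {
        int saved = errno;
        unlink(tmp);
        errno = saved;
        return -1;
    }

    /* 3.g: replace the old file with the new one */
    return rename(tmp, path);

fail:
    {
        int saved = errno;               /* keep the original error code */
        close(fd);
        unlink(tmp);
        errno = saved;
    }
    return -1;
}
```

The optional backup rename (3.f) is left out to keep the sketch short. Even so, it is far more code than the five calls in variant (2), and that is in C, where fsync() is directly available.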

IMHO, ext4 should respect these hidden transactions, i.e. it should not reorder file and filesystem operations that come from the same process.