Comment 98 for bug 317781

Aigars Mahinovs (aigarius) wrote :

While the design of ext3 with regard to this bug might be considered accidental, it would be wise to carry it over to ext4 in order to go 'above and beyond' POSIX in compatibility with previous behaviour. Specifically, truncation of a file needs to be made a regular write operation, so that it is cached with other write operations and flushed to disk in the same regular batch.

Given the following code:

1.a) open and read file ~/.kde/foo/bar/baz
1.b) fd = open("~/.kde/foo/bar/baz", O_WRONLY|O_TRUNC|O_CREAT) --- this truncates the file
1.c) write(fd, buf-of-new-contents-of-file, size-of-new-contents-of-file)
1.d) close(fd)

Assuming that less than 30 seconds pass between 1.b and 1.c, these two operations must be executed in the same write cycle, without allowing a significant window of opportunity for major data loss.
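
For concreteness, steps 1.b to 1.d can be written out in C roughly as follows. This is only a minimal sketch: the function name overwrite_in_place and the 0644 mode are assumptions made for illustration, and the caller has to pass an already expanded path, since open() does not expand "~".

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite `path` in place (steps 1.b to 1.d).
 * Returns 0 on success, -1 on error. */
int overwrite_in_place(const char *path, const void *buf, size_t len)
{
    /* 1.b: O_TRUNC discards the old contents as soon as open() succeeds.
     * The 0644 mode is an assumption; it only matters if the file is created. */
    int fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* 1.c: write the new contents, looping to handle short writes. */
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* 1.d: close() does not by itself force the data to disk. */
    return close(fd);
}
```

Between the open() and the moment the written data reaches the disk, the old contents are already gone, which is the window described above.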

2.a) open and read file ~/.kde/foo/bar/baz
2.b) fd = open("~/.kde/foo/bar/baz.new", O_WRONLY|O_TRUNC|O_CREAT)
2.c) write(fd, buf-of-new-contents-of-file, size-of-new-contents-of-file)
2.d) close(fd)
2.e) rename("~/.kde/foo/bar/baz.new", "~/.kde/foo/bar/baz")

It is even clearer here: why would the rename operation change the destination file before the previous operations are completed? It should not. The rename must be an atomic operation, even if POSIX does not demand it. This is expected behaviour for extN filesystems, and ext4 needs to document and honor it.

I understand that a program cannot be certain that data will reach the disk unless some sort of fsync() is called. But destroying old data and then delaying writing the new version _is_ an ext4 bug, regardless of what POSIX says.
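
For comparison, this is roughly what sequence 2 looks like when the program does call fsync() before the rename. It is a minimal sketch under stated assumptions: replace_via_rename and its tmp_path parameter are hypothetical names, the 0644 mode is an assumption, paths must already be expanded, and the containing directory is not synced.

```c
#include <fcntl.h>
#include <stdio.h>      /* rename() */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the new contents to `tmp_path`, flush them,
 * then rename over `path` (steps 2.b to 2.e plus an fsync()).
 * Returns 0 on success, -1 on error. */
int replace_via_rename(const char *path, const char *tmp_path,
                       const void *buf, size_t len)
{
    /* 2.b: only the temporary file is truncated; `path` is untouched. */
    int fd = open(tmp_path, O_WRONLY | O_TRUNC | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* 2.c: write the new contents, looping to handle short writes. */
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            goto fail;
        p += n;
        len -= (size_t)n;
    }

    /* Extra step: ask the kernel to push the new data to disk
     * before the old file is replaced. */
    if (fsync(fd) != 0)
        goto fail;

    /* 2.d */
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }

    /* 2.e: swap the new file into place. */
    if (rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp_path);
    return -1;
}
```

On any failure the temporary file is removed and the original file is left as it was.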

And as a side note: maybe programmers feel differently, but system administrators much prefer to have a bunch of small text files that we can edit with text editors and all kinds of scripts, instead of SQL database stores for application configuration. A configuration registry is a cool principle, but a horrible practice even in the best implementations.