Comment 139 for bug 317781

Ariel Shkedi (aslaunchpad) wrote :

Theo, I have tremendous respect for the work you did, but you are wrong.

> If you are going to be using a single monolithic config, then you really want to
> fsync() it each time you write it out. If some of the changes are
> bullsh*t ones where it really doesn't matter if you lose the last
> location and size of the window, then write that to a separate dot file
> and don't bother to fsync() it.

No. If I overwrite a file, the filesystem MUST guarantee that either the old version or the new one will be there. That is one of the main selling points of a journaling file system - if the write did not complete (crash), you can go back to the old version.

There should be NO case where you end up with a zero-byte file. Telling people to call fsync constantly is wrong. The filesystem should make sure not to truncate the file until it's ready to write the replacement. (Yes, there are corner cases where it commits exactly in between the truncate and the write, but that is not what is happening here.) Even a crash in between the truncate and the overwrite should not lose anything, since the journal should be rolled back to the old version of the file.
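
To make the distinction concrete, here is a rough Python sketch of the two application-side patterns under discussion. The function names are made up for illustration, and the temp-file-plus-rename form is my assumption about how the fsync() advice would usually be applied; it is not taken from any particular application.

```python
import os
import tempfile


def overwrite_in_place(path, data):
    """Plain overwrite: opening with "wb" truncates the file to zero
    length, then the new contents are written. Nothing here forces the
    data to disk before or after the truncate."""
    with open(path, "wb") as f:
        f.write(data)


def replace_atomically(path, data):
    """Hypothetical helper: write a temp file in the same directory,
    fsync() it, then rename it over the original."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Ask the kernel to push the new contents to stable storage.
            os.fsync(f.fileno())
        # Swap the new file into place under the old name.
        os.replace(tmp_path, path)
    except BaseException:
        # The rename did not happen, so remove the leftover temp file.
        os.unlink(tmp_path)
        raise
```

The first is the plain truncate-and-write overwrite described above; the second is roughly what applications are being told to do instead.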

Telling people to use sqlite is also not the right answer - you are essentially saying the fs is broken, so use this app to work around the bugs. I might as well use sqlite on a raw partition!

> I can implement a "allocate on commit" mode, but make no mistake
> --- it ***will*** have a negative performance impact, because
> fundamentally it's the equivalent of calling fsync() on dirty files
> every five seconds.

No, Theo, that is not what people are asking for. People simply want the filesystem not to commit the truncate before committing the data.

I have no idea if that is hard to do. I assume it is, because you seem to be resisting the idea, but it needs to be done for ext4 to be a reliable filesystem.