Ubuntu
linux package

Bug #317781
Comment #82

Theodore Ts'o (tytso) wrote on 2009-03-11:

@Brian,

We can't hold off one rename but not other file system activities. What you can do is simply not save files to disk in your editor, until you are ready to save them all --- or, you can extend the commit time to longer than 5 seconds; laptop mode extends the commit time to be 30 seconds, if I recall correctly.

In practice, note that ext3 generally ended up spinning up the disk anyway when you saved out the file, given that (a) it would need to read in the bitmap blocks to do the non-delayed allocation, and (b) it would end up spinning up the disk 5-30 seconds later when the commit timer went off.
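To check which commit interval a mounted ext3/ext4 file system is actually using, one can read its mount options. The following is only a minimal sketch: it assumes the interval is exposed through the `commit=` mount option in `/proc/mounts`, that an absent or zero value means the 5-second default mentioned above, and the helper names `commit_interval` and `ext_commit_intervals` are made up for illustration.

```python
DEFAULT_COMMIT_SECONDS = 5  # default journal commit interval discussed above


def commit_interval(mount_options):
    """Return the commit interval (seconds) implied by a mount option string.

    Assumption: the interval appears as "commit=<seconds>"; a missing or
    zero value is treated as the default.
    """
    seconds = DEFAULT_COMMIT_SECONDS
    for option in mount_options.split(","):
        name, _, value = option.partition("=")
        if name == "commit" and value.isdigit():
            parsed = int(value)
            # If the option is repeated, the last occurrence wins.
            seconds = parsed if parsed > 0 else DEFAULT_COMMIT_SECONDS
    return seconds


def ext_commit_intervals(mounts_path="/proc/mounts"):
    """Map each ext3/ext4 mount point to its commit interval in seconds."""
    intervals = {}
    with open(mounts_path) as mounts:
        for line in mounts:
            fields = line.split()
            # Fields: device, mount point, fs type, options, dump, pass
            if len(fields) >= 4 and fields[2] in ("ext3", "ext4"):
                intervals[fields[1]] = commit_interval(fields[3])
    return intervals
```

Something like `mount -o remount,commit=30 /` would then be one way to lengthen the interval by hand; the mount point here is just an example.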

The current set of ext4 patches queued for 2.6.29 does force the data blocks out right away, rather than merely allocating the data blocks and deferring the actual flush until the commit. The reason for this was simply lack of time on my part to create a patch that does things right, which would be much more complicated. Quoting from the patch:

+	/*
+	 * We do something simple for now.  The filemap_flush() will
+	 * also start triggering a write of the data blocks, which is
+	 * not strictly speaking necessary (and for users of
+	 * laptop_mode, not even desirable).  However, to do otherwise
+	 * would require replicating code paths in:
+	 *
+	 * ext4_da_writepages() ->
+	 *    write_cache_pages() ---> (via passed in callback function)
+	 *        __mpage_da_writepage() -->
+	 *           mpage_add_bh_to_extent()
+	 *           mpage_da_map_blocks()
+	 *
+	 * The problem is that write_cache_pages(), located in
+	 * mm/page-writeback.c, marks pages clean in preparation for
+	 * doing I/O, which is not desirable if we're not planning on
+	 * doing I/O at all.
+	 *
+	 * We could call write_cache_pages(), and then redirty all of
+	 * the pages by calling redirty_page_for_writeback() but that
+	 * would be ugly in the extreme.  So instead we would need to
+	 * replicate parts of the code in the above functions,
+	 * simplifying them because we wouldn't actually intend to
+	 * write out the pages, but rather only collect contiguous
+	 * logical block extents, call the multi-block allocator, and
+	 * then update the buffer heads with the block allocations.
+	 *
+	 * For now, though, we'll cheat by calling filemap_flush(),
+	 * which will map the blocks, and start the I/O, but not
+	 * actually wait for the I/O to complete.
+	 */

It's on my todo list to get this right, but given that I was getting enough complaints from users about losing dot files, I figured that it was better to get the patch in.

And again, let me stress that the window was never more than 30-60 seconds, and people who were paranoid could always manually use the sync command. The fact that so many people are complaining is what makes me deeply suspicious that there may be some faulty applications out there which are constantly rewriting existing files regularly enough that people are seeing this; either that, or the crappy proprietary drivers are much more crash-prone than I thought, and people are used to Linux machines crashing all the time. Both of these are very bad, and very unfortunate. Hopefully neither is true, but in that case, the chances of a file getting replaced by a zero-length file are very small indeed. (And again, I will note that XFS has been doing this all along, and other newer file systems will also be doing delayed allocation, and will be subject to the same pitfalls. Maybe they will also encode the same hacks to work around broken expectations, and people with crappy proprietary binary drivers. But folks really shouldn't be counting on this....)
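For applications that rewrite existing files, the commonly recommended way to avoid depending on file system behaviour is to flush the new contents to disk before renaming over the old file. The comment above does not spell this pattern out, so the following is only a sketch under assumptions: a POSIX system, a hypothetical helper name `atomic_replace`, and a temporary file created in the same directory as the target.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace `path` with `data` (bytes) so that a crash should leave
    either the old contents or the new contents, not an empty file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same file system as the target,
    # otherwise the final rename cannot be atomic.
    # Note: mkstemp creates the file with mode 0600; copy the original
    # permissions separately if they matter.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to write the data blocks
        os.replace(tmp_path, path)  # rename over the old file
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Flush the directory entry as well so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The cost is that every save forces disk I/O, which is the same trade-off against laptop mode raised earlier in this comment, so this fits files that must survive a crash better than files rewritten many times a minute.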
