Bug #317781 “Ext4 data loss” : Bugs : linux package : Ubuntu

Revision history for this message

Bogdan Gribincea (bogdan-gribincea) wrote on 2009-01-16:

#1

BootDmesg.txt Edit (49.4 KiB, text/plain; charset="utf-8")
CurrentDmesg.txt Edit (823 bytes, text/plain; charset="utf-8")
Dependencies.txt Edit (977 bytes, text/plain; charset="utf-8")
HalComputerInfo.txt Edit (3.2 KiB, text/plain; charset="utf-8")
Lspci.txt Edit (13.1 KiB, text/plain; charset="utf-8")
Lsusb.txt Edit (726 bytes, text/plain; charset="utf-8")
ProcCpuinfo.txt Edit (2.9 KiB, text/plain; charset="utf-8")
ProcInterrupts.txt Edit (1.7 KiB, text/plain; charset="utf-8")
ProcModules.txt Edit (2.0 KiB, text/plain; charset="utf-8")

Leann Ogasawara (leannogasawara) on 2009-01-16

Changed in linux:
importance:	Undecided → High
status:	New → Triaged

Revision history for this message

Benjamin Hodgetts (enverex) wrote on 2009-01-16:

#2

I thought it was worth adding this, even though I'm running Gentoo, it seems to be exactly the same issue:

I recently upgraded to ext4 as well, I ran a game in Wine and the system hardlocked (nothing special there with the fglrx drivers). After rebooting all my Wine registry files were 0 bytes, as were many of my Gnome configuration files. Absoloute nightmare. fsck on boot said that it had removed 760+ orphaned inodes.

Mounted as:
/dev/root on / type ext4 (rw,noatime,barrier=1,data=ordered)

Revision history for this message

Benjamin Hodgetts (enverex) wrote on 2009-01-17:

#3

Additional: ext4 was implemented as a clean format, not an upgrade of any sort (backed up, formatted and copied back over).

Revision history for this message

Matt Drake (mattduckman) wrote on 2009-01-17:

#4

This has happened to me twice, the first time erasing Firefox settings, and the second time erasing gnome-terminal settings. Both cases were caused by a kernel panic locking up the system. Also, both times the program whose settings were affected was in use during the kernel panic.

An important note is that these data losses have been taken place on an ext3 partition that is mounted as ext4 in fstab, so it is not a true ext4 partition.

This is taking place on fully up-to-date Jaunty.

Revision history for this message

Pavel Rojtberg (rojtberg) wrote on 2009-01-17:

#5

I also had data loss with ext4. The "feature" responsible for this is delayed allocation.
With delayed allocation on all hd-writes are held back in memory, so if you just cut the power the data is lost.

Basically the old version should be still available, but perhaps ext4 decides that a zeroed file is more "consistent".

Revision history for this message

Anders Aagaard (aagaande) wrote on 2009-01-17:

#6

Delayed allocation is to skip the allocating step when writing a file, not to keep data in memory. I'd say this is more likely to be related to barriers, but that's only because of my hate towards how ext handles barriers in a non safe way.

Revision history for this message

Theodore Ts'o (tytso) wrote on 2009-01-17:

#7

Ben --- can you tell me what version of the kernel you are using? Since you are a Gentoo user, it's not obvious to me what version of the kernel you are using, and whether you have any ext4-related patches installed or not.

Bogden --- *any* files written during the previous boot cycle?

I've done some testing, using Ubuntu Interpid, and a stock (unmodified) 2.6.28 kernel on a Lenovo S10 netbook (my crash and burn machine; great for doing testing :-). On it, I created a fresh ext4 filesystem on an LVM partition, and I used as a test source a directory /home/worf, a test account that has been used briefly right after I installed it, so it has gnome dot files, plus a relatively small number of files in the Firefox cache. Its total size is 21 megabytes.

I then created a ext4 filesystem, and then tested it as follows:

% sudo bash
# cp -r /home/worf /mnt ; sleep 120; echo b > /proc/sysrq-trigger

After the system was forcely rebooted (the echo b >/proc/sysrq-trigger emulates a crash), I checked the contents of /mnt/worf using cp -r and cfv, and below changed the sleep time. What I found was that at sleep times above 65 seconds, all of /mnt/worf was safely written to disk. Below 30 seconds, none of /mnt/worf was written to disk. If the sleep 120 was replaced with a sync, everything was written to disk.

How aggressively the system writes things back out to disk can be controlled via some tuning parameters, in particular /proc/sys/vm/dirty_expire_centisecs and /proc/sys/vm/dirty_writeback_centisecs. The latter, in particular will be adjusted by laptop_mode and other tools that are trying to extend battery lifespans.

So the bottom line is that I'm not able to replicate any data loss except for very recently written data before a crash, and this can be controlled by explicitly using the "sync" command or adjusting how aggressively the system writes back dirty pages via /proc/sys/vm/dirty_expire_centisecs and /proc/sys/vm/dirty_writeback_centisecs.

It would be useful if you could send me the output of "sysctl -a", and if you can tell me whether the amount of data that you are losing is decreased if you explicitly issue the "sync" command before the crash (which you can simulate via "echo b > /proc/sysctl-trigger".)

Ben --- can you tell me what version of the kernel you are using?   Since you are a Gentoo user, it's not obvious to me what version of the kernel you are using, and whether you have any ext4-related patches installed or not.

Bogden --- *any* files written during the previous boot cycle?

I've done some testing, using Ubuntu Interpid, and a stock (unmodified) 2.6.28 kernel on a Lenovo S10 netbook (my crash and burn machine; great for doing testing :-).   On it, I created a fresh ext4 filesystem on an LVM partition, and I used as a test source a directory /home/worf, a test account that has been used briefly right after I installed it, so it has gnome dot files, plus a relatively small number of files in the Firefox cache.   Its total size is 21 megabytes.

I then created a ext4 filesystem, and then tested it as follows:

% sudo bash
# cp -r /home/worf /mnt ; sleep 120; echo b > /proc/sysrq-trigger

After the system was forcely rebooted (the echo b >/proc/sysrq-trigger emulates a crash), I checked the contents of /mnt/worf using cp -r and cfv, and below changed the sleep time.   What I found was that at sleep times above 65 seconds, all of /mnt/worf was safely written  to disk.   Below 30 seconds, none of /mnt/worf was written to disk.   If the sleep 120 was replaced with a sync, everything was written to disk.

How aggressively the system writes things back out to disk can be controlled via some tuning parameters, in particular /proc/sys/vm/dirty_expire_centisecs and /proc/sys/vm/dirty_writeback_centisecs.   The latter, in particular will be adjusted by laptop_mode and other tools that are trying to extend battery lifespans.

So the bottom line is that I'm not able to replicate any data loss except for very recently written data before a crash, and this can be controlled by explicitly using the "sync" command or adjusting how aggressively the system writes back dirty pages via /proc/sys/vm/dirty_expire_centisecs  and /proc/sys/vm/dirty_writeback_centisecs.

It would be useful if you could send me the output of "sysctl -a", and if you can tell me whether the amount of data that you are losing is decreased if you explicitly issue the "sync" command before the crash (which you can simulate via "echo b > /proc/sysctl-trigger".)

Revision history for this message

Benjamin Hodgetts (enverex) wrote on 2009-01-17:

#8

Kernel is Gentoo's own:

Linux defiant 2.6.28-gentoo #4 SMP Sat Jan 3 21:56:33 GMT 2009 x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel GNU/Linux

The files that were zeroed when my machine hardlocked I'd imagine were the ones that were in use; my desktop env is Gnome and I was running a game in Wine. Wine's reg files which it would have had open were wiped and also my Gnome terminal settings were wiped. Not sure how often it would have been writing to them but it would have only been tiny amounts of data if it was.

Revision history for this message

Pavel Rojtberg (rojtberg) wrote on 2009-01-17:

#9

in my case the zeroed files were also just updated, not created.

My "test scenario" was starting a OpenGL applications with unsaved source files. Then all graphics froze because of a bug in fglrx.

If I immediately powered off the machine my source file was empty. If I waited a bit everything was saved fine. Likely because I waited long enough for the changes to be written to disk...

Revision history for this message

Bogdan Gribincea (bogdan-gribincea) wrote on 2009-01-17:

#10

Just to clarify, my Ext4 partitions were all 'true' ext4 not converted from ext3.

Revision history for this message

Bogdan Gribincea (bogdan-gribincea) wrote on 2009-01-22:

#11

It happend again. Somehow when trying to logout KDM crashed. After rebooting I had some zeroed config files in a few KDE apps, log files (pidgin)..
I coverted / and /home back to EXT3. This is extremely annoying, reminds me of Windows 9x

I will have some free time next week and I'll try testing this in a virtual machine.

Revision history for this message

Andy Whitcroft (apw) wrote on 2009-01-22:

#12

@Bogdan Gribincea -- am i correct in thinking that you are using the ext4 support in the Intrepid kernel?

Revision history for this message

Bogdan Gribincea (bogdan-gribincea) wrote on 2009-01-23:

#13

@Andy: I am the bug report starter and I attached all the logs generated by the ubuntu-bug command.
And, no, it's Jaunty with it's 2.6.28 kernel and the 'stable' ext4 support. Also the partitions were created as ext4 on a new drive, not converted from ext3.

Revision history for this message

Kai Mast (kai-mast) wrote on 2009-01-27:

#14

I can confirm this with AMD64 and Ubuntu Jaunty

Revision history for this message

Niclas Lockner (niclasl) wrote on 2009-01-29:

#15

I have experienced some issues with EXT4 and data losses too, but more extreme than you all describe. I installed jaunty alpha 3 two days ago and have all the updates installed. Since the install 2 days ago I have lost data on 3 occations. The most strange losses is:
* the computer wiped out a whole network share mounted in fstab
* the computer one time also removed ~/.icons when I empty the trash

The data losses never happened after a crasch or power failure.

Revision history for this message

Veovis (masterkedri) wrote on 2009-01-31:

#16

I was just browsing the forums on Ubuntu ( http://ubuntuforums.org/showthread.php?t=1055176 ), where I read of a bug involving symbolic links. The bug is that if you delete a folder that is sylbolic link it will delete the contents of the folder as well as the symbolic link as if it were not a symbolic link.

Does this sound like it could have been the situation?

Revision history for this message

Benjamin Hodgetts (enverex) wrote on 2009-01-31:

#17

No Veovis, please read the bug, that is nothing to do with the actual report here.

Revision history for this message

Christian Roessner (christian-roessner-net) wrote on 2009-01-31:

#18

Just a question: Would data=journal in /etc/fstab be a workaround until this bug is fixed?

(Unfortunatlly I can not set this option in fstab for the root-partition, because initramfs does not support that feature!), but you may try this with your home and what-else partitions, if you do have them.

Data=journal deactivates delaloc. It should put data and metadata into the journal, thereof I hope that recently opened files should not have 0 bytes size. I do not put further comments here from my side. It´s just an idea.

Revision history for this message

Andy Whitcroft (apw) wrote on 2009-02-09:

#19

Talking to Ted on this one we believe that the trigger for the data loss has been found and included in a new upstream stable update. The patches for ext4 have been picked up and applied to the jaunty kernel. Kernel 2.6.28-7.19 should contain those. If any of you are able to test and report back that would be most helpful.

Changed in linux:
assignee:	nobody → apw
status:	Triaged → Incomplete

Revision history for this message

Steve Langasek (vorlon) wrote on 2009-02-13:

#20

Should we mark this bug as 'fix released' unless someone shows otherwise?

Revision history for this message

Christian Roessner (christian-roessner-net) wrote on 2009-02-13:

#21

Hi,

just updated my system. While this was in process, I tried to switch compiz to metacity (checking for another bug). The X-server froze and I switched to tty2 to stop gdm. This took a long time and afterwards even the Xorg process hang. I entered reboot, but the system could not reboot in the last third of the stop procedure. I used SysRequest+s, -u, -b.

After reboot I saw: /home was unmounted unclean, check forced.

When I came into gnome again, my compiz settings were partially cleared and my gnome-terminal settings were lost. I can not say, if the files were zero bytes. But maybe the e2fsck had corrupted some files.

The updates that had taken place did not include compiz or gnome-terminal so at this point I can not see a connection to the updates done and the lost information.

Revision history for this message

Christian Roessner (christian-roessner-net) wrote on 2009-02-13:

#22

Ok, thunderbird settings are gone, too. So this seems ext4 related?

Revision history for this message

pablomme (pablomme) wrote on 2009-02-13:

#23

@Christian: I understand that your system hung _before_ you rebooted into the updated kernel? If so, the changes wouldn't have taken effect, and the data loss was caused by the original kernel.

Revision history for this message

Tim Gardner (timg-tpi) wrote on 2009-02-13:

#24

This issue should be fixed with 2.6.28-7.18. I cherry picked a number of patches that Ted Tso is submitting for stable kernel updates which he says fixes this data loss problem. Please confirm.

Changed in linux:
assignee:	apw → timg-tpi
status:	Incomplete → Fix Released

Revision history for this message

Christian Roessner (christian-roessner-net) wrote on 2009-02-13: Re: [Bug 317781] Re: Ext4 data loss

#25

pablomme schrieb:
> @Christian: I understand that your system hung _before_ you rebooted
> into the updated kernel? If so, the changes wouldn't have taken effect,
> and the data loss was caused by the original kernel.

Well, there has not been any kernel update so far here. I am on:

2.6.28-7.20 for several days now. The problem I spoke about was today
with the specified kernel.

So are the patches applied to this version or a leter one, which has not
arrived here, yet?

Else the problem persists.

Revision history for this message

Peter Clifton (pcjc2) wrote on 2009-02-21:

#26

I'm using 2.6.28-8-generic and a crash just zeroed out a _load_ of important files in my git repository which I'd recently rebased a patch series in.

Not impressed (TM).

Oh well.. anyway.. I don't think this problem is fixed.

Revision history for this message

André Barmasse (barmassus) wrote on 2009-02-23:

#27

For testing I installed ext4 together with Jaunty Alpha 4 as standard root file system on my Sony Vaio. Since then I had four hardlocks, two of them completely destroying my gnome desktop. So far, this only happens within Gnome while upgrading the system with apt-get in a shell AND at the same time running and working with other programms (like quodlibet, firefox, Thunderbird, Bluefish etc.).

As of the gnome desktop destructions, in one case apt-get unfortunately was just installing some xorg-server files and - in the other case - configuring the gnome-desktop, when the hard lock happened. The sad part is that I didn't find a way to repair the broken system neither with apt-get nor with dpkg nor with aptitute as the size some needed configuration files was set to zero by the crash. So, for now I am switching back to ext4 releasing this warning:

DON'T DO ANYTHING WHILE UPGRADING UBUNTU WITH EXT4!

Revision history for this message

David Tomaschik (matir) wrote on 2009-02-23:

#28

Looks like the data loss bug may still exist. Setting back to confirmed.

Changed in linux:
status:	Fix Released → Confirmed

Revision history for this message

pablomme (pablomme) wrote on 2009-02-23:

#29

write_stuff.gz Edit (768 bytes, application/octet-stream)

I think this bug is in desperate need of a systematic test, so I've attached a script which attempts to do precisely that. You need to run the script like this:

./write_stuff <directory-under-ext4-volume>

The script will open 5 files (named 'file_<i>') under the specified directory and start appending one line per second to each of them, until you stop it with Ctrl-C. If the script is re-run, lines are appended to the previous contents.

If instead of stopping the script you turn off or reboot your computer by force (say with SysRq+B, or holding the power button), you would be reproducing the conditions under which the bug seems to occur.

My / partition is ext4 (but not my /home, so I haven't suffered this bug as much as others have). Running the script on '/test' without any initial files and rebooting with SysRq+B gave:

- rebooting in 30 seconds resulted in all 5 files zeroed
- rebooting in 45 seconds resulted in 4 files having 40 lines and one having 41
- rebooting in 60 seconds resulted in 4 files having 55 lines and one having 56

I would think that the first data flush on the initially-empty files takes too long to occur. This would explain the problems other people are having if the configuration files they mention are deleted and rewritten from scratch, and the system crashes before the first flush. Or maybe I'm completely wrong in my interpretation, so go ahead and do your own tests.

Hope this helps!

Revision history for this message

Jeremy LaCroix (jlacroix82-deactivatedaccount) wrote on 2009-02-24:

#30

I never had any trouble, but I installed Jaunty last week, after the fix
was released. Is it possible this bug now only affects those that
installed it prior to the fix? When did you guys install it?

I've moved over 100GB of files since I installed it. I had at least two
hard crashes, everything seems to be intact.

André Barmasse wrote:
> For testing I installed ext4 together with Jaunty Alpha 4 as standard
> root file system on my Sony Vaio. Since then I had four hardlocks, two
> of them completely destroying my gnome desktop. So far, this only
> happens within Gnome while upgrading the system with apt-get in a shell
> AND at the same time running and working with other programms (like
> quodlibet, firefox, Thunderbird, Bluefish etc.).
>
> As of the gnome desktop destructions, in one case apt-get unfortunately
> was just installing some xorg-server files and - in the other case -
> configuring the gnome-desktop, when the hard lock happened. The sad part
> is that I didn't find a way to repair the broken system neither with
> apt-get nor with dpkg nor with aptitute as the size some needed
> configuration files was set to zero by the crash. So, for now I am
> switching back to ext4 releasing this warning:
>
> DON'T DO ANYTHING WHILE UPGRADING UBUNTU WITH EXT4!
>

Revision history for this message

pablomme (pablomme) wrote on 2009-02-24:

#31

@Jeremy: I've never had a problem either, but I haven't had any crashes at all so this bug hasn't had a chance to show up. However my script above does reproduce the problem - have you tried it?

I installed Jaunty Alpha 4 on February the 6th. I would suppose that this is equivalent to what you've done, since you only get the updates after having installed the system. (Alpha 5 is not out yet, is it?)

Revision history for this message

Jeremy LaCroix (jlacroix82-deactivatedaccount) wrote on 2009-02-25:

#32

I've not experienced the bug yet, though I've not had a chance to try
the script yet.

I was wondering if anyone knows of a terminal command I could run that
would give me a list of all the files on my system that are 0 KB. As far
as I know, I've never experienced this, but then again it may have
happened to a file I don't use often. I don't care if I lose anything
since I have everything backed up but a command to list files that may
have been effected would be nice if anyone knows.

Revision history for this message

pablomme (pablomme) wrote on 2009-02-26:

#33

> I was wondering if anyone knows of a terminal command I could run that
> would give me a list of all the files on my system that are 0 KB.

There's

find / -type f -size 0

but there are very many files that have zero length under normal conditions, so it'll be very hard to tell if any file has been affected this way.

Revision history for this message

Wade Menard (wade-ezri) wrote on 2009-02-26:

#34

find / -size 0b should be enough. Please keep further discussion not related to fixing this bug on a forum or mailing list.

Revision history for this message

Jeremy LaCroix (jlacroix82-deactivatedaccount) wrote on 2009-02-26:

#35

Thank you. Please understand that my question was related to this bug,
as such a command will help me determine if this bug is affecting me,
then I could give more info that would help the fix.

Wade Menard wrote:
> find / -size 0b should be enough. Please keep further discussion not
> related to fixing this bug on a forum or mailing list.
>

Revision history for this message

Jeremy LaCroix (jlacroix82-deactivatedaccount) wrote on 2009-02-26:

#36

There are a couple files that are 0b, so this bug is affecting me. Is
there any information I can provide to help the developers?

Wade Menard wrote:
> find / -size 0b should be enough. Please keep further discussion not
> related to fixing this bug on a forum or mailing list.
>

Revision history for this message

Michael Rooney (mrooney) wrote on 2009-02-26:

#37

Jeremy, as pablomme said: "there are very many files that have zero length
under normal conditions, so it'll be very hard to tell if any file has been
affected this way."

Many people are reporting trashed gnome sessions so it should be fairly
obvious whether it is or not. A 0b file is definitely not indicative of
this.

Revision history for this message

Jeremy LaCroix (jlacroix82-deactivatedaccount) wrote on 2009-02-26:

#38

The two files I have that are 0b are jpg images.

Michael Rooney wrote:
> Jeremy, as pablomme said: "there are very many files that have zero length
> under normal conditions, so it'll be very hard to tell if any file has been
> affected this way."
>
> Many people are reporting trashed gnome sessions so it should be fairly
> obvious whether it is or not. A 0b file is definitely not indicative of
> this.
>

Revision history for this message

kubrentu (brentkubuntu) wrote on 2009-02-26:

#39

Same data loss problem.

Installed Kubuntu Jaunty Alpha 4. ext4 as root / partition. Did all the updates.

$ uname -a
Linux andor 2.6.28-8-generic #26-Ubuntu SMP Wed Feb 25 04:28:54 UTC 2009 i686 GNU/Linux

ran the "write_stuff" script, and held down the power button after about 5-10 seconds.

brent@andor:~/test$ ls -l
total 4
-rw-r--r-- 1 brent brent 0 2009-02-26 13:38 file_1
-rw-r--r-- 1 brent brent 0 2009-02-26 13:38 file_2
-rw-r--r-- 1 brent brent 0 2009-02-26 13:38 file_3
-rw-r--r-- 1 brent brent 0 2009-02-26 13:38 file_4
-rw-r--r-- 1 brent brent 0 2009-02-26 13:38 file_5
-rw-r--r-- 1 brent brent 1411 2009-02-26 13:32 write_stuff

All 0B files.

I'm happy to try other tests that people may suggest.

Revision history for this message

3vi1 (launchpad-net-eternaldusk) wrote on 2009-02-27:

#40

Ack... had a power outage and ran into this one today too. Several configuration files from programs I was running ended up trashed. This also explains the corruption I've seen of my BOINC/SETI files when hard-rebooting in past weeks.

System: Linux mars 2.6.28-8-generic #26-Ubuntu SMP Wed Feb 25 04:27:53 UTC 2009 x86_64 GNU/Linux

I'm running RAID1 dmraid mirroring w/ an Asus Striker Formula II MB, in case it matters.

Dustin Kirkland  (kirkland) on 2009-03-07

Changed in ecryptfs-utils:
importance:	Undecided → High
status:	New → Invalid

Tim Gardner (timg-tpi) on 2009-03-13

Changed in linux:
milestone:	none → ubuntu-9.04-beta
status:	Confirmed → Fix Committed

Launchpad Janitor (janitor) on 2009-03-16

Changed in linux:
status:	Fix Committed → Fix Released

Revision history for this message

helios (martin-lichtvoll) wrote on 2009-03-26:

#188

Daniel Philipps, developer of Tux3 filesystem, wants to make sure that renames come after file being written even when delayed writing of metadata is introduced to it:
http://mailman.tux3.org/pipermail/tux3/2009-March/000829.html

Revision history for this message

Jamin W. Collins (jcollins) wrote on 2009-03-27:

#189

I know this report claims that a fix is already in Jaunty for this issue. However, I just found myself with a 0 byte configuration file after a system lockup (flashing caps lock).

$ uname -ra
Linux odin 2.6.28-11-generic #37-Ubuntu SMP Mon Mar 23 16:40:00 UTC 2009 x86_64 GNU/Linux

Revision history for this message

Theodore Ts'o (tytso) wrote on 2009-03-27:

#190

@189: Jamin,

The fix won't protect against a freshly written new file (configuration or otherwise); it only protects against a file which is replaced via rename or truncate. But if it was a file that previously didn't exist, then you can still potentially get a zero-length file --- just as you can crash just before the file was written out.

Revision history for this message

Jamin W. Collins (jcollins) wrote on 2009-03-27:

#191

@Theo
The file in question was a previously existing configuration file for my IM client (gajim). All IM accounts and preferences were lost. Not a huge deal, but definitely a preexisting file. The system kernel panicked (flashing caps lock) while chatting. The kernel panic is a separate issue that's been reported previously.

Revision history for this message

Rocko (rockorequin) wrote on 2009-03-27:

#192

@Theo: I vote for what (I think) lots of people are saying: if the file system delays writing of data to improve performance, it should delay renames and truncates as well so you don't get *complete* data loss in the event of a crash... Why have a journaled file system if it allows you to lose both the new *and* the old data on a crash rather than just the new data that couldn't be written in time?

It's true that this situation won't happen if the system never crashes, and it's great that this is true of your system - but in that case, why not just use ext2?

If ext3 also allows this, I'd say there's a problem with ext3 too.

Incidentally, I just ended up with a ton of trashed object files due to a kernel panic in the middle of a build. But I wouldn't say gcc is a crappy application!

PS. Other than this bug, ext4 rocks.

Revision history for this message

Theodore Ts'o (tytso) wrote on 2009-03-27:

#193

@Rocko,

If you really want this, you can disable delayed allocation via the mount option, "nodelalloc". You will take a performance hit and your files will be more fragmented. But if you have applications which don't call fsync(), and you have an unstable system, then you can use the mount option. All I can say is that I don't see these data loss problems, but everyone has different usage patterns.

In terms of trashed object files in the middle of the build, those object files are non-precious files. How often do you crash in the middle of a build? Should you slow down all builds just to handle the rare case where your system crashes in the middle of the build? Or would it be better to run "make clean", and rebuild the tree in the case where you have trashed object files? It's not like a kernel rebuild takes that long. OTOH, if your system is crashing all the time, there's something else seriously wrong; Linux systems shouldn't be that unstable.

Revision history for this message

Theodore Ts'o (tytso) wrote on 2009-03-27:

#194

@Jamin,

We'd have to see how gaim is rewriting the application file. If it is doing open/truncate/write/close, there will always be the chance that the file would be lost if you crash right after the truncate. This is true with both ext3 and ext4. With the workaround, the chances of losing the file with ext4 when the application tries to do the fundamentally broken replace-via-truncate are the same as with ext3. We can't do better than that.

Revision history for this message

Jamin W. Collins (jcollins) wrote on 2009-03-27:

#195

@Theo,

Been digging through the source to track down how it does it. Managed to find it. It does use a central consistent method, which does use a tempfile. However, it does not (as of yet) force a sync. I'm working on getting that added to the code now. Here's the python routine it uses:

self.__filename: the full path to the user's configuration file.
self.__tempfile: the same path and filename but with a dot prefix

def write(self):
  (base_dir, filename) = os.path.split(self.__filename)
  self.__tempfile = os.path.join(base_dir, '.' + filename)
  try:
   f = open(self.__tempfile, 'w')
  except IOError, e:
   return str(e)
  try:
   gajim.config.foreach(self.write_line, f)
  except IOError, e:
   return str(e)
  f.close()
  if os.path.exists(self.__filename):
   # win32 needs this
   try:
    os.remove(self.__filename)
   except Exception:
    pass
  try:
   os.rename(self.__tempfile, self.__filename)
  except IOError, e:
   return str(e)
  os.chmod(self.__filename, 0600)

Revision history for this message

Aryeh Gregor (simetrical+launchpad) wrote on 2009-03-27:

#196

That looks like it removes the file before it does the rename, so it misses the special overwrite-by-rename workaround. This is slightly unsafe on any filesystem, since you might be left with no config file with the correct name if the system crashes in a small window, fsync() or no. Seemingly Python on Windows doesn't support an atomic rename operation at all.

It might be simplest for it to only do the remove if rename throws an OSError, or only if the platform is Windows. Ideally it should call fsync() as well, of course.

Revision history for this message

Daniel Colascione (dcolascione) wrote on 2009-03-27:

#197

What that code does is stupid, yes. It shouldn't remove the original unless the platform is win32. *Windows* (except with Transactional NTFS) doesn't support an atomic rename, so it's no surprise that Python under Windows doesn't either.

You're seeing a zero-length file because Tso's fix for ext4 only applies to files being renamed on top of other files. The filesystem should be fixed to allocate blocks on *every* commit, not just ones overwriting existing files.

As for the program -- fsync should *not* be inserted. (Though the unconditional os.remove() should be changed.) It's a bad thing to ritually fsync every file before the rename for a host of reasons described upthread. Just fix the filesystem.

Revision history for this message

Theodore Ts'o (tytso) wrote on 2009-03-27:

#198

@Daniel,

Note that if you don't call fsync(), and you hence you don't check the error returns from fsync(), your application won't be notified about any possible I/O errors. So that means if the new file doesn't get written out due to media errors, the rename may also end up wiping out the existing file. This can be an issue with some remote file systems, like AFS, where you'll miss quota errors unless you fsync() and check the error returns on both fsync() and close(). But hey, if you don't care about that, feel free to write your applications any way you want.

Revision history for this message

Aryeh Gregor (simetrical+launchpad) wrote on 2009-03-27:

#199

"The filesystem should be fixed to allocate blocks on *every* commit, not just ones overwriting existing files."

alloc_on_commit mode has been added. Those who want to use it (and take the large associated performance hit) can use it. It's a tradeoff that is and should be in the hands of the individual system administrator. Personally, my machine almost never crashes, so I'd prefer the extra performance.

What the application is doing in this case is broken anyway, and if it fixed that there would be no problem on ext4.

"As for the program -- fsync should *not* be inserted. (Though the unconditional os.remove() should be changed.) It's a bad thing to ritually fsync every file before the rename for a host of reasons described upthread."

fsync() should preferably be used for config file updates, assuming those are reasonably rare, "for a host of reasons described upthread". Otherwise, the user will click "Save" and then the preference change won't actually take effect if the system crashes shortly thereafter. This is true in any filesystem. On some filesystems (not just ext4: XFS certainly, maybe NFS?), you might also get some kind of other bad stuff happening. Explicit user saving of files/preferences/etc. should therefore invoke an fsync() in most cases: you want to make sure the change is committed to stable storage before giving confirmation to the user that it's saved. Text editors already do this, and no one seems to have complained.

If Gaim updates its config file very often for some reason, though, they'd have to weigh the added reliability of fsync() against the performance hit (especially on filesystems like ext3).

Revision history for this message

Daniel Colascione (dcolascione) wrote on 2009-03-27:

#200

If you accept that it makes sense to allocate on rename commits for overwrites of *existing* files, it follows that it makes sense to commit on *all* renames. Otherwise, users can still see zero-length junk files when writing a file out for the first time. If an application writes out a file using the atomic rename technique, it should expect just as good a consistency guarantee when the file doesn't already exist as when it does. Anything else just adds extra complexity.

Before your knee jerks out "performance," consider that brand-new, throwaway files aren't renamed. gcc doesn't write a file out, only to rename it immediately. Only files for which atomicty matters are renamed that way -- which are precisely the files that would get the commit-on-rename treatment in other circumstances. The performance impact of committing on *all* renames would be minimal over the existing rename code.

We keep talking in circles: if you're going to make a commitment to application reliability, go all the way and commit on all renames. Anything else is just a subtle gotcha for application programs. Yes, POSIX them harder, will you?

NFS is a special case in that 1) it's widely known to have strange semantics, and 2) many applications explicitly don't support NFS for that reason. NFS semantics are *not* the ones we should be striving to emulate! Besides, the kind of inconsistency you see with NFS doesn't result in corrupt configurations in the same way the ext4 bug does.

As for AFS: it has a special place in Hell. AFS doesn't even meet basic POSIX guarantees with regard to permissions. Its mind-bendingly stupid quota behavior is just icing on the cake. It's crap as a unix filesystem, and I sure as hell wouldn't consider using it except on a specially-prepared system. I'm not going to make my application jump through hoops to support your antiquated hack. Every other filesystem checks quotas on write and close; why does yours have to be different?

Revision history for this message

Aryeh Gregor (simetrical+launchpad) wrote on 2009-03-27:

#201

"If you accept that it makes sense to allocate on rename commits for overwrites of *existing* files, it follows that it makes sense to commit on *all* renames."

Renaming a new file over an existing one carries the risk of destroying *old* data. If I create a new file and don't rename it to anything, it's possible I will lose *the new file only*, on any filesystem (unless I fsync()). This is universally considered an acceptable risk: losing up to a couple of minutes' work (but nothing earlier) in the event of a system crash. This is the exact risk carried by renaming a file to a name that doesn't exist -- unless you gratuitously delete the old file first, which is completely pointless on Unix and obviously destroys any hope of atomicity (if the system crashes/app dies/etc. between delete and rename).

"Only files for which atomicty matters are renamed that way -- which are precisely the files that would get the commit-on-rename treatment in other circumstances."

Virtually all users of this atomicity technique appear to rename over the existing file, which is why almost all problems disappeared when users applied Ted's patches. Gaim only did otherwise as a flawed attempt to work around a quirk of the Windows API, in a way that wasn't atomic anyway, and that can be expected to be fixed in Gaim.

Revision history for this message

Daniel Colascione (dcolascione) wrote on 2009-03-27:

#202

The risk isn't data loss; if you forgo fsync, you accept the risk of some data loss. The issue that started this whole debate is consistency.

The risk here is of the system ending up in an invalid state with zero-length files *THAT NEVER APPEARED ON THE RUNNING SYSTEM* suddenly cropping up. A zero-length file in a spot that is supposed to be occupied by a valid configuration file can cause problems --- an absent file might indicate default values, but an empty file might mean something completely different, like a syntax error or (famously) "prevent all users from logging into this system."

When applications *really* do is create a temporary file, write data to it, and rename that temporary file to its final name regardless of whether the original exists. If the filesystem doesn't guarantee consistency for a rename to a non-existing file, the application's expectations will be violated in unusual cases causing hard-to-discover bugs.

Why should an application that atomically updates a file have to check whether the original exists to get data consistency?

Allocate blocks before *every* rename. It's a small change from the existing patch. The performance downsides are minimal, and making this change gives applications the consistency guarantees they expect.

Again: if you accept that you can give applications a consistency guarantee when using rename to update the contents of a file, it doesn't make sense to penalize them the first time that file is updated (i.e., when it's created.) Unless, of course, you just want to punish users and application developers for not gratuitously calling fsync.

Revision history for this message

Chow Loong Jin (hyperair) wrote on 2009-03-28:

#203

On Fri, 2009-03-27 at 22:55 +0000, Daniel Colascione wrote:
> The risk isn't data loss; if you forgo fsync, you accept the risk of
> some data loss. The issue that started this whole debate is consistency.
>
> The risk here is of the system ending up in an invalid state with zero-
> length files *THAT NEVER APPEARED ON THE RUNNING SYSTEM* suddenly
> cropping up. A zero-length file in a spot that is supposed to be
> occupied by a valid configuration file can cause problems --- an absent
> file might indicate default values, but an empty file might mean
> something completely different, like a syntax error or (famously)
> "prevent all users from logging into this system."
A syntax error usually prevents the whole program from running, I should
think. And I'm not sure about the whole "prevent all users from logging
into this sytem" bit. I've never even heard of it, so I don't know how
you can consider that famous.

> When applications *really* do is create a temporary file, write data to
> it, and rename that temporary file to its final name regardless of
> whether the original exists. If the filesystem doesn't guarantee
> consistency for a rename to a non-existing file, the application's
> expectations will be violated in unusual cases causing hard-to-discover
> bugs.
It is guaranteed. When you *rename onto an existing file*. If you delete
the original *before* renaming, then I see it as "you have agreed to
forgo your atomicity".
>
> Why should an application that atomically updates a file have to check
> whether the original exists to get data consistency?
Um, no, I don't think it needs to. See this:
Case 1: File already exists.
1. Application writes to file.tmp
2. Application closes file.tmp
3. Application renames file.tmp to file.
** If a crash happens, you either get the original, or the new.

Case 2: File doesn't already exist.
1-3 as above.
** If a crash happens, you either get the new file, or a zero-length
file.

Considering that in case 2 there wasn't a file to begin with, I don't
think it's much of an issue in getting a zero-length file. Unless your
program crashes when you get zero-length configuration files, in which
case I think your program sucks and you suck for writing it with that
assumption.

>
> Allocate blocks before *every* rename. It's a small change from the
> existing patch. The performance downsides are minimal, and making this
> change gives applications the consistency guarantees they expect.
I wholeheartedly agree with "Allocate blocks before renames over
existing files", but "Allocate blocks before *every* rename" is
overdoing it a little.
>
> Again: if you accept that you can give applications a consistency
> guarantee when using rename to update the contents of a file, it doesn't
> make sense to penalize them the first time that file is updated (i.e.,
> when it's created.) Unless, of course, you just want to punish users and
> application developers for not gratuitously calling fsync.
Again, I don't see exactly how an application is being penalized the
first time the file is updated.

--
Chow Loong Jin

On Fri, 2009-03-27 at 22:55 +0000, Daniel Colascione wrote:
> The risk isn't data loss; if you forgo fsync, you accept the risk of
> some data loss. The issue that started this whole debate is consistency.
> 
> The risk here is of the system ending up in an invalid state with zero-
> length files *THAT NEVER APPEARED ON THE RUNNING SYSTEM* suddenly
> cropping up. A zero-length file in a spot that is supposed to be
> occupied by a valid configuration file can cause problems --- an absent
> file might indicate default values, but an empty file might mean
> something completely different, like a syntax error or (famously)
> "prevent all users from logging into this system."
A syntax error usually prevents the whole program from running, I should
think. And I'm not sure about the whole "prevent all users from logging
into this sytem" bit. I've never even heard of it, so I don't know how
you can consider that famous.

> When applications *really* do is create a temporary file, write data to
> it, and rename that temporary file to its final name regardless of
> whether the original exists. If the filesystem doesn't guarantee
> consistency for a rename to a non-existing file, the application's
> expectations will be violated in unusual cases causing hard-to-discover
> bugs.
It is guaranteed. When you *rename onto an existing file*. If you delete
the original *before* renaming, then I see it as "you have agreed to
forgo your atomicity".
> 
> Why should an application that atomically updates a file have to check
> whether the original exists to get data consistency?
Um, no, I don't think it needs to. See this:
Case 1: File already exists.
1. Application writes to file.tmp
2. Application closes file.tmp
3. Application renames file.tmp to file.
** If a crash happens, you either get the original, or the new.

Case 2: File doesn't already exist.
1-3 as above.
** If a crash happens, you either get the new file, or a zero-length
file.

Considering that in case 2 there wasn't a file to begin with, I don't
think it's much of an issue in getting a zero-length file. Unless your
program crashes when you get zero-length configuration files, in which
case I think your program sucks and you suck for writing it with that
assumption.

> 
> Allocate blocks before *every* rename. It's a small change from the
> existing patch. The performance downsides are minimal, and making this
> change gives applications the consistency guarantees they expect.
I wholeheartedly agree with "Allocate blocks before renames over
existing files", but "Allocate blocks before *every* rename" is
overdoing it a little.
> 
> Again: if you accept that you can give applications a consistency
> guarantee when using rename to update the contents of a file, it doesn't
> make sense to penalize them the first time that file is updated (i.e.,
> when it's created.) Unless, of course, you just want to punish users and
> application developers for not gratuitously calling fsync.
Again, I don't see exactly how an application is being penalized the
first time the file is updated.

-- 
Chow Loong Jin

Revision history for this message

Daniel Colascione (dcolascione) wrote on 2009-03-28:

#204

First of all, the program under discussion got it wrong. It shouldn't have unlinked the destination filename. But the scenario it unwittingly created is *identical* to the first-time creation of a filename via a rename, and that's a very important case. EVERY program will encounter it the first time it creates a file via an atomic rename. If the system dies at the wrong time, the program will see a zero-length file in place of the one it just wrote.

This is your scenario two. This is *NOT* about data loss. If the program cared about data loss, it'd use fsync(), dammit. This is about consistent state.

The program didn't put that zero-length file there. Why should it be expected to handle it? It's perfectly reasonable to barf on a zero-length file. What if it's XML and needs a root element? What if it's a database that needs metadata? It's unreasonable to expect every program and library to be modified to not barf on empty files *it didn't write* just like it's unreasonable to modify every program to fsync gratuitously. Again -- from the point of view of the program on a running system, there was at *NO TIME* a zero-length file. Why should these programs have to deal with them mysteriously appearing after a crash?

Okay, and now what about XFS? XFS fills files with NULL instead of truncating them down to zero length (technically, it just makes the whole file sparse, but that's beside the point.) Do programs need to specially handle the full-of-NULLs case too? How many hoops will they have to go through just to pacify sadistic filesystems?

A commit after every rename has a whole host of advantages. It rounds out and completes the partial guarantee provided by a commit after an overwriting rename. It completely prevents the appearance of a garbage file regardless of whether a program is writing the destination for the first or the nth time. It prevents anyone from having to worry about garbage files at all.

It's far better to fix a program completely than to get it right 99% of the time and leave a sharp edge hiding in some dark corner. Just fix rename.

And what's the downside anyway? High-throughput applications don't rename brand-new files after they've just created them anyway.

As for no users being able to log in -- I was referring to an old BSD network daemon. But for a more modern example, how about cron.deny? If cron.deny does not exist, only root can use cron. If cron.deny exists *AND IS EMPTY*, all users can use cron.