[regression] dpkg's fsync causes massive regression in Ubuntu Server and Alternate installation times

Bug #570805 reported by Dustin Kirkland 
This bug affects 43 people
Affects                   Status        Importance  Assigned to   Milestone
Linux                     Invalid       Medium
Release Notes for Ubuntu  Fix Released  Undecided   Colin Watson
dpkg (Ubuntu)             Fix Released  High        Unassigned
  Lucid                   Fix Released  High        Colin Watson
dpkg (Unity Linux)        Invalid       Undecided   Unassigned

Bug Description

Binary package hint: dpkg

dpkg (1.15.5.6ubuntu4) causes a massive regression in the installation of the Ubuntu Server.

Specifically, this from the changelog, addressing Bug #559915:

    - Restore fsync during package unpack (LP: #559915). This is now done
      by deferring the fsync and rename for normal files in tar extraction
      so that it's done in one pass afterwards, to avoid massive I/O
      degradation due to the serialization from each write + fsync. When
      creating hard links to normal files on extraction use the .dpkg-new
      filename for source as the file is not yet in place due to the rename
      deferral.
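
The deferral described in this changelog entry can be sketched roughly as follows. This is a minimal illustration in Python, not dpkg's actual C code: the function name `unpack_deferred` and its arguments are hypothetical. Every file is first written under a `.dpkg-new` name, and only once the whole archive has been extracted does a second pass fsync and rename each file.

```python
import os
import tempfile


def unpack_deferred(root, files):
    """Write every file first, then fsync and rename in one later pass.

    `files` maps a relative path to its bytes. Hypothetical helper for
    illustration only; it is not part of dpkg.
    """
    pending = []
    # Pass 1: write all contents under temporary ".dpkg-new" names.
    for rel_path, data in files.items():
        final = os.path.join(root, rel_path)
        os.makedirs(os.path.dirname(final), exist_ok=True)
        tmp = final + ".dpkg-new"
        with open(tmp, "wb") as fh:
            fh.write(data)
        pending.append((tmp, final))
    # Pass 2: force each file to disk, then move it into place.
    for tmp, final in pending:
        fd = os.open(tmp, os.O_WRONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
        os.rename(tmp, final)
    return [final for _, final in pending]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(unpack_deferred(root, {"usr/share/doc/demo/README": b"hello\n"}))
```

Separating the two passes keeps the writes from being serialized behind one fsync each, which is the degradation the changelog entry refers to.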

I just installed on the same hardware, from the same USB stick, in an identical configuration, twice: once on ext4 and once on ext3.

On ext4, this took 19 minutes, 20 seconds. On ext3, this took 9 minutes, 8 seconds.

This is a performance hit of more than 100% on Server installs. It now takes over twice as long to install Ubuntu servers.
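
As a quick check, the two wall-clock times can be converted to seconds and compared. This is only arithmetic on the figures reported above; the helper name `to_seconds` is made up for the example.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + seconds


ext4 = to_seconds(19, 20)  # 1160 s
ext3 = to_seconds(9, 8)    # 548 s

ratio = ext4 / ext3
print(f"ext4 took {ratio:.2f}x as long as ext3")
print(f"that is {100 * (ratio - 1):.0f}% more time")
```

The ext4 install took roughly 2.1 times as long, about 112% more time than the ext3 install.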

While I can understand that the fsync() calls are necessary for power-loss situations during apt-get upgrade/dpkg operations *after* the system has been installed, they should not be necessary at Server install time. If you lose power during a d-i installation, you will clearly need to start from scratch anyway.

The desktop installer does not suffer from this since ubiquity installations simply transfer the live image.

DEVELOPMENT BRANCH: Addressed in a merge from Debian by using one large sync before embarking on renames, rather than lots of little fsyncs.

PATCH: http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/lucid/dpkg/lucid/revision/95

TEST CASE: Install Ubuntu Server; measure time taken in "Select and install software" step. Compare 10.04 as released to the 10.04.1 candidate CD images which should be available for testing in the near future.

Changed in dpkg (Ubuntu):
importance: Undecided → High
status: New → Triaged
assignee: nobody → Thierry Carrez (ttx)
tags: added: regression-potential
Thierry Carrez (ttx) wrote :

This would need to be worked around in dpkg, by making it support a without-fsync mode that would be triggered on new installs only.

I talked to the release team about it; their decision is to release-note this issue and work to get it fixed for 10.04.1.

Changed in dpkg (Ubuntu Lucid):
assignee: Thierry Carrez (ttx) → nobody
Thierry Carrez (ttx) wrote :

The performance hit also affects alternate/netboot installs, as well as any upgrades (including Desktop).

Dustin Kirkland  (kirkland) wrote : Re: [regression] dpkg fsync cause massive regression in Ubuntu Server and Alternate installation times

Thierry-

Regarding upgrades, I think I agree with the current design. Data integrity on a running system is of paramount importance.

New installs, though, are a different beast, in my opinion, since a failure would necessitate a reinstall anyway.

summary: - [regression] dpkg fsync cause massive regression in Ubuntu Server
- installation times
+ [regression] dpkg fsync cause massive regression in Ubuntu Server and
+ Alternate installation times
Phillip Susi (psusi) wrote :

This appears to be a duplicate of bug #537241.

Dustin Kirkland  (kirkland) wrote :

Phillip-

Thanks for the link.

The cause is, in fact, the same (fsyncs in dpkg).

The solution, though, might be subtly different. As I've said above, I can understand and agree with the change in behavior for updates/upgrades on a running system.

This current bug, though, is about what's going on in the installer on fresh installs, when you have no critical data yet on the system. I'm hoping this bug will be fixed separately from bug #537241.

tags: added: iso-testing
Colin Watson (cjwatson) wrote :

I've release-noted this as follows (feel free to tweak from here):

== Default file system; package manager performance ==

The default file system for installations of Ubuntu 10.04 LTS is `ext4`, the latest version in the popular series of Linux extended file systems. `ext4` includes a number of performance tuning changes relative to previous versions such as `ext3`, the file system used by default up to Ubuntu 9.04. These generally produce improvements, but some particular workloads are known to be significantly slower when using `ext4` than when using `ext3`. If you have performance-sensitive applications, we recommend that you run benchmarks using multiple file systems in your environment and select the most appropriate.

In particular, the `dpkg` package manager is known to run significantly slower on `ext4` (causing installations using the server or alternate install CD to take on the order of twice as long as before). `ext4` does not guarantee atomic renames of new files over existing files in the event of a power failure shortly after the rename, and so `dpkg` needs to force the contents of the new file out to disk before renaming it in order to avoid leaving corrupt zero-length files after power failures. This operation involves waiting for the disk significantly more than it strictly needs to, and so degrades performance. If fast package management operations are most important to you, then you should use `ext3` instead. (Bug:570805)

The simplest way to select a different file system such as `ext3` at installation time is to add the `partman/default_filesystem=ext3` boot parameter when starting the installer. If you are deploying Ubuntu automatically using Kickstart or preseeding, then you can set a different file system in the partitioning recipe instead.

Changed in ubuntu-release-notes:
assignee: nobody → Colin Watson (cjwatson)
status: New → Fix Released
Dustin Kirkland  (kirkland) wrote : Re: [Bug 570805] Re: [regression] dpkg fsync cause massive regression in Ubuntu Server and Alternate installation times

Looks great. Very informative, Colin.

Jakob Unterwurzacher (jakobunt) wrote : Re: [regression] dpkg fsync cause massive regression in Ubuntu Server and Alternate installation times

I second that; I love to see some background info!

But it looks like either the release notes or the ext4 documentation need updating:

===
Release notes
===
`ext4` does not guarantee atomic renames of
new files over existing files in the event of a
power failure shortly after the rename

===
ext4 docs ( http://www.mjmwired.net/kernel/Documentation/filesystems/ext4.txt#338 )
===
the data blocks of the new file are forced
to disk before the rename() operation is
committed. This provides roughly the same level
of guarantees as ext3, and avoids the
"zero-length" problem that can happen when a
system crashes before the delayed allocation
blocks are forced to disk.

Dmitry Potapov (dpotapov) wrote :

The explanation in the release notes is confusing. First of all, as Jakob wrote above, ext4 starting with 2.6.30 provides similar behavior to ext3 with data=ordered. So, it is not clear why fsync() is necessary on ext4, but not on ext3.

More importantly, the default mode for ext3 was changed to data=writeback in 2.6.30, which does not provide the same guarantee as "ordered". Even if the default was changed to be "ordered" in Ubuntu kernel, users still may use an upstream kernel or change the mode to "writeback" on ext3, because the "ordered" mode has horrible latency (20 times or more than "writeback"). Does it mean that anyone running ext3 with data=writeback can face corruption of their package repository?

Colin Watson (cjwatson) wrote :

Jean-Baptists Lallement tested this behaviour and found that ext4's guarantee seems to be ... not so much of a guarantee in reality. I asked him to file it on bugzilla.kernel.org.

Colin Watson (cjwatson) wrote :

Sorry, "Jean-Baptiste Lallement"

Jakob Unterwurzacher (jakobunt) wrote : Re: [Bug 570805] Re: [regression] dpkg fsync cause massive regression in Ubuntu Server and Alternate installation times

On 05/05/10 15:53, Jean-Baptiste Lallement wrote:
> ** Bug watch added: Linux Kernel Bug Tracker #15910
> http://bugzilla.kernel.org/show_bug.cgi?id=15910
>
> ** Also affects: linux via
> http://bugzilla.kernel.org/show_bug.cgi?id=15910
> Importance: Unknown
> Status: Unknown

I wonder if this says what you meant to say ( from the kernel.org bug ):
   ---
   auto_da_alloc doesn't detect the replace-via-rename (at least in the
   case of dpkg.)
   ---
Is dpkg really ***replacing***-via-rename when installing new packages?
Where do the old files that are overwritten come from?

Thanks!

Jakob Unterwurzacher (jakobunt) wrote : Re: [regression] dpkg fsync cause massive regression in Ubuntu Server and Alternate installation times

Note that for the install case, where corruption does not matter, running
  export LD_PRELOAD=./libeatmydata.so
before the installation should bring performance back to normal (or even above) in a very simple way. See http://www.flamingspork.com/projects/libeatmydata/
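
For scripted installs, the same preload can be scoped to a single child process instead of being exported for the whole shell. A minimal sketch, assuming libeatmydata.so has already been built at the path you pass in; `with_preload` and `run_preloaded` are hypothetical helper names, not part of libeatmydata or dpkg.

```python
import os
import subprocess


def with_preload(lib_path, env=None):
    """Return a copy of `env` with `lib_path` prepended to LD_PRELOAD."""
    new_env = dict(os.environ if env is None else env)
    existing = new_env.get("LD_PRELOAD", "")
    new_env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return new_env


def run_preloaded(cmd, lib_path):
    """Run `cmd` with the library preloaded; raises if the command fails."""
    return subprocess.run(cmd, env=with_preload(lib_path), check=True)
```

A call such as `run_preloaded(["dpkg", "-i", "example.deb"], "./libeatmydata.so")` (the .deb name is a placeholder) would then affect only that one dpkg run, so fsync stays effective for everything else on the system.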

Colin Watson (cjwatson) wrote :

I would appreciate it if people affected by this bug could test the dpkg package from:

  https://launchpad.net/~cjwatson/+archive/ppa

This is probably easier to test if you're seeing problems with dpkg performance after installation, although I do plan to hack it into an installation environment after I get back from UDS and do some comparisons. Please report before-and-after timings if you can, preferably with a cold cache.

If this makes a significant improvement, I'll get it into Ubuntu 10.04.1.

Dustin Kirkland  (kirkland) wrote : Re: [Bug 570805] Re: [regression] dpkg fsync cause massive regression in Ubuntu Server and Alternate installation times

Colin-

Poke me when you have an install media ready to test. That's my
primary use case, and I'll gladly help test that.

Martin Pitt (pitti) wrote : Re: [regression] dpkg fsync cause massive regression in Ubuntu Server and Alternate installation times

I ran some tests on my Dell Latitude D430, which has ext4 and a ridiculously slow hard disk. I tested with:

$ echo 3 | sudo tee /proc/sys/vm/drop_caches; sudo dpkg -i /var/cache/apt/archives/openoffice.org-common_1%3a3.2.0-7ubuntu4_all.deb | ts

old dpkg:
openoffice.org-common: 56s unpack, 25s config/triggers
ubuntu-docs: 25s unpack, 4s config

Colin's PPA dpkg:
openoffice.org-common: 23s unpack, 23s config/triggers
ubuntu-docs: 23s unpack, 4s config

So it makes quite a difference for the unpacking of openoffice.org-common (2860 files, huge), and not so much for ubuntu-docs (7454 files, smaller)

Jakob Unterwurzacher (jakobunt) wrote :

Test on ext3 (Karmic upgraded to Lucid), 5400rpm notebook hd. Note that I think, in addition to dropping the caches, we have to sync before running the test to get somewhat stable results.

Commands used (as root):
  sync; echo 3 > /proc/sys/vm/drop_caches; time dpkg -i /var/cache/apt/archives/openoffice.org-common_1%3a3.2.0-7ubuntu4_all.deb | ts
  sync; echo 3 > /proc/sys/vm/drop_caches; time dpkg -i /var/cache/apt/archives/ubuntu-docs_10.04.3_all.deb | ts

Results, old dpkg (1.15.5.6ubuntu4):
  openoffice.org-common: 26s unpack, 93s real
  ubuntu-docs: 24s unpack, 51s real

Results, Colin's PPA dpkg (1.15.5.6ubuntu5~ppa1):
  openoffice.org-common: 31s unpack, 94s real
  ubuntu-docs: 23s unpack, 54s real

So ext3 doesn't care at all - it's not regressing at least.

Mark Dammer (mark-dammer) wrote :

I wonder if my problem is related to this bug - I am observing excessive harddisk activity in the unpack phase of package installations on all three Lucid systems:
https://bugs.launchpad.net/ubuntu/+source/dpkg/+bug/580537

Mark Dammer (mark-dammer) wrote :

dpkg from Colin Watson's PPA solved the problem: I installed "Blender" and the unpacking phase of the 10 MB package took less than a second. And I experience no more rattling disks when installing packages.

Bela Lubkin (filbo) wrote :

What benefit is provided by all the fsync action?

Without it, a power-failed install may have zero-length or wrong-content files.

With it, a power-failed install still has a broken package -- each individual file may be fully there or fully not-there, but there will be missing files.

It doesn't help achieve a successful package install. In fact, it _increases_ vulnerability to power failures by making the vulnerable time window more than twice as long (all of the added time is vulnerable time, while some of the original time must be safe prep time).

Either way, system powers back on with a broken package. Either way, the user or the dpkg system must deal with it.

dpkg _should_ do a regular sync() after each package; I imagine (without checking source) that it already does. It already has notes on which packages were in transition. Make sure _those_ are fully sync'd, fsync'd if that's the right way to do it -- those tell dpkg where to pick up, which package to fix, after the power cycle.

I think this code should be retracted, even for normal post-install package installs. Cleanup is going to be needed after a mid-install power failure either way; don't make users suffer through slow, noisy, HD-punishing package installs for no [or negative] benefit.

Bela Lubkin (filbo) wrote :

Ok, looked at Colin's backport:

It's the right way to do it; and I suspect this is true for all Unixish OSes with all filesystems, whether or not they have synchronous sync().

Roel van Os (roel-van-os) wrote :

Using the testing method from #17 for the package oxygen-icon-theme (5890 mostly small files), tested on a 5400 RPM laptop harddisk formatted with ext4.

  dpkg 1.15.5.6ubuntu4:
    unpack: 37s
    real: 53s

  dpkg 1.15.5.6ubuntu5~ppa1:
    unpack: 5s
    real: 22s

Also tested on a RAID 0 array consisting of two 7200 RPM desktop drives, formatted with XFS:

  dpkg 1.15.5.6ubuntu4:
    unpack: 56s
    real: 53s

  dpkg 1.15.5.6ubuntu5~ppa1:
    unpack: 13s
    real: 21s

So quite a noticeable difference in both cases :-)

summary: - [regression] dpkg fsync cause massive regression in Ubuntu Server and
+ [regression] dpkg's fsync causes massive regression in Ubuntu Server and
Alternate installation times
Colin Watson (cjwatson) wrote :

Guillem's patch is in Maverick now (1.15.7.2 and newer). I've uploaded 1.15.5.6ubuntu4.1 to lucid-proposed, awaiting approval.

Changed in dpkg (Ubuntu):
status: Triaged → Fix Released
Colin Watson (cjwatson)
Changed in dpkg (Ubuntu Lucid):
assignee: nobody → Colin Watson (cjwatson)
status: Triaged → In Progress
description: updated
Martin Pitt (pitti) wrote : Please test proposed package

Accepted dpkg into lucid-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in dpkg (Ubuntu Lucid):
status: In Progress → Fix Committed
tags: added: verification-needed
Kostas Peletidis (kpeletidis) wrote :

In Xubuntu 10.04 LTS the software update tool complains that the proposed dpkg package could not be authenticated. Is this expected at this stage? Thank you.

Kostas Peletidis (kpeletidis) wrote :

Regarding the authentication issue: I switched to the man update server from the mirror I usually use and now there are no package authentication problems. So far version 1.15.5.6ubuntu4.1 works fine.

Kostas Peletidis (kpeletidis) wrote :

Correction: I meant *main* update server.

Dustin Kirkland  (kirkland) wrote :

10.04.1 candidate images are not yet available.

I did, however, measure the performance difference on a running system.

 1) Install Ubuntu 10.04 Server
 2) sudo apt-get update
 3) sudo apt-get dist-upgrade --download-only
     (32 updates available now)
 4) time sudo apt-get dist-upgrade
      takes 3m33s

Then:
 1) Install Ubuntu 10.04 Server
 2) Upgrade dpkg to -proposed
 3) sudo apt-get update
 4) sudo apt-get dist-upgrade --download-only
     (32 updates available now)
 5) time sudo apt-get dist-upgrade
      takes 1m52s

There's clearly a huge performance improvement. I would like to see this promoted from -proposed to -updates, and would very much like to test this on a real ISO whenever that's available.

Martin Pitt (pitti) wrote :

I also have used the proposed package extensively in the past weeks, and did not notice any problems. Together with Dustin's testing I consider this verified.

tags: added: regression-release verification-done
removed: regression-potential verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dpkg - 1.15.5.6ubuntu4.1

---------------
dpkg (1.15.5.6ubuntu4.1) lucid-proposed; urgency=low

  * Backport proposed patch from Guillem Jover:
    - On Linux, call sync() (which is synchronous) before rename() rather
      than calling fsync() once per file (LP: #570805).
 -- Colin Watson <email address hidden> Mon, 28 Jun 2010 14:32:02 +0100
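
A rough sketch of the approach named in this changelog entry, in which a single global `sync()` replaces the per-file `fsync()` calls. It is written in Python for brevity and is not the actual dpkg patch; `install_with_single_sync` is a hypothetical name, and `os.sync()` needs Python 3.3 or newer on a Unix-like system.

```python
import os
import tempfile


def install_with_single_sync(pending):
    """Flush everything once, then rename each temporary file into place.

    `pending` is a list of (temporary_path, final_path) pairs whose
    contents have already been written. Illustrative only.
    """
    # One global flush instead of one fsync() per file. On Linux, sync()
    # waits for the writes to complete before it returns.
    os.sync()
    for tmp, final in pending:
        os.rename(tmp, final)
    return len(pending)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        tmp = os.path.join(root, "demo.dpkg-new")
        with open(tmp, "wb") as fh:
            fh.write(b"payload\n")
        print(install_with_single_sync([(tmp, os.path.join(root, "demo"))]))
```

The trade-off is that `sync()` flushes all dirty data on the system, not just the files belonging to the package being unpacked.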

Changed in dpkg (Ubuntu Lucid):
status: Fix Committed → Fix Released
wenhui (wenhui618) wrote :

When using ext4, unpacking packages is still unbelievably slow in the netboot installer.
I used netboot.tar.gz from here:
http://archive.ubuntu.com/ubuntu/dists/lucid-proposed/main/installer-amd64/current/images/netboot/

Ferdinand Hagethorn (ferdinand-hagethorn) wrote :

Using Maverick and btrfs, dpkg is as slow as a dead horse and sometimes freezes completely; manually running the sync command seems to pull it out of its deadlock...

Changed in linux:
status: Unknown → Confirmed
D-lyte (shady-d-lyte)
security vulnerability: no → yes
Martin Pitt (pitti)
security vulnerability: yes → no
Changed in linux:
importance: Unknown → Medium
Phillip Susi (psusi)
Changed in dpkg (Unity Linux):
status: New → Invalid
Changed in linux:
status: Confirmed → Invalid
tags: added: testcase