DRBD 8.0.11 is unusably slow

Bug #288226 reported by Florian Hackenberger
This bug affects 2 people
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
  Hardy          Invalid        Medium       Unassigned

Bug Description

I'm using DRBD as a backend for PostgreSQL in a two-node HA setup and I'm
experiencing severe slowdowns. An analysis follows below. All results
were obtained with Linux-HA switched off and nothing resource
intensive running on the servers. I ran each benchmark at least
3 times to make sure I'm not falling prey to statistical outliers.

Upgrading to 8.0.13 (I rebuilt the Ubuntu package with the new upstream sources) and setting the no-disk-flushes and no-md-flushes options solved the problem. With both options set, writing 1000 512-byte chunks runs at about 3.3 MB/s. Version 8.0.11 does not support those options, which leaves people with server hardware stuck with an unusably slow DRBD setup (a battery-backed write cache is REQUIRED to enable these options). The upgrade to 8.0.13 is very easy, and it is a bugfix-only release.
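
For reference, a minimal sketch of how the two options might look in drbd.conf. It assumes DRBD 8.0.12 or later (the release whose changelog introduces them) and a controller with a battery-backed write cache. The resource name r0 is a placeholder, not taken from my setup, and the rest of the resource definition is left out:

```
resource r0 {
    disk {
        on-io-error detach;
        no-disk-flushes;   # skip flushes/barriers for the data device
        no-md-flushes;     # skip flushes/barriers for the meta data device
    }
    # net, syncer and host sections omitted
}
```

Without a battery-backed cache these settings risk data loss on power failure, so treat this as an illustration rather than a recommended default.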

Please note the following two performance figures (details below):
Speed for: sudo dd if=/dev/zero of=/dev/drbd0 bs=512 count=1000 oflag=direct
Disconnected: 3.5 MB/s
Connected: 3.5 kB/s

= Network latency =
asterix02@asterix02:/usr/lib/lmbench/bin/i686-pc-linux-gnu$ ./lat_tcp 192.168.1.1
TCP latency using 192.168.1.1: 0.2479 microseconds

asterix01@asterix01:/usr/lib/lmbench/bin/i686-pc-linux-gnu$ ./lat_tcp 192.168.1.2
TCP latency using 192.168.1.2: 0.2463 microseconds

= Network throughput =
asterix01@asterix01:/usr/lib/lmbench/bin/i686-pc-linux-gnu$ iperf -f M -c 192.168.1.2
------------------------------------------------------------
Client connecting to 192.168.1.2, TCP port 5001
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.1 port 39381 connected with 192.168.1.2 port 5001
[ 3] 0.0-10.0 sec 1116 MBytes 112 MBytes/sec

= Local disk (IBM Serveraid RAID 1) =
asterix01@asterix01:/$ sudo dd if=/dev/zero of=/dev/sda6 bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 9.97766 s, 108 MB/s

asterix02@asterix02:/$ sudo dd if=/dev/zero of=/dev/sda6 bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 9.84385 s, 109 MB/s

asterix01@asterix01:/$ sudo dd if=/dev/zero of=/dev/sda6 bs=512 count=1000 oflag=direct
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.136389 s, 3.8 MB/s

asterix02@asterix02:/$ sudo dd if=/dev/zero of=/dev/sda6 bs=512 count=1000 oflag=direct
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.140179 s, 3.7 MB/s

= DRBD default configuration =
asterix02@asterix02:/$ sudo drbdsetup /dev/drbd0 show
disk {
        size 0s _is_default; # bytes
        on-io-error detach;
        fencing dont-care _is_default;
}
syncer {
        rate 33792k; # bytes/second
        after -1 _is_default;
        al-extents 127 _is_default;
}
_this_host {
        device "/dev/drbd0";
        disk "/dev/sda5";
        meta-disk internal;
}

== Disconnected DRBD ==
asterix02@asterix02:/$ sudo dd if=/dev/zero of=/dev/drbd0 bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 43.7656 s, 24.5 MB/s

asterix02@asterix02:/$ sudo dd if=/dev/zero of=/dev/drbd0 bs=512 count=1000 oflag=direct
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.14615 s, 3.5 MB/s

== Connected DRBD (no resync happening) ==
asterix02@asterix02:/$ sudo dd if=/dev/zero of=/dev/drbd0 bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 53.9678 s, 19.9 MB/s

asterix02@asterix02:/$ sudo dd if=/dev/zero of=/dev/drbd0 bs=512 count=1000 oflag=direct
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 144.54 s, 3.5 kB/s
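
To put the two 512-byte runs above into per-write terms, here is a small sketch that divides the elapsed time reported by dd by the number of writes. The helper name per_write_ms is made up for illustration. The timings are copied from the disconnected and connected runs with the default configuration.

```shell
#!/bin/sh
# Convert a dd run into average time per write.
# $1 = elapsed seconds reported by dd, $2 = number of writes (count=)
per_write_ms() {
    awk -v secs="$1" -v n="$2" 'BEGIN { printf "%.3f\n", secs * 1000 / n }'
}

# Timings copied from the two 512-byte runs on /dev/drbd0 above.
echo "disconnected: $(per_write_ms 0.14615 1000) ms per write"
echo "connected:    $(per_write_ms 144.54 1000) ms per write"

# Ratio of the two elapsed times (same number of writes in both runs).
awk 'BEGIN { printf "slowdown: about %.0fx\n", 144.54 / 0.14615 }'
```

Each synchronous 512-byte write takes a fraction of a millisecond when disconnected and well over 100 ms when connected, which is far more than the measured network latency or throughput would explain.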

= Optimised DRBD =
disk {
        size 0s _is_default; # bytes
        on-io-error detach;
        fencing dont-care _is_default;
}
net {
        timeout 20; # 1/10 seconds
        max-epoch-size 2048 _is_default;
        max-buffers 8192;
        unplug-watermark 8192;
        connect-int 10 _is_default; # seconds
        ping-int 1; # seconds
        sndbuf-size 131070 _is_default; # bytes
        ko-count 0 _is_default;
        after-sb-0pri disconnect _is_default;
        after-sb-1pri disconnect _is_default;
        after-sb-2pri disconnect _is_default;
        rr-conflict disconnect _is_default;
        ping-timeout 5 _is_default; # 1/10 seconds
}
syncer {
        rate 33792k; # bytes/second
        after -1 _is_default;
        al-extents 2129;
}
protocol C;
_this_host {
        device "/dev/drbd0";
        disk "/dev/sda5";
        meta-disk internal;
        address 192.168.1.2:7788;
}
_remote_host {
        address 192.168.1.1:7788;
}

asterix02@asterix02:/$ cat /proc/drbd
version: 8.0.11 (api:86/proto:86)
GIT-hash: b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by phil@mescal, 2008-02-12 11:56:43
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:1048576 nr:0 dw:32129175 dr:66621614 al:2934 bm:578 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:196447 misses:193 starving:0 dirty:0 changed:193
        act_log: used:0/2129 hits:218622 misses:256 starving:0 dirty:0 changed:256

== Disconnected DRBD ==
asterix02@asterix02:/$ sudo dd if=/dev/zero of=/dev/drbd0 bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 8.48373 s, 127 MB/s

asterix02@asterix02:/$ sudo dd if=/dev/zero of=/dev/drbd0 bs=512 count=1000 oflag=direct
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.145206 s, 3.5 MB/s

== Connected DRBD (no resync happening) ==
asterix02@asterix02:/$ sudo dd if=/dev/zero of=/dev/drbd0 bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 20.0519 s, 53.5 MB/s

asterix02@asterix02:/$ sudo dd if=/dev/zero of=/dev/drbd0 bs=512 count=1000 oflag=direct
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 144.431 s, 3.5 kB/s

Chuck Short (zulcss) wrote :

Unfortunately, upgrading drbd8 is not an option for Hardy because it is a long-term support (LTS) release. However, Intrepid has 8.2.6, and you can ask for a backport.

Regards
chuck

Changed in drbd8:
status: New → Won't Fix

Mark Foster (fostermarkd) wrote :

Sounds like a blow off. Where is this policy documented?

I don't see why a relatively minor upgrade from 8.0.11 to 8.0.13 would not be warranted if this bug is verified.

Florian Hackenberger (f-hackenberger) wrote :

Chuck: 8.0.13 is a bugfix release, please see the following changelog. Furthermore, please consider that DRBD is completely useless in a production environment as of version 8.0.11. Every recent server has a battery-backed write cache for its disks, and I don't see the point of forcing LTS users to take a slowdown of three orders of magnitude (~3 kB/s vs ~3 MB/s) for applications writing in tiny increments (think INSERT statements of a database).

From: http://git.drbd.org/?p=drbd-8.0.git;a=blob_plain;f=ChangeLog;hb=HEAD
8.0.13 (api:86/proto:86)
--------
 * Fixed online resizing if there is application IO on the fly when the
   resize is triggered.
 * Fixed online resizing if it is triggered from the secondary node.
 * Fixed a possible deadlock in case "become-primary-on-both" is used, and
   a resync starts
 * Fixed the invocation of the pri-on-incon-degr handler
 * Fixed the exit codes of drbdsetup
 * sock_create_lite() to avoid a socket->sk leak
 * Auto-tune socket buffers if sndbuf-size is set to zero
 * Made it to compile on Linux-2.6.26

8.0.12 (api:86/proto:86)
--------
 * Corrected lock-out of application IO during bitmap IO.
   (Only triggered issues with multi-terrabyte volumes)
 * If an attach would causes a split-brain,
   abort the attach, do not drop the connection
 * A node without data (no disk, no connection) only accepts data
   (attach or connect) if that data matches the last-known data
 * Fixed various race conditions between state transitions
 * Various bugfixes to issues found by using the sparse tool
 * Corrected the exit codes of drbdsetup/drbdadm to match
   the expectations of dopd (drbd-outdate-peer-daemon)
 * Corrected the online changing of the number of AL extents while
   application IO is in flight.
 * Two new config options no-disk-flushes and no-md-flushes to disable
   the use of io subsystem flushes and barrier BIOs.
 * Make it compile on Linux-2.6.25
 * Support for standard disk stats
 * Work on stalling issues of the resync process
 * drbdsetup /dev/drbdX down no longer fails for non-existing minors
 * Added wipe-md to drbdadm
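
Since the changelog above lists no-disk-flushes and no-md-flushes under 8.0.12, a node can be checked for them by comparing the version line of /proc/drbd against 8.0.12. The following is only a sketch: the helper names version_ge and drbd_version are made up, the comparison is meant for the 8.0.x series, and the fallback line is the one from my report.

```shell
#!/bin/sh
# Succeed if dotted version $1 is greater than or equal to version $2.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# Read /proc/drbd-style text on stdin and print the version number,
# e.g. "version: 8.0.11 (api:86/proto:86)" yields "8.0.11".
drbd_version() {
    sed -n 's/^version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Use the loaded module if present, otherwise the line quoted in this report.
if [ -r /proc/drbd ]; then
    running=$(drbd_version < /proc/drbd)
else
    running=$(echo "version: 8.0.11 (api:86/proto:86)" | drbd_version)
fi

if version_ge "$running" "8.0.12"; then
    echo "DRBD $running: flush options should be available per the changelog"
else
    echo "DRBD $running: predates 8.0.12, flush options not available"
fi
```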

Dustin Kirkland  (kirkland) wrote :

For the appropriate process to request a backport of a new version of a package to a previously released Ubuntu distribution, please refer to:
 * https://help.ubuntu.com/community/UbuntuBackports

:-Dustin

Mathias Gug (mathiaz) wrote :

Marking as Fix Released in Intrepid. Intrepid ships with 8.2.6.

Changed in drbd8:
status: Won't Fix → Fix Released

Mathias Gug (mathiaz) wrote :

At first glance, 8.0.13 seems to be a bugfix-only release and would fix this major performance problem. It may fit the criteria for a Stable Release Update.

The process to get an update released in hardy is outlined on the following page:

https://wiki.ubuntu.com/StableReleaseUpdates

Dustin Kirkland  (kirkland) wrote :

Note that my comment was not intended to state an opinion as to whether this belongs in the Stable Release Update, or the Backport Queue.

Rather, a previous poster asked for the relevant documentation on Backports.

For completeness, here's the relevant policy documentation on Stable Release Updates:
 * https://wiki.ubuntu.com/StableReleaseUpdates

:-Dustin

Florian Hackenberger (f-hackenberger) wrote :

Would someone be willing to undergo this process if I provide the 8.0.13 source package (for the drbd8-utils package)? The source in the kernel would have to be patched by someone else.

Citing from https://wiki.ubuntu.com/StableReleaseUpdates this upgrade would be justified:
Bugs will be considered if they "(1) have an obviously safe patch and (2) affect an application"

However, we cannot backport from Intrepid, as 8.2.6 is a version upgrade which involves a lot of new features and is not even considered stable.

Mathias Gug (mathiaz) wrote : Re: [Bug 288226] Re: DRBD 8.0.11 is unusably slow

Hi Florian,

On Fri, Oct 24, 2008 at 08:18:14AM -0000, Florian Hackenberger wrote:
> Would someone be willing to undergo this process if I provide the 8.0.13
> source package (for the drbd8-utils package)? The source in the kernel
> would have to be patched by someone else.

How big is the kernel patch? Could a patch against the current hardy kernel be prepared and attached to this bug so that the kernel team can have a look at it?

--
Mathias Gug
Ubuntu Developer http://www.ubuntu.com

Ante Karamatić (ivoks) wrote :

On Fri, 24 Oct 2008 08:47:03 -0000
Mathias Gug <email address hidden> wrote:

> How big is the kernel patch? Could a patch against the current hardy
> kernel be prepared and attached to this bug so that the kernel team
> can have a look at it?

I'll work on this one.

Ante Karamatić (ivoks) wrote :

Here is the patch for the linux-ubuntu-modules git tree. The patch isn't that small: over 150k. If we are going to apply this, very intensive testing should be done first. And since userspace depends on the exact version of the kernel modules, the kernel upgrade should be rolled out after the userspace is pushed to -proposed, with a requirement on the next kernel version.

I'm building necessary packages for testing. They should be available in 3-4 days.

Ante Karamatić (ivoks) wrote :

Could you install/test packages located at:

http://init.hr/dev/ubuntu-bugs/288226

Note that this is a newer kernel version than the one you are using in hardy.

Thanks

Ante Karamatić (ivoks) wrote :

A working git-formatted patch for Hardy's linux-ubuntu-modules (lum) git tree.

Ante Karamatić (ivoks)
Changed in drbd8:
importance: Undecided → Medium
status: New → In Progress

Stefan Soriga (sgstefan) wrote :

I have to upgrade a Gutsy installation, which includes DRBD, to Hardy by April 18. Has this bug been resolved in 8.04.2? I really don't like the policy of not fixing bugs because the distribution has already been released.

Chuck Short (zulcss)
affects: drbd8 (Ubuntu) → linux (Ubuntu)

Julian Wiedmann (jwiedmann) wrote :

This release has reached end-of-life [0].

[0] https://wiki.ubuntu.com/Releases

Changed in linux (Ubuntu Hardy):
status: In Progress → Invalid