LUKS is extremely slow on amd64 builds but not on i386

Bug #731340 reported by tdn
This bug affects 5 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

TL;DR: Disk I/O with LUKS is extremely slow on amd64 builds but much faster on i386 builds. This has been verified by testing on multiple hardware platforms and with multiple versions of Ubuntu, and the slowness appears to be related to the architecture. (Update: the trigger actually seems to be 10.10, since it is slow on i386 too, just not quite as slow; see comment #5.)

Longer version:
My friend and I have identical Lenovo T61p laptops with the same hardware specs:
 - 2.5 GHz Core 2 Duo
 - 4 GB RAM
 - 80 GB Intel SSD

We even use the same OS, file systems, etc.:
 - Kubuntu
 - ext4
 - Whole disk encryption with LUKS
 - Disk set up with LVM

We have one difference, though: for a long time I used the 32-bit version of Kubuntu, while my friend has always used the 64-bit version.

Ever since we bought our laptops, back when we still used hard disks instead of SSDs, my friend's laptop has always been noticeably slower at I/O.
In the beginning we just thought he had gotten a bad disk. We each soon bought a new 80 GB Intel SSD in order to obtain better I/O performance.
When we installed on the SSDs we both saw a huge increase in I/O performance. However, my friend's system was still much slower than mine.
We thought it might be due to alignment issues, so my friend researched this subject thoroughly and made sure his disk was perfectly aligned.
Still, his performance was much poorer than mine.

We have really tried debugging this in countless ways. I will not even try to describe all of it here.

We both work on some of the same software projects. Thus, we have the same SVN checkouts on our file systems.
We took one of these checkouts (one with lots of files in it, about 13k files) and used it as a reference.
We then measured how long it took to run find | wc -l inside the folder. Here are the results:
On my friend's system it took about 90 seconds. On my system it took less than 15 seconds.
We have repeated this test on several different Ubuntu installations from circa 8.04 up until the most recent 10.04 and we always got the same results.

Then a few months ago I upgraded to Ubuntu 10.10 and my friend did too. Now, suddenly, I began experiencing the same slowness.
Now, my computer also takes about 90 seconds to do the find in the same folder.
This was strange. We feared that maybe our SSDs were failing. We have heard stories about SSDs becoming slow before failing.

During a discussion about possible causes for this behavior it struck me: the only difference between our systems was the architecture. My friend has always used amd64, and I used i386.
Recently, when I installed Kubuntu 10.10, I chose the 64-bit version, because I no longer had any legacy apps requiring 32-bit.
It was after this install that my system suddenly began performing badly.

So we have results like this:
Time C: Time taken by the find command on my friend's system
Time T: Time taken by the find command on my system
Arch C: My friend's architecture
Arch T: My architecture

Ubuntu version | Time C | Time T | Arch C | Arch T
---------------+--------+--------+--------+-------
8.04           | 90 s   | <15 s  | amd64  | i386
8.10           | 90 s   | <15 s  | amd64  | i386
9.04           | 90 s   | <15 s  | amd64  | i386
9.10           | 90 s   | <15 s  | amd64  | i386
10.04          | 90 s   | <15 s  | amd64  | i386
10.10          | 90 s   | 90 s   | amd64  | amd64

(I hope this gets formatted correctly. If not, there is a correctly formatted version here: http://paste.adora.dk/P1977.html)

Of course, we made sure to drop the caches before each test with the command:
sync ; echo 3 > /proc/sys/vm/drop_caches

So the full command we ran was:
sync ; echo 3 > /proc/sys/vm/drop_caches ; time find $PATH_TO_WC |wc -l

We also did each test multiple times to make sure we got consistent results.
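In practice each run looked like this (a sketch; the checkout path is a placeholder, and dropping the caches requires root):

# run the timing test several times with cold caches
# /path/to/checkout stands in for the reference SVN checkout
for i in 1 2 3; do
    sync
    echo 3 > /proc/sys/vm/drop_caches
    time find /path/to/checkout | wc -l
done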

(I am trying to provide as much data as possible so as to make it easier to find and fix the bug)

In order to try to disprove our hypothesis that the architecture makes the difference, I booted a 32-bit Knoppix live CD on my laptop. I then mounted my LUKS-encrypted rootfs from the Knoppix CD and ran the find inside the same folder. It took between 10 and 15 seconds.

OK, so we have found a correlation between the architecture of Ubuntu and the slow I/O.
There were still several possible places that could be responsible for the problem:
 * LVM
 * LUKS
 * Ext4

I then wanted to see which of these made the difference.
My /boot partition is not encrypted with LUKS, and I had enough free space to copy the reference folder to /boot and run the test there.
On the /boot partition, the find took less than 2 seconds. I tried this both in the installed Kubuntu 10.10 (amd64) and from the 32-bit Knoppix live CD; both took less than 2 seconds.
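The /boot test amounted to this (a sketch; paths are placeholders):

# copy the reference checkout to the unencrypted /boot and time a cold find
cp -a /path/to/checkout /boot/reftest
sync ; echo 3 > /proc/sys/vm/drop_caches
time find /boot/reftest | wc -l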

Here is the data from the Knoppix tests from encrypted / and non-encrypted /boot:
http://p.adora.dk/P1978.txt

It is clear that the unencrypted /boot is much quicker than the encrypted /. However, all of the tests on the encrypted / partition from Knoppix take less than 15 seconds, while the same test takes about 90 seconds on the amd64 version of Ubuntu.

The results from 64 bit Ubuntu 10.10 are available here:
http://paste.adora.dk/P1979.txt

/boot is ext4, so we can eliminate ext4 as the cause of the problem.

In order to test whether LVM was the cause, I created a 100 MB file on /boot, made it into a loopback device, and set up an LVM volume on it.
I then created a file system inside it and mounted it. Then I copied the reference folder to this file system and ran the tests again.
All tests from this LVM volume took less than 2 seconds. So LVM is eliminated as the cause.
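For reference, the loopback setup was roughly the following (a sketch; the device, volume, and path names are illustrative):

# back an LVM volume with a 100 MB file on the unencrypted /boot
dd if=/dev/zero of=/boot/lvmtest.img bs=1M count=100
losetup /dev/loop0 /boot/lvmtest.img
pvcreate /dev/loop0
vgcreate testvg /dev/loop0
lvcreate -l 100%FREE -n testlv testvg
mkfs.ext4 /dev/testvg/testlv
mount /dev/testvg/testlv /mnt
# copy the reference folder in and repeat the cold-cache timing
cp -a /path/to/checkout /mnt/
sync ; echo 3 > /proc/sys/vm/drop_caches
time find /mnt/checkout | wc -l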

I see the combination of LUKS/amd64 as the only possible cause left.

I investigated further to check whether the problem was specific to LUKS or also present with eCryptfs. For this test, I copied the reference folder to my eCryptfs-encrypted home directory on my netbook (Lenovo S10 with Atom CPU, 2 GB RAM, and a 160 GB hard drive -- no SSD). The tests on the netbook took about 6-7 seconds.
This is actually pretty fast for a hard drive, fast enough that I think it is safe to conclude that the problem does not occur with eCryptfs.

I have now laid out all the data and the methodology of the investigation. I really hope someone can take the next step and isolate the problem so it can be fixed.
As it is, I/O performance on 64-bit Ubuntu with LUKS is almost unusably slow.

My guess is that either the 64-bit or the 32-bit kernel has some kind of compile flag or other setting that makes the difference.
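One quick way to check would be to compare which AES implementation each kernel registers and how the two kernels were configured (a sketch; run on both machines and compare the output):

# list the AES implementations the running kernel has registered
# (e.g. the generic C aes-generic vs. the optimized assembler drivers)
grep -A 2 'name.*aes' /proc/crypto
# compare the crypto-related build options of the running kernel
grep CRYPTO_AES /boot/config-$(uname -r)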

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: linux-image-2.6.35-25-generic 2.6.35-25.44
Regression: No
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.35-25.44-generic 2.6.35.10
Uname: Linux 2.6.35-25-generic x86_64
NonfreeKernelModules: nvidia
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: AD198x Analog [AD198x Analog]
   Subdevices: 2/2
   Subdevice #0: subdevice #0
   Subdevice #1: subdevice #1
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: tdn 2187 F.... pulseaudio
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xfe220000 irq 49'
   Mixer name : 'Analog Devices AD1984'
   Components : 'HDA:11d41984,17aa20bb,00100400'
   Controls : 31
   Simple ctrls : 19
Card29.Amixer.info:
 Card hw:29 'ThinkPadEC'/'ThinkPad Console Audio Control at EC reg 0x30, fw 7KHT24WW-1.08'
   Mixer name : 'ThinkPad EC 7KHT24WW-1.08'
   Components : ''
   Controls : 1
   Simple ctrls : 1
Card29.Amixer.values:
 Simple mixer control 'Console',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [on]
Date: Tue Mar 8 14:10:10 2011
HibernationDevice: RESUME=UUID=bd03b060-e813-403f-b6c0-dbd58049fbfa
InstallationMedia: Kubuntu 10.10 "Maverick Meerkat" - Release amd64 (20101007)
MachineType: LENOVO 6460D8G
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.35-25-generic root=/dev/mapper/hostname-root ro quiet splash
ProcEnviron:
 LANGUAGE=
 PATH=(custom, user)
 LANG=en_DK.UTF-8
 SHELL=/bin/zsh
RelatedPackageVersions: linux-firmware 1.38.4
SourcePackage: linux
WifiSyslog:

dmi.bios.date: 11/14/2008
dmi.bios.vendor: LENOVO
dmi.bios.version: 7LETC5WW (2.25 )
dmi.board.name: 6460D8G
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr7LETC5WW(2.25):bd11/14/2008:svnLENOVO:pn6460D8G:pvrThinkPadT61p:rvnLENOVO:rn6460D8G:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 6460D8G
dmi.product.version: ThinkPad T61p
dmi.sys.vendor: LENOVO

Jeremy Foshee (jeremyfoshee) wrote :

Hi tdn,

If you could also please test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
tdn (spam-thomasdamgaard) wrote :

I will do that.

Which one(s) of the .debs do I need to download and install?
Please note that I use the proprietary nvidia driver, so I would really like this to keep working.

tdn (spam-thomasdamgaard) wrote :

I have just tested with a mainline kernel.

Result: much better performance. Results on Ubuntu 10.10 with the new kernel were between 3 and 3.6 seconds.

These are the steps I took in order to install the new kernel:

# download
wget "http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.38-natty/linux-image-2.6.38-020638-generic_2.6.38-020638.201103151303_amd64.deb"
wget "http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.38-natty/linux-headers-2.6.38-020638-generic_2.6.38-020638.201103151303_amd64.deb"
wget "http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.38-natty/linux-headers-2.6.38-020638_2.6.38-020638.201103151303_all.deb"
# install
dpkg -i linux-headers-2.6.38-020638_2.6.38-020638.201103151303_all.deb linux-headers-2.6.38-020638-generic_2.6.38-020638.201103151303_amd64.deb linux-image-2.6.38-020638-generic_2.6.38-020638.201103151303_amd64.deb
# fix: make the headers' version string match the installed kernel
$EDITOR /usr/src/linux-headers-2.6.38-020638/Makefile
# set EXTRAVERSION to "-020638-generic"

Grondr (grondr) wrote :

It's not just AMD64, or this is two different bugs---I think it's
actually 10.10 in both AMD64 and i386.

I just installed 10.10 on a WD 500GB IDE disk inside a -non- encrypted
LVM using the alternate installer last week. I spent some time last
night benchmarking I/O using (a) a 2TB SATA 4K-sector Samsung with
an ext4 on it, and (b) the same model disk with aes-xts-plain64 (with
512-bit keys) under an ext4 (no LVM on either of those two disks).
ext4 created with defaults except "-i 1048576" so it would initialize
much faster---I wasn't planning on filling 2TB to run a test.

Sequential I/O to both 2TB disks was very fast---dd'ing /dev/zero
either directly to the partition or to LUKS ran around 130-140MB/s,
which is probably the limit for the disk. (This was before I created
filesystems on those partitions.) I then did "cp -a /usr /mnt/a1"
and similar for the LUKS-based FS and timed things.
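The sequential part of that test was of this form (a sketch; device names are illustrative, and these writes destroy data on the target):

# raw partition vs. the dm-crypt mapping on the same model of disk
dd if=/dev/zero of=/dev/sdb1 bs=1M count=4096
dd if=/dev/zero of=/dev/mapper/crypt1 bs=1M count=4096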

"time find /usr | wc -l" (-before- the copy) took about 16 seconds,
and of course repeating this immediately took about 200ms. Since
this was /usr for the running OS, I'll bet that lots of it was already
in the caches. (Also, that IDE disk may seek faster than a 2TB green
drive.)

After the copy, I unmounted & remounted each of the SATA filesystems
to drop their caches, and timed. The non-LUKS one took 47 seconds.
The LUKS one took 3 minutes and 38 seconds! This was repeatable; I
could dismount & remount and always get the same figures. Rerunning
w/o a dismount/remount of course gave me subsecond times. This is
on a six-core AMD64 2.8GHz CPU and none of the cores had appreciable
runtime; the CPU stayed at 800MHz the whole time, and user+sys for
the finds were about 3 seconds for LUKS and half a second without.
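Per filesystem the procedure was essentially (illustrative mount points and mapping names):

# remount to empty that filesystem's cache, then time the tree walk
umount /mnt/a1
mount /dev/mapper/crypt1 /mnt/a1
time find /mnt/a1/usr | wc -l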

Note that this is only a 4.6 to 1 ratio and not the 30:1 in the OP's
report, so either this is a different bug or something else is going
on.

I wondered if ext4 atime vs 4K-sectors was somehow screwing me,
so I tried remounting both SATA filesystems noatime. No change
in results. (I didn't try mounting them ro; I could, but...)

I then booted 10.10 i386 from a LiveCD---this is the same release,
but a few kernels back, since the installed system is up-to-date
(2.6.35-28-generic) and the LiveCD is I think at 2.6.35-22.

THINGS DID NOT IMPROVE MUCH, THOUGH THEY DID IMPROVE---i386 took
2 minutes and 15 seconds, so it's 1m23s faster, or only 2.9 times
as bad instead of 4.6.

I then booted a Natty LiveCD from the March 30th build.

BIG IMPROVEMENT---time was down to 57 seconds, which is only 10
seconds slower than the non-LUKS case. I can live with that. (Though
I'm still curious why it's slower at all---LUKS in any crypto mode
I've tried [aes-cbc-essiv:sha256, or aes-xts-plain64 with either 256-
or 512-bit keys] runs at least 140 MB/s to my disks when dd'ing a few
gig, and I think the disks are the limiting factor, because they don't
run any faster just dd'ing plaintext to a raw partition).

I am -quite- reluctant to run Natty on an otherwise 10.10 system.
I'm -very very- reluctant to run Natty overall on this machine (I
discovered that high network load copying to normal plaintext disk
seemed to burn 10x the CPU it should (different story...


Grondr (grondr)
description: updated
description: updated
Grondr (grondr) wrote :

I just installed Natty and brought it up-to-date as of about 4/8; this is using kernel 2.6.38-8-generic.

Two identical 2TB 4K-sector disks show about a 2x slowdown in LUKS using XTS and 512-bit keys: doing "time find usr -ls | wc -l" reported about 37 seconds on the ext4 disk, and 1m4s on the ext4-over-LUKS disk. This isn't great, though it's certainly better than the 6x and 30x slowdowns mentioned by me above or the OP at the start. Yes, of course I started from identical contents on each disk and by dismounting and then remounting each disk.

Still hoping someone can address that last factor of 2... the machine itself can run crypto -far- faster than the disks in long sequential reads or writes.

tdn (spam-thomasdamgaard) wrote :

Marking this Confirmed since Grondr also experiences this bug. It was marked Incomplete because it had not yet been tested on a mainline kernel; however, it has now been tested on mainline, as described above.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Patrick Domack (patrickdk) wrote :

I thought it was just my CPU, then my drives, until I upgraded both and still noticed this on 3 systems. All are running 10.04 amd64, though.

My drives go from doing 80 MB/s down to 2 MB/s using LUKS with aes-cbc-essiv:sha256.
My other systems are using aes-xts-plain:sha512.
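A useful follow-up would be to measure raw cipher throughput independently of the disks, assuming a cryptsetup new enough (1.6+) to ship the benchmark subcommand:

# in-kernel cipher speed, no disk involved; if this is fast while
# disk throughput collapses, the problem is in the I/O path, not the cipher
cryptsetup benchmark
cryptsetup benchmark --cipher aes-xts-plain64 --key-size 512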

penalvch (penalvch) wrote :

tdn, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a terminal, as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux REPLACE-WITH-BUG-NUMBER

If reproducible, could you also please test the latest upstream kernel available (not the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-3.15-rc5

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: added: bios-outdated-2.30
tags: added: lucid
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired