Instances don't start correctly on 32bit systems with large disk files

Bug #628055 reported by Thierry Carrez
This bug affects 6 people
Affects              Status        Importance  Assigned to       Milestone
eucalyptus (Ubuntu)  Invalid       High        Dave Walker
  Maverick           Invalid       High        Dave Walker
libvirt (Ubuntu)     Fix Released  High        Jamie Strandboge
  Maverick           Fix Released  High        Jamie Strandboge

Bug Description

20100831/maverick/i386 beta candidate, Topology 1 (CLC+CC+SC+Walrus on same machine, NC on another):

Trying to run an instance, the instance starts up but never goes from "pending" to "running".
nc.log shows:

[EUCAERROR ] libvirt: internal error process exited while connecting to monitor: libvir: Security Labeling error : internal error error calling aa_change_profile()
 (code=1)

Not sure if it is related, but the i386 CLC+CC+SC+Walrus machine is extremely slow: a "eucalyptus-cloud" process seems to overload the system.

Thierry Carrez (ttx) wrote :

Also occurs with lucid standard images.

Changed in eucalyptus (Ubuntu):
importance: Undecided → Critical
tags: added: iso-testing
Thierry Carrez (ttx) wrote :

Reinstalled on amd64 and I couldn't reproduce that one.

Changed in eucalyptus (Ubuntu):
importance: Critical → High
Thierry Carrez (ttx) wrote :

Retried on i386, same error. However, instances went to "running" but were torn down at the NC level. I wonder how many of these issues are linked to the slowness of the CLC.

Thierry Carrez (ttx)
summary: - Instances don't go to "running" state: Security Labeling error running
+ Instances don't start correctly: Security Labeling error running
aa_change_profile()
Dave Walker (davewalker) wrote : Re: Instances don't start correctly: Security Labeling error running aa_change_profile()

@Thierry: When you were able to reproduce this, was it encountered shortly after boot? I noticed that eucalyptus-cloud is a resource hog at first, but that tends to settle down. Could that explain why it's inconsistent?

Thierry Carrez (ttx)
tags: added: server-mrs
Changed in eucalyptus (Ubuntu Maverick):
assignee: nobody → Dave Walker (davewalker)
Thierry Carrez (ttx) wrote :

That was not "shortly", but like 15min after boot.

Andrea Corbellini (andrea.corbellini) wrote :

I see this bug too, but I don't think the problem is in eucalyptus: I'm using qemu+kvm+libvirt.

Whenever I try to start any of my virtual machines (which worked fine on Lucid just a few days ago), I get this error:

# virsh -c qemu:///system start abc
error: Failed to start domain abc
error: internal error Process exited while reading console log output: libvir: Security Labeling error : internal error error calling aa_change_profile()

In my opinion, this is a problem in an AppArmor profile:

# /etc/init.d/apparmor stop
# restart libvirt-bin
# virsh -c qemu:///system start abc
Domain abc started

Thierry Carrez (ttx) wrote :

Tentatively assigned to Jamie, so that he can check if this is a recent profile confinement issue...

Changed in libvirt (Ubuntu Maverick):
importance: Undecided → High
assignee: nobody → Jamie Strandboge (jdstrand)
status: New → Confirmed
Thierry Carrez (ttx) wrote :

Probably not eucalyptus-specific.

Changed in eucalyptus (Ubuntu Maverick):
status: New → Invalid
Jamie Strandboge (jdstrand) wrote :

Unfortunately, this is a generic error that is not informative and does not necessarily indicate an AppArmor problem. Can you post the output of:
$ virsh capabilities
$ grep DENIED /var/log/kern.log

Jamie Strandboge (jdstrand) wrote :

Also, the restart of libvirt could have been enough to 'fix' the issue, and not be a problem with AppArmor. That said, I need the kern.log output to determine if it is AppArmor or not.

Andrea Corbellini (andrea.corbellini) wrote :

Hi Jamie and thank you for your feedback. Here are the answers to your questions:

1. my capabilities are in the attachment;
2. there are no lines containing DENIED in my logs;
3. restarting libvirt without disabling AppArmor has no effect.

Jamie Strandboge (jdstrand) wrote :

Andrea, the virsh capabilities output does not list apparmor as a capability. Are you saying that it doesn't work with apparmor disabled as well?

Changed in libvirt (Ubuntu Maverick):
status: Confirmed → Incomplete
status: Incomplete → Confirmed
status: Confirmed → Incomplete
Andrea Corbellini (andrea.corbellini) wrote :

Oops, sorry! I ran "virsh capabilities" after disabling apparmor.

So, to clarify: with AppArmor, every attempt to use a virtual machine fails; without it, everything works perfectly.

Jamie Strandboge (jdstrand) wrote :

I can't reproduce this with libvirt (I don't have eucalyptus available for testing). That said, looking at the capabilities, I see in the guest:

      <domain type='qemu'>
      </domain>
      <domain type='kvm'>
        <emulator>/usr/bin/kvm</emulator>
        <machine>pc-0.12</machine>
        <machine canonical='pc-0.12'>pc</machine>
        <machine>pc-0.11</machine>
        <machine>pc-0.10</machine>
        <machine>isapc</machine>
      </domain>

And this on the host:
      <arch>i686</arch>
      <model>n270</model>

For one thing, the guest xml looks suspect since there are two <domain type=...> elements, and I also didn't think the n270 i686 could do kvm. I am not sure what generated this xml, but I would look there first. Can you attach the domain xml for the affected guest?

Jamie Strandboge (jdstrand) wrote :

Actually, the two domains in the virsh capabilities output seem to be normal (I see them here). Please still post the domain xml for the affected guest, however.

Thierry Carrez (ttx) wrote :

I can reproduce it again on a daily UEC/i386:
- grep DENIED /var/log/kern.log doesn't yield anything
- See attached "virsh capabilities" output on the NC

Thierry Carrez (ttx) wrote :

@Andrea, are you running on i386 or amd64?

Jamie Strandboge (jdstrand) wrote :

Thierry, what type of CPU did you use to reproduce this? Can you post the domain xml for the affected guest?

Thierry Carrez (ttx) wrote :

It's an "Intel(R) Core(TM)2 Duo CPU T6670", running an i386 server install, trying to run an i386 instance.

I don't have a domain.xml: euca must be destroying it on error, or it never gets created. The same machine running an amd64 server install, trying to run an i386 instance, works perfectly. That's why, at this point, I'm even wondering whether libvirt/i386 works at all.

Jamie Strandboge (jdstrand) wrote :

Can you disable the AppArmor driver and see if it works there? This can be done by adjusting /etc/libvirt/qemu.conf to have:
security_driver = "none"

Then restart libvirt-bin with:
$ sudo stop libvirt-bin
$ sudo start libvirt-bin

Verify the AppArmor driver is disabled with (don't worry if it takes a few seconds to finish):
$ virsh capabilities | grep -A2 -B1 apparmor

When the driver is enabled, you will see:
    <secmodel>
      <model>apparmor</model>
      <doi>0</doi>
    </secmodel>

otherwise, you won't.

Thierry Carrez (ttx) wrote :

Disabled apparmor, and I don't get the error message anymore. But I then hit another bug (the VM doesn't seem to boot correctly), so it is difficult to say "everything works just fine". I'll try to test libvirt/qemu/kvm outside eucalyptus to debug this.

Jamie Strandboge (jdstrand) wrote :

That is what I thought might be the case. If the VM dies a horrible death then apparmor has no process to attach to when changing profile. I am going to unassign this from me until it is determined that the AppArmor security driver is the problem.

Changed in libvirt (Ubuntu Maverick):
assignee: Jamie Strandboge (jdstrand) → nobody
Andrea Corbellini (andrea.corbellini) wrote :

Hi. Sorry, but I'm currently not able to provide my domain.xml (I will do so as soon as I find the time). But I would like to say that I'm not using eucalyptus and all my VMs work fine without apparmor.

C de-Avillez (hggdh2) wrote :

On an i386 test upgrade from 10.04 to Maverick I also got this error. All instances fail to start. Full logs at lp:~hggdh2/+junk/uec-qa, revision 61.

C de-Avillez (hggdh2) wrote :

Oh, forgot: this is an i386 install of Maverick.

Jamie Strandboge (jdstrand) wrote :

Ok, I know what is failing, but I don't know why.

On amd64 things are ok:
$ cat /proc/cpuinfo | grep model
model : 37
model name : Intel(R) Core(TM) i7 CPU L 640 @ 2.13GHz
... repeated 4 times ...
$ dpkg --print-architecture
amd64
$ uname -m
x86_64
$ cat ./628055.xml | /usr/lib/libvirt/virt-aa-helper -r -u libvirt-29c89df7-94d6-f247-c2ed-b63dd4c0f7a6 --dryrun
virt-aa-helper: warning: path does not exist, skipping file type checks
09:33:27.403: warning : virDomainDiskDefForeachPath:7672 : Ignoring open failure on /home/ubuntu/tmp/disk: No such file or directory
virt-aa-helper:
/etc/apparmor.d/libvirt/libvirt-29c89df7-94d6-f247-c2ed-b63dd4c0f7a6.files
virt-aa-helper:
  "/var/log/libvirt/**/testme.log" w,
  "/var/lib/libvirt/**/testme.monitor" rw,
  "/var/run/libvirt/**/testme.pid" rwk,
  "/home/ubuntu/tmp/disk" rw,

On a kvm i386 install on a 64-bit host:
$ cat /proc/cpuinfo | grep model
model : 2
model name : QEMU Virtual CPU version 0.12.5
$ dpkg --print-architecture
i386
$ uname -m
i686
$ cat ./628055.xml | /usr/lib/libvirt/virt-aa-helper -r -u libvirt-29c89df7-94d6-f247-c2ed-b63dd4c0f7a6 --dryrun
virt-aa-helper: warning: path does not exist, skipping file type checks
14:34:23.193: warning : virDomainDiskDefForeachPath:7672 : Ignoring open failure on /home/ubuntu/tmp/disk: No such file or directory
virt-aa-helper:
/etc/apparmor.d/libvirt/libvirt-29c89df7-94d6-f247-c2ed-b63dd4c0f7a6.files
virt-aa-helper:
  "/var/log/libvirt/**/testme.log" w,
  "/var/lib/libvirt/**/testme.monitor" rw,
  "/var/run/libvirt/**/testme.pid" rwk,
  "/home/ubuntu/tmp/disk" rw,

On an i386 install on a 32-bit host:
$ cat /proc/cpuinfo | grep model
model : 23
model name : Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz
model : 23
model name : Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz
$ dpkg --print-architecture
i386
$ uname -m
i686
$ cat ./628055.xml | /usr/lib/libvirt/virt-aa-helper -r -u libvirt-29c89df7-94d6-f247-c2ed-b63dd4c0f7a6 --dryrun
virt-aa-helper: error: invalid VM definition

Changed in libvirt (Ubuntu Maverick):
assignee: nobody → Jamie Strandboge (jdstrand)
milestone: none → ubuntu-10.10
status: Incomplete → Confirmed
Jamie Strandboge (jdstrand) wrote :

Ok, I am able to reproduce this on a pure i386 machine using the disk as set up by eucalyptus. My tests before were not identical. Here are the results:

== amd64 ==
$ ./628055.sh
$ cat /proc/cpuinfo | grep model
model : 37
model name : Intel(R) Core(TM) i7 CPU L 640 @ 2.13GHz
model : 37
model name : Intel(R) Core(TM) i7 CPU L 640 @ 2.13GHz
model : 37
model name : Intel(R) Core(TM) i7 CPU L 640 @ 2.13GHz
model : 37
model name : Intel(R) Core(TM) i7 CPU L 640 @ 2.13GHz
$ dpkg --print-architecture
amd64
$ uname -m
x86_64
$ md5sum /home/jamie/tmp/628055/disk
0ded1356f29b0df69eada9114f312ca9 /home/jamie/tmp/628055/disk
$ cat /tmp/tmp.MBQ5hd2NFG/xml | /usr/lib/libvirt/virt-aa-helper -r -u libvirt-29c89df7-94d6-f247-c2ed-b63dd4c0f7a6 --dryrun
virt-aa-helper:
/etc/apparmor.d/libvirt/libvirt-29c89df7-94d6-f247-c2ed-b63dd4c0f7a6.files
virt-aa-helper:
  "/var/log/libvirt/**/testme.log" w,
  "/var/lib/libvirt/**/testme.monitor" rw,
  "/var/run/libvirt/**/testme.pid" rwk,
  "/home/jamie/tmp/628055/disk" rw,

== i386/dual core ==
$ sh ./628055.sh
$ cat /proc/cpuinfo | grep model
model : 23
model name : Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz
model : 23
model name : Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz
$ dpkg --print-architecture
i386
$ uname -m
i686
$ md5sum /home/ubuntu/tmp/628055/disk
0ded1356f29b0df69eada9114f312ca9 /home/ubuntu/tmp/628055/disk
$ cat /tmp/tmp.yR7X1rkLpv/xml | /usr/lib/libvirt/virt-aa-helper -r -u libvirt-29
n
virt-aa-helper: error: invalid VM definition

== i386/single core ==
$ ./628055.sh
$ cat /proc/cpuinfo | grep model
model : 13
model name : Intel(R) Pentium(R) M processor 1.70GHz
$ dpkg --print-architecture
i386
$ uname -m
i686
$ md5sum /home/jamie/tmp/628055/disk
0ded1356f29b0df69eada9114f312ca9 /home/jamie/tmp/628055/disk
$ cat /tmp/tmp.QW8EtZsUTi/xml | /usr/lib/libvirt/virt-aa-helper -r -u libvirt-29c89df7-94d6-f247-c2ed-b63dd4c0f7a6 --dryrun
virt-aa-helper: error: invalid VM definition

Now that I have the necessary files and have determined that it is not machine specific, but seems to be architecture specific, I can hopefully narrow this down more.

Jamie Strandboge (jdstrand) wrote :

Here is the problem: stat() is returning an error in virFileExists() on 32-bit when we use it on the disk image, i.e.:

$ ls -l /home/jamie/tmp/628055/disk
-rw-r--r-- 1 jamie jamie 2179989504 2010-09-22 08:57 /home/jamie/tmp/628055/disk

== amd64 ==
$ a.out /home/jamie/tmp/628055/disk
stat(/home/jamie/tmp/628055/disk) returned: 0
virFileExists('/home/jamie/tmp/628055/disk') returned: 1

== i386 ==
$ a.out /home/jamie/tmp/628055/disk
stat(/home/jamie/tmp/628055/disk) returned: -1
stat failed: Value too large for defined data type
virFileExists('/home/jamie/tmp/628055/disk') returned: 0

Jamie Strandboge (jdstrand) wrote :

Turns out this was a missing #include in virt-aa-helper.c. In previous releases we did not need '#include <sys/stat.h>', but in libvirt 0.8.3 (presumably because of its embedded gnulib, which has functions for stat()) and the toolchain in maverick, we do.

Specifically, virt-aa-helper.c has in its buildlog:
security/virt-aa-helper.c: In function 'valid_path':
security/virt-aa-helper.c:541: warning: implicit declaration of function 'stat' [-Wimplicit-function-declaration]
security/virt-aa-helper.c:541: warning: nested extern declaration of 'stat' [-Wnested-externs]

Adding '#include <sys/stat.h>' to virt-aa-helper.c fixes both the warning and the problem. I will be uploading a new libvirt with this fix shortly.

Changed in libvirt (Ubuntu Maverick):
status: Confirmed → In Progress
summary: - Instances don't start correctly: Security Labeling error running
- aa_change_profile()
+ Instances don't start correctly on 32bit systems with large disk files
Jamie Strandboge (jdstrand) wrote :

The attached debdiff is what I uploaded to fix this issue. It is confirmed to fix the issue using the reproducer locally.

Changed in libvirt (Ubuntu Maverick):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 0.8.3-1ubuntu13

---------------
libvirt (0.8.3-1ubuntu13) maverick; urgency=low

  * debian/patch/9028-lp628055.patch: include sys/stat.h to fix compiler
    warning and stat() failure on 32bit architectures when calling stat() on
    large files. This can be dropped in 0.8.5. (LP: #628055)
 -- Jamie Strandboge <email address hidden> Wed, 22 Sep 2010 15:21:21 -0500

Changed in libvirt (Ubuntu Maverick):
status: Fix Committed → Fix Released
wbm (wbmills) wrote :

I just converted a set of VMs from a Windows VMware install to KVM and ran into the same error. Some machines work and some do not (same error as the instances above). Interestingly, I first tried Ubuntu server 10.04 and all machines ran just fine. I later decided to use Ubuntu workstation 10.10 and ran into the error.
