qemu 4.2-3ubuntu6.27 Does Not Boot Reliably Into Linux 5.4.0-164.181

Bug #2039258 reported by Martin Johansen
This bug affects 1 person
Affects         Status        Importance  Assigned to  Milestone
linux (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

We have a number of Ubuntu 20.04 servers that have recently been upgraded to linux 5.4.0-164.181 and qemu 4.2-3ubuntu6.27.

About 1 in 1000 boots of qemu fails, hanging at 100% CPU indefinitely. A number of our utilities use qemu for processing, so we get a lot of processes spinning at 100% CPU that we need to clean up regularly.

Ubuntu: 20.04
Linux: 5.4.0-164.181
Qemu: 4.2-3ubuntu6.27

This problem was not seen before the release of Linux 5.4.0-164.181 on Sept. 1.

What you expected to happen: Qemu boots linux reliably.
What happened instead: Qemu hangs at 100% CPU indefinitely.
---
ProblemType: Bug
AlsaVersion: Advanced Linux Sound Architecture Driver Version k5.4.0-164-generic.
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu27.27
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer'
CasperMD5CheckResult: skip
DistroRelease: Ubuntu 20.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 001 Device 002: ID 0627:0001 Adomax Technology Co., Ltd QEMU USB Tablet
 Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Lsusb-t:
 /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
     |__ Port 1: Dev 2, If 0, Class=Human Interface Device, Driver=usbhid, 12M
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
Package: linux (not installed)
ProcFB: 0 cirrusdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-164-generic root=UUID=ed61e67e-7605-4383-ac16-fe54ee2ede87 ro net.ifnames=0 biosdevname=0 netcfg/do_not_use_netplan=true
ProcVersionSignature: Ubuntu 5.4.0-164.181-generic 5.4.248
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-164-generic N/A
 linux-backports-modules-5.4.0-164-generic N/A
 linux-firmware 1.187.39
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: focal
Uname: Linux 5.4.0-164-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: 1.13.0-1ubuntu1.1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-4.2
dmi.modalias: dmi:bvnSeaBIOS:bvr1.13.0-1ubuntu1.1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-4.2:cvnQEMU:ct1:cvrpc-i440fx-4.2:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-4.2
dmi.sys.vendor: QEMU

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2039258

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Martin Johansen (martinfj) wrote : AlsaDevices.txt

apport information

tags: added: apport-collected focal
description: updated
Martin Johansen (martinfj) wrote : CRDA.txt

apport information

Martin Johansen (martinfj) wrote : Card0.Codecs.codec.0.txt

apport information

Martin Johansen (martinfj) wrote : CurrentDmesg.txt

apport information

Martin Johansen (martinfj) wrote : Lspci.txt

apport information

Martin Johansen (martinfj) wrote : Lspci-vt.txt

apport information

Martin Johansen (martinfj) wrote : Lsusb-v.txt

apport information

Martin Johansen (martinfj) wrote : PciMultimedia.txt

apport information

Martin Johansen (martinfj) wrote : ProcCpuinfo.txt

apport information

Martin Johansen (martinfj) wrote : ProcCpuinfoMinimal.txt

apport information

Martin Johansen (martinfj) wrote : ProcEnviron.txt

apport information

Martin Johansen (martinfj) wrote : ProcInterrupts.txt

apport information

Martin Johansen (martinfj) wrote : ProcModules.txt

apport information

Martin Johansen (martinfj) wrote : UdevDb.txt

apport information

Martin Johansen (martinfj) wrote : WifiSyslog.txt

apport information

Martin Johansen (martinfj) wrote : acpidump.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Martin Johansen (martinfj) wrote :

We managed to reproduce the failure on a fresh install of Ubuntu 20.04 using the script below. The first hang occurred as early as the 90th iteration.

-----------
#!/bin/bash
# Boot the libguestfs appliance repeatedly; a hung boot stalls the loop.
for i in {1..10000}
do
    echo "Run $i"
    guestfish -vx -a /dev/null run >& "log$i"
done
-----------

Here is what we did after booting a fresh install:

apt update
apt upgrade
apt install qemu-system-x86
apt install libguestfs-tools
(rebooted the server and ran the script above)
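
The loop above stalls on the first hung boot, so someone has to notice and kill it by hand. A minimal sketch of a bounded variant follows, using coreutils timeout to cap each run and report which iterations hung. The function names, the 300-second limit, and the "ok/hang/fail" labels are assumptions for illustration and are not part of the original script.

```shell
# Sketch of a bounded reproduction loop; names and limits are assumptions.

# Run a command under a time limit, sending its output to a log file.
# Prints "ok", "hang", or "fail". timeout(1) exits with 124 when the limit
# is reached, or 137 if it had to follow up with SIGKILL (-k option).
run_bounded() {
    limit=$1
    log=$2
    shift 2
    status=0
    timeout -k 10 "$limit" "$@" >"$log" 2>&1 || status=$?
    case $status in
        0)       echo ok ;;
        124|137) echo hang ;;
        *)       echo fail ;;
    esac
}

# Repeat the guestfish boot from the original script and list the
# iterations that did not finish cleanly.
reproduce() {
    runs=$1
    limit=$2
    i=1
    while [ "$i" -le "$runs" ]; do
        result=$(run_bounded "$limit" "log$i" guestfish -vx -a /dev/null run)
        [ "$result" = ok ] || echo "Run $i: $result"
        i=$((i + 1))
    done
}

# Example (assumed values): reproduce 10000 300
```

A "fail" result also covers the case where guestfish is not installed, so it is worth checking the log of the first run before trusting a long unattended session.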

Martin Johansen (martinfj) wrote :

This is a successful boot of qemu.

Martin Johansen (martinfj) wrote :

This is a failing boot of qemu that we killed after some time.

Roxana Nicolescu (roxanan) wrote :

Can you tell us from which version you upgraded to 164? It would be helpful to narrow down what changes happened in between.

Martin Johansen (martinfj) wrote :

Our systems update automatically through unattended upgrades. Our first record of this problem is from Sept. 5th, which points towards Linux 5.4.0-164.181 being the culprit unless you updated this package just a few days before. Linux 5.4.0-164.181 was installed on our systems on Sept. 1st, and this has been a problem ever since.

We do thousands of runs each day, so if this bug had occurred earlier, we would have noticed.

Roxana Nicolescu (roxanan) wrote :

We started preparing 164 on the 1st of September, and it was released last week, so the version installed at that time was probably 5.4.0-162.

Could you try this with the version from -proposed? It's 5.4.0-166.183.
I checked the discussion at https://gitlab.com/qemu-project/qemu/-/issues/1696 and the issue there seems similar. The fix
"tick/common: Align tick period during sched_timer setup" just landed in -proposed.

To install a kernel from -proposed, I add -proposed to the apt repository list:
sudo add-apt-repository ppa:canonical-kernel-team/proposed

And then install the desired version.

You can also check this official page for more information on how to test from the -proposed pocket.
https://wiki.ubuntu.com/Testing/EnableProposed
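
For the "install the desired version" step, here is a minimal sketch that builds the package names for a given kernel ABI and prints the apt command rather than running it. The helper names are hypothetical, and the package naming (linux-image-<abi>-generic plus the matching modules packages) is an assumption based on the usual layout of the Ubuntu generic kernel flavour, so please verify it with apt-cache policy before installing.

```shell
# Sketch only: helper names and package layout are assumptions.

# Print the generic-flavour package names for a kernel ABI such as 5.4.0-166.
kernel_packages() {
    abi=$1
    echo "linux-image-$abi-generic linux-modules-$abi-generic linux-modules-extra-$abi-generic"
}

# Print (do not run) the install command so it can be reviewed first.
install_command() {
    echo "sudo apt install $(kernel_packages "$1")"
}

# Example: install_command 5.4.0-166
```

After installing and rebooting, uname -r should report the new release (for example 5.4.0-166-generic) before the reproduction test is started.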

Martin Johansen (martinfj) wrote :

> Probably the version installed was 5.4.0-162.

Yes, you are probably right. I was only looking at the date of the kernel file. The bug is reproducible with Linux 5.4.0-164.181, however.

> I checked some discussions from here https://gitlab.com/qemu-project/qemu/-/issues/1696 and it seems it's similar.

I have discussed the problem with Richard Jones here:

https://bugzilla.redhat.com/show_bug.cgi?id=2241293

He claims the issue in this bug report is not the same bug as the one you refer to. See his last comment.

Roxana Nicolescu (roxanan) wrote :

Can you test the newest kernel from -proposed (166) to confirm it is not the same bug? Indeed, it is not TCG, but for now I don't have other ideas. I will continue looking into it.

Martin Johansen (martinfj) wrote :

We have now done close to 6000 runs over the last 24 hours, and 166 from -proposed does not seem to have this problem!

Does this mean this is the same bug?

When can we expect this to be rolled out? It will be a real relief for us when this bug does not affect our production servers any more.

Martin Johansen (martinfj) wrote :

After having run tests for two days, it seems likely to us that the bug is not present in 166 from -proposed.
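
As a rough check on how much evidence the clean runs provide: if each boot hung independently at the roughly 1-in-1000 rate from the bug description, the chance of seeing no hang in n runs would be (1 - p)^n. The sketch below works through that arithmetic. It assumes independent runs and a constant failure rate, and the helper name is hypothetical.

```shell
# Probability of zero failures in n independent runs,
# given a per-run failure rate p (assumption: runs are independent).
p_no_failure() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", exp(n * log(1 - p)) }'
}

# Example: p_no_failure 0.001 6000
```

For p = 0.001 and n = 6000 this prints 0.0025, so about 6000 clean runs would be very unlikely if the bug were still present at the original rate.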

Roxana Nicolescu (roxanan) wrote :

Yes, I think it's the same issue. I am now running the test example you provided on an image that contains only the fix "tick/common: Align tick period during sched_timer setup" on top of what's in 164, to double-check.

The version from -proposed will land in -updates in 2 weeks. Unfortunately, we cannot release it sooner.

Martin Johansen (martinfj) wrote :

Any news on this issue? Is the fix on track to be released around the 30th of October?

Martin Johansen (martinfj) wrote :

We see that vmlinuz-5.4.0-166-generic was deployed as an automatic update a few hours ago. Hopefully this resolves the problem :)

Martin Johansen (martinfj) wrote :

We can confirm that this bug is no longer an issue now that 5.4.0-166 is deployed on all our servers. We consider this bug solved.
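
To check that a given server is actually running a kernel at or above the fixed release, a small sketch follows. It assumes GNU sort with version ordering (-V) is available, and the helper name is hypothetical. It only compares version strings and does not verify that the fix itself is present.

```shell
# Succeed if kernel release $1 (as printed by `uname -r`) is at least $2.
# Relies on GNU sort -V for version ordering (assumption).
kernel_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example:
#   kernel_at_least "$(uname -r)" 5.4.0-166-generic && echo "kernel is new enough"
```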

Changed in linux (Ubuntu):
status: Confirmed → Fix Released