Windows guest hangs after reboot from the guest OS

Bug #2064914 reported by Björn Hinz
This bug affects 2 people
Affects          Status         Importance   Assigned to             Milestone
qemu (Fedora)    Unknown        Unknown
qemu (Ubuntu)    Fix Released   Undecided    Unassigned
  Jammy          Fix Released   Undecided    Sergio Durigan Junior

Bug Description

[ Impact ]

Some versions of Windows hang on reboot if their TSC value is greater
than 2^54. The calibration of the Hyper-V reference time overflows
and fails; as a result the processors' clock sources are out of sync.
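
For a sense of scale, here is a rough sketch of how long a guest would need to stay up before its TSC passes 2^54, assuming the TSC counts from zero at a constant rate. The 3 GHz frequency and the function name tsc_days_to_threshold are illustrative assumptions, not values taken from this report.

```shell
#!/bin/sh
# Days of uptime needed for a TSC ticking at a given rate to pass 2^54.
# The frequency passed in is an illustrative assumption.
tsc_days_to_threshold() {
    freq_hz=$1
    threshold=$((1 << 54))                 # 18014398509481984 ticks
    echo $(( threshold / freq_hz / 86400 ))  # ticks -> seconds -> whole days
}

tsc_days_to_threshold 3000000000   # prints 69
```

Under these assumptions, a guest that has been running for a little over two months is already in the affected range, which fits long-running production VMs.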

[ Test Plan ]

As suggested by Mauricio, testing will be done in stages.

1) unit test, with an rdtsc/print loop (and confirm the tsc value decreases after system_reset).

This can be done by using x86/tsc.flat from the following repository:

https://gitlab.com/kvm-unit-tests/kvm-unit-tests.git

Follow the steps below:

Inside a Jammy system (privileged container/VM, bare metal, etc.):

# apt update && apt install gcc make -y
# git clone https://gitlab.com/kvm-unit-tests/kvm-unit-tests.git
# cd kvm-unit-tests
# wget https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2064914/+attachment/5784045/+files/tsc.c.patch -O- | patch -p1
# ./configure && make

Make sure x86/tsc exists. Now you can install qemu and perform the test:

# apt install -y qemu-system-x86
# qemu-system-x86_64 -serial file:/tmp/bogus-output -accel kvm -kernel x86/tsc.flat -monitor stdio -nographic

Wait a couple of seconds and issue a "system_reset" command. Then, wait a couple more seconds and issue a "quit" command.

You can now open /tmp/bogus-output and check the values of rdtsc. You will notice that the value keeps increasing across the "system_reset" instead of starting over, which is exactly what we don't want.

Afterwards, you can update qemu and test the fix by doing the same steps (make sure you adjust the "file:/tmp/..." path).
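
The comparison of values in the serial log can also be scripted. The following is a minimal sketch, assuming the patched test prints lines of the form "rdtsc = <value>" and that each boot starts with a "Booting from ROM" line on the serial output; check_tsc_reset is a hypothetical helper name, not part of kvm-unit-tests.

```shell
#!/bin/sh
# check_tsc_reset LOGFILE
# Splits the serial log into boots at each "Booting from ROM" line and
# compares the last rdtsc value of the first boot with the first rdtsc
# value of the second boot.
# Exit status: 0 = TSC went down, 1 = TSC kept counting, 2 = no second boot.
check_tsc_reset() {
    awk '
        { sub(/\r$/, "") }                  # drop a trailing carriage return
        /Booting from ROM/ { boot++; next }
        /^rdtsc = [0-9]+$/ {
            if (boot == 1) last_before = $3
            if (boot == 2 && first_after == "") first_after = $3
        }
        END {
            if (first_after == "") { print "no second boot found"; exit 2 }
            if (first_after + 0 < last_before + 0) {
                print "PASS: TSC was reset"
                exit 0
            }
            print "FAIL: TSC kept counting"
            exit 1
        }
    ' "$1"
}

# Usage (after the qemu run has finished):
#   check_tsc_reset /tmp/bogus-output
```

With the unfixed package the helper is expected to report FAIL, and with the fixed package PASS.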

2) regression test, booting Ubuntu kernel/initrd pairs (installer's should be enough) from supported releases, and checking they boot/reach a prompt.
2.1) now, it is important to _reboot_, and check they boot/reach a prompt too.

[ Where problems could occur ]

This is a change impacting normal x86 code, so although the patch is small and well contained, in the unlikely case that we find a regression it will impact more users. As such, and under Mauricio's advice, the test plan is being extended to really guarantee that the common virtualization scenarios are not impacted. If we find a problem with this update, there is the possibility of reverting it temporarily until we can devise a proper fix.

Regressions would be likely to occur in the initialization / (re)boot path,
which should be fine to identify early in testing, except for corner cases.

[ Original Description ]

Description:
Some versions of Windows hang on reboot if their TSC value is greater
than 2^54. The calibration of the Hyper-V reference time overflows
and fails; as a result the processors' clock sources are out of sync.

The issue is that the TSC _should_ be reset to 0 on CPU reset and
QEMU tries to do that. However, KVM special cases writing 0 to the
TSC and thinks that QEMU is trying to hot-plug a CPU, which is
correct the first time through but not later. Thwart this valiant
effort and reset the TSC to 1 instead, but only if the CPU has been
run once.

For this to work, env->tsc has to be moved to the part of CPUArchState
that is not zeroed at the beginning of x86_cpu_reset.

Solution: [PATCH] target/i386: properly reset TSC on reset

I created and tested a ppa ubuntu package already. The patch fixes this issue.
Link to ppa: https://launchpad.net/~bhinz83/+archive/ubuntu/openstack-rds/+packages

It affects only the Jammy (22.04) package. The newest version is: qemu-1:6.2+dfsg-2ubuntu6.19

Björn Hinz (bhinz83) wrote :
description: updated
description: updated
Björn Hinz (bhinz83)
description: updated
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Patch imported from RHEL 8" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Björn Hinz (bhinz83) wrote :
Paride Legovini (paride)
Changed in qemu (Ubuntu Jammy):
status: New → Triaged
Changed in qemu (Ubuntu):
status: New → Incomplete
Paride Legovini (paride) wrote :

Hello and thanks for this bug report, for attaching the patch and for the PPA.

I know you wrote that this bug "affects only jammy 22.04 package" but let me ask explicitly: is this fixed in Ubuntu 24.04 LTS? This is important for us to know from a process point of view.

Also: getting the fix into Jammy will require verification from a user affected by the bug. This is likely you, given that we are unlikely to have the required Windows version at hand. The process consists of installing the affected package from the -proposed pocket and verifying that it works as expected (more on this process: [1]). Are you willing to help us with this verification?

Thanks!

[1] https://wiki.ubuntu.com/StableReleaseUpdates

Sergio Durigan Junior (sergiodj) wrote :

Thank you Björn for reporting the bug, and Paride for the initial triage.

As Paride said, we have to make sure this is not affecting other versions of QEMU as well. If the patch mentioned in the description is indeed the only one needed to fix the bug, then I think we're good:

$ git tag --contains 5286c3662294119dc2dd1e9296757337211451f6
v7.0.0
v7.0.0-rc2
v7.0.0-rc3
v7.0.0-rc4
v7.1.0
v7.1.0-rc0
v7.1.0-rc1
v7.1.0-rc2
v7.1.0-rc3
v7.1.0-rc4
v7.2.0
v7.2.0-rc0
v7.2.0-rc1
v7.2.0-rc2
v7.2.0-rc3
v7.2.0-rc4
v7.2.1
v7.2.2
v7.2.3
v7.2.4
v7.2.5
v7.2.6
v7.2.7
v7.2.8
v7.2.9
v8.0.0
v8.0.0-rc0
v8.0.0-rc1
v8.0.0-rc2
v8.0.0-rc3
v8.0.0-rc4
v8.0.1
v8.0.2
v8.0.3
v8.0.4
v8.0.5
v8.1.0
v8.1.0-rc0
v8.1.0-rc1
v8.1.0-rc2
v8.1.0-rc3
v8.1.0-rc4
v8.1.1
v8.1.2
v8.1.3
v8.1.4
v8.1.5
v8.2.0
v8.2.0-rc0
v8.2.0-rc1
v8.2.0-rc2
v8.2.0-rc3
v8.2.0-rc4
v8.2.1

We have QEMU 8.0.4 on Mantic and 8.2.2 on Noble.
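
The tag list above suggests that any upstream release at or after v7.0.0 carries the commit. As a small sketch of that check for the versions mentioned here, assuming GNU "sort -V" is available; version_ge is a hypothetical helper that compares plain upstream versions only (no Debian epochs, no -rc suffixes).

```shell
#!/bin/sh
# version_ge A B: succeed when upstream version A is at or above B.
# Relies on "sort -V" (version sort); plain upstream versions only.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

for v in 6.2 8.0.4 8.2.2; do
    if version_ge "$v" 7.0.0; then
        echo "QEMU $v: at or after v7.0.0, commit expected to be present"
    else
        echo "QEMU $v: before v7.0.0, needs the backport"
    fi
done
```

Only 6.2 (Jammy) falls below the threshold, which matches the conclusion that Jammy is the release needing the patch.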

@Björn, while it would be good to have reproduction steps for the bug, we can also rely on your help to verify the correctness of the fix (as Paride explained).

I'll work on preparing an upload for Jammy in the meantime.

Thanks.

Changed in qemu (Ubuntu Jammy):
assignee: nobody → Sergio Durigan Junior (sergiodj)
description: updated
Sergio Durigan Junior (sergiodj) wrote :

Hi again, Björn,

Could you please give the following PPA a try and tell me if the package works?

https://launchpad.net/~sergiodj/+archive/ubuntu/qemu

The QEMU version there is 1:6.2+dfsg-2ubuntu6.20~ppa1.

Thanks a lot.

Björn Hinz (bhinz83) wrote :

Hi,
thanks for your investigation.

@Sergio The RHEL bug description at https://bugzilla.redhat.com/show_bug.cgi?id=2074737 describes a test.
I also found the patch in QEMU: https://github.com/qemu/qemu/commit/5286c3662294119dc2dd1e9296757337211451f6
It seems that this issue has been patched since QEMU 7.0.0.

@Paride I saw this issue in our production environment with "Windows Server 2012 R2", "Windows Server 2019" and "Windows Server 2022" with all available updates installed. We have had this problem for the last few years but only found this patch last month.

@Sergio I will try and test your ppa package.

Mauricio Faria de Oliveira (mfo) wrote :

Hi Sergio,

Thanks for the upload!

The bug and fix sound good and relatively simple to verify.

The SRU bug template is still missing Test Plan / Regression Potential, so I'll mark it as Incomplete for the time being (I see it may be coming soon, per "TBD").

While looking at Björn's mention of the test case in the RH BZ, I noticed it uses the `rdtsc.flat` kernel, which I couldn't find elsewhere. Do you know about it?
It's apparently a `rdtsc`/print loop we can replicate, if needed.

If I may, I'd suggest to test this in 3 ways:

1) unit test, with such rdtsc/print loop (and confirm the tsc value decreases after system_reset)
2) functional test, booting Windows (e.g., downloaded from MSFT Evaluation Center) and changing TSC manually to a problematic value (> 2^54) before reset, with the QEMU monitor or GDB, if possible?
3) regression test, booting Ubuntu kernel/initrd pairs (installer's should be enough) from supported releases, and checking they boot/reach a prompt.

I realize that looks like too much for a simple fix, but this is QEMU on amd64.
I'd be quite willing to help with that if needed. :)

Thanks again!

Changed in qemu (Ubuntu Jammy):
status: Triaged → Incomplete
Björn Hinz (bhinz83) wrote :

@Sergio your package 1:6.2+dfsg-2ubuntu6.20~ppa1 works

@Mauricio as you can see the qemu version 6.2 is used in RHEL 8.5 too. They integrated this patch two years ago. The QEMU project has this patch since version 7.0. I think it is well tested.
The business impact of this bug with Windows VMs on OpenStack is huge. For me it is fine to use my own package, but for all other customers of Ubuntu this isn't a nice way. In my opinion it is not good to wait for another month.

Sergio Durigan Junior (sergiodj) wrote :

Hi Mauricio,

Apologies for taking long to reply. As you know, I've been busy with other stuff.

Thank you very much for your considerations. It seems that I jumped the gun on this SRU and uploaded without finishing the SRU text; sorry. Also, I liked your suggestions for more testing. I was able to perform the first one using tsc.flat from this repo: https://gitlab.com/kvm-unit-tests/kvm-unit-tests.git. I'll post my results later.

I'm not entirely sure I'll be able to perform test (2) from your list, but I will try.

As for test (3), I can also work on it.

I'll update the SRU text accordingly. Thanks.

description: updated
Changed in qemu (Ubuntu Jammy):
status: Incomplete → In Progress
Changed in qemu (Ubuntu):
status: Incomplete → Invalid
status: Invalid → Fix Released
Sergio Durigan Junior (sergiodj) wrote :
description: updated
description: updated
description: updated
description: updated
Mauricio Faria de Oliveira (mfo) wrote :

Sergio, thanks for the updates to the bug description/SRU template.

I've extended the Test Plan to cover not only boots, but reboots too,
of course, which the patch changes/exercises (missed this previously);
and Regression Potential, to point to the modified area (cpu reset).

Changed in qemu (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Mauricio Faria de Oliveira (mfo) wrote : Please test proposed package

Hello Björn, or anyone else affected,

Accepted qemu into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:6.2+dfsg-2ubuntu6.20 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Mauricio Faria de Oliveira (mfo) wrote :

Hi Björn,

> @Mauricio as you can see the qemu version 6.2 is used in RHEL 8.5 too.
> They integrated this patch two years ago. The QEMU project has this patch since version 7.0.
> I think it is well tested.

Thanks for the reassurance of the code _change_ testing in RHEL and upstream!

This is valid and helpful feedback, but please note the different code _base_
(Ubuntu) does not allow interpreting that as sufficient, alone; there has to
be testing of the code _change_ on the same code _base_.

The PPA testing you performed helped a lot with that, thanks again, and in
case you'd have cycles to test jammy-proposed on Windows, that'd be great!
And would help with this point too:

> The business impact of this bug with Windows VMs on OpenStack is huge.
> For me it is fine to use my own package, but for all other customers of Ubuntu
> this isn't a nice way. In my opinion it is not good to wait for another month.

cheers,
Mauricio

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (qemu/1:6.2+dfsg-2ubuntu6.20)

All autopkgtests for the newly accepted qemu (1:6.2+dfsg-2ubuntu6.20) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

cinder/unknown (armhf)
initramfs-tools/0.140ubuntu13.4 (armhf)
libvirt/unknown (s390x)
livecd-rootfs/unknown (s390x)
nova/unknown (armhf)
open-iscsi/2.1.5-1ubuntu1 (s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Sergio Durigan Junior (sergiodj) wrote :

Performing the verification on Jammy.

1) Unit test verification

First, making sure that we can reproduce the problem.

# apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:6.2+dfsg-2ubuntu6.19
  Candidate: 1:6.2+dfsg-2ubuntu6.19
  Version table:
 *** 1:6.2+dfsg-2ubuntu6.19 500
        500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:6.2+dfsg-2ubuntu6.16 500
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
     1:6.2+dfsg-2ubuntu6 500
        500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages

Booting from ROM..enabling apic
smp: waiting for 0 APs
rdtsc = 311924236
rdtsc = 312962416
rdtsc = 314029548
rdtsc = 315083088
rdtsc = 316153232

... system_reset issued here ...

Booting from ROM..enabling apic
smp: waiting for 0 APs
rdtsc = 14001762132
rdtsc = 14002862272
rdtsc = 14004006064
rdtsc = 14005093708
rdtsc = 14006248040

We can see that the value of rdtsc increased after the system_reset, which is the issue we want to fix.

Now, verifying that the new QEMU fixes the problem.

# apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:6.2+dfsg-2ubuntu6.20
  Candidate: 1:6.2+dfsg-2ubuntu6.20
  Version table:
 *** 1:6.2+dfsg-2ubuntu6.20 500
        500 http://archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1:6.2+dfsg-2ubuntu6.19 500
        500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
     1:6.2+dfsg-2ubuntu6.16 500
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
     1:6.2+dfsg-2ubuntu6 500
        500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages

Booting from ROM..enabling apic
smp: waiting for 0 APs
rdtsc = 322897852
rdtsc = 324612970
rdtsc = 326347882
rdtsc = 328061272
rdtsc = 329828278

... system_reset issued here ...

Booting from ROM..enabling apic
smp: waiting for 0 APs
rdtsc = 231401932
rdtsc = 232646796
rdtsc = 233860396
rdtsc = 235070448
rdtsc = 236306912

We can see that rdtsc is now lower after system_reset is issued than it was before the reset.

2) Boot verification

Tested that the new QEMU properly boots *and* reboots an Ubuntu cloud image.

This concludes the verification for Jammy.

tags: added: verification-done verification-done-jammy
removed: verification-needed verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:6.2+dfsg-2ubuntu6.21

---------------
qemu (1:6.2+dfsg-2ubuntu6.21) jammy-security; urgency=medium

  * SECURITY REGRESSION: 9pfs restrictions on sockets (LP: #2065579)
    - debian/patches/ubuntu/lp-2065579-9pfs-allow-sockets.patch: allow
      sockets and FIFOs to be opened in hw/9pfs/9p-util.h. The fix for
      CVE-2023-2861 was too restrictive for some use-cases.

 -- Marc Deslauriers <email address hidden> Wed, 05 Jun 2024 12:25:53 -0400

Changed in qemu (Ubuntu Jammy):
status: Fix Committed → Fix Released