Windows guest hangs after reboot from the guest OS

Bug #2064914 reported by Björn Hinz
This bug affects 2 people
Affects          Status         Importance   Assigned to              Milestone
qemu (Fedora)    Unknown        Unknown
qemu (Ubuntu)    Fix Released   Undecided    Unassigned
  Jammy          In Progress    Undecided    Sergio Durigan Junior

Bug Description

[ Impact ]

Some versions of Windows hang on reboot if their TSC value is greater
than 2^54. The calibration of the Hyper-V reference time overflows
and fails; as a result the processors' clock sources are out of sync.
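
For a sense of scale, the arithmetic below estimates how long a counter takes to pass 2^54. This is only a rough sketch: the 3 GHz tick rate is an illustrative assumption, not a value taken from this report, and `seconds_to_2_54` is a hypothetical helper name.

```shell
# Rough estimate: seconds until a counter ticking at a given rate reaches 2^54.
# The rate passed in is an assumption for illustration; real TSC rates vary by CPU.
seconds_to_2_54() {
    echo $(( (1 << 54) / $1 ))
}

secs=$(seconds_to_2_54 3000000000)
echo "2^54 = $((1 << 54)) ticks"
echo "at 3 GHz: ${secs} s, about $((secs / 86400)) days"
```

Under that assumption, a counter that has been running for roughly ten weeks is already past the threshold.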

[ Test Plan ]

As suggested by Mauricio, testing will be done in stages.

1) unit test, with an rdtsc/print loop (confirming that the TSC value decreases after system_reset).

This can be done by using x86/tsc.flat from the following repository:

https://gitlab.com/kvm-unit-tests/kvm-unit-tests.git

Follow the steps below:

Inside a Jammy system (privileged container/VM, bare metal, etc.):

# apt update && apt install gcc make -y
# git clone https://gitlab.com/kvm-unit-tests/kvm-unit-tests.git
# cd kvm-unit-tests
# wget https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2064914/+attachment/5784045/+files/tsc.c.patch -O- | patch -p1
# ./configure && make

Make sure x86/tsc exists. Now you can install qemu and perform the test:

# apt install -y qemu-system-x86
# qemu-system-x86_64 -serial file:/tmp/bogus-output -accel kvm -kernel x86/tsc.flat -monitor stdio -nographic

Wait a couple of seconds and issue a "system_reset" command. Then, wait a couple more seconds and issue a "quit" command.

You can now open /tmp/bogus-output and check the values of rdtsc. With the unfixed package, you will notice that the value keeps increasing across the "system_reset" instead of dropping, which is exactly what we don't want.

Afterwards, you can update qemu and test the fix by doing the same steps (make sure you adjust the "file:/tmp/..." path).
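
Checking the output by eye works, but the comparison can also be scripted. The sketch below is a hedged example: it assumes the samples have been reduced to one unsigned decimal value per line, which may not match the actual output format of the patched x86/tsc test, and `tsc_went_backwards` is a hypothetical helper name. The sample values are synthetic.

```shell
# Report whether a sequence of counter samples ever goes down.
# ASSUMPTION: input is one unsigned decimal value per line; the real
# output of the patched x86/tsc test may be formatted differently.
# awk compares the values as doubles, so very large values are only
# approximate; that is enough to spot a drop back to a small number.
tsc_went_backwards() {
    awk '
        NR > 1 && ($1 + 0) < prev { found = 1 }
        { prev = $1 + 0 }
        END { exit found ? 0 : 1 }
    '
}

# Synthetic samples (not real output): the counter drops after a reset.
if printf '%s\n' 1000 2000 3000 5 1500 | tsc_went_backwards; then
    echo "counter was reset"
else
    echo "counter kept increasing"
fi
```

Once the numeric values have been extracted from the serial output file, the same function can read them on standard input.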

2) regression test, booting Ubuntu kernel/initrd pairs (installer's should be enough) from supported releases, and checking they boot/reach a prompt.

[ Where problems could occur ]

This is a change impacting normal x86 code, so although the patch is small and well contained, in the unlikely case that we find a regression it will impact more users. As such, and under Mauricio's advice, the test plan is being extended to really guarantee that the common virtualization scenarios are not impacted. If we find a problem with this update, there is the possibility of reverting it temporarily until we can devise a proper fix.

[ Original Description ]

Description:
Some versions of Windows hang on reboot if their TSC value is greater
than 2^54. The calibration of the Hyper-V reference time overflows
and fails; as a result the processors' clock sources are out of sync.

The issue is that the TSC _should_ be reset to 0 on CPU reset and
QEMU tries to do that. However, KVM special cases writing 0 to the
TSC and thinks that QEMU is trying to hot-plug a CPU, which is
correct the first time through but not later. Thwart this valiant
effort and reset the TSC to 1 instead, but only if the CPU has been
run once.

For this to work, env->tsc has to be moved to the part of CPUArchState
that is not zeroed at the beginning of x86_cpu_reset.
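
The rule described above can be modelled in a few lines. This is a toy model, not QEMU code: it assumes that a TSC of 0 stands for "this CPU has never run", and `tsc_after_reset` is a hypothetical name used only for illustration.

```shell
# Toy model of the reset rule described above; this is not QEMU code.
# ASSUMPTION: a TSC of 0 means the CPU has never run, so it is left
# alone; any other value is reset to 1 rather than 0.
tsc_after_reset() {
    if [ "$1" -ne 0 ]; then
        echo 1
    else
        echo 0
    fi
}

tsc_after_reset 0                      # never-run CPU: stays 0
tsc_after_reset 18014398509481985      # ran past 2^54: becomes 1
```

Using 1 rather than 0 as the reset value is what avoids the special case for writing 0 that the description mentions.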

Solution: [PATCH] target/i386: properly reset TSC on reset

I have already created and tested an Ubuntu PPA package. The patch fixes this issue.
Link to ppa: https://launchpad.net/~bhinz83/+archive/ubuntu/openstack-rds/+packages

It affects only the Jammy (22.04) package. The newest version is: qemu-1:6.2+dfsg-2ubuntu6.19

Tags: jammy patch


Björn Hinz (bhinz83) wrote :
description: updated
description: updated
Björn Hinz (bhinz83)
description: updated
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Patch imported from RHEL 8" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Björn Hinz (bhinz83) wrote :
Paride Legovini (paride)
Changed in qemu (Ubuntu Jammy):
status: New → Triaged
Changed in qemu (Ubuntu):
status: New → Incomplete
Paride Legovini (paride) wrote :

Hello and thanks for this bug report, for attaching the patch and for the PPA.

I know you wrote that this bug "affects only jammy 22.04 package", but let me ask explicitly: is this fixed in Ubuntu 24.04 LTS? This is important for us to know from a process point of view.

Also: landing the fix in Jammy will require verification from a user affected by the bug. This is likely you, given that we are unlikely to have the required Windows version at hand. The process consists of installing the affected package from the -proposed pocket and verifying that it works as expected (more on this process: [1]). Are you willing to help us with this verification?

Thanks!

[1] https://wiki.ubuntu.com/StableReleaseUpdates

Sergio Durigan Junior (sergiodj) wrote :

Thank you Björn for reporting the bug, and Paride for the initial triage.

As Paride said, we have to make sure this is not affecting other versions of QEMU as well. If the patch mentioned in the description is indeed the only one needed to fix the bug, then I think we're good:

$ git tag --contains 5286c3662294119dc2dd1e9296757337211451f6
v7.0.0
v7.0.0-rc2
v7.0.0-rc3
v7.0.0-rc4
v7.1.0
v7.1.0-rc0
v7.1.0-rc1
v7.1.0-rc2
v7.1.0-rc3
v7.1.0-rc4
v7.2.0
v7.2.0-rc0
v7.2.0-rc1
v7.2.0-rc2
v7.2.0-rc3
v7.2.0-rc4
v7.2.1
v7.2.2
v7.2.3
v7.2.4
v7.2.5
v7.2.6
v7.2.7
v7.2.8
v7.2.9
v8.0.0
v8.0.0-rc0
v8.0.0-rc1
v8.0.0-rc2
v8.0.0-rc3
v8.0.0-rc4
v8.0.1
v8.0.2
v8.0.3
v8.0.4
v8.0.5
v8.1.0
v8.1.0-rc0
v8.1.0-rc1
v8.1.0-rc2
v8.1.0-rc3
v8.1.0-rc4
v8.1.1
v8.1.2
v8.1.3
v8.1.4
v8.1.5
v8.2.0
v8.2.0-rc0
v8.2.0-rc1
v8.2.0-rc2
v8.2.0-rc3
v8.2.0-rc4
v8.2.1

We have QEMU 8.0.4 on Mantic and 8.2.2 on Noble.
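
The same conclusion can be checked mechanically by comparing release numbers against the first final release in the tag list above. This is a minimal sketch, assuming GNU `sort -V` is available and that only plain final-release versions are compared; `has_fix` and `first_fixed` are hypothetical names.

```shell
# Compare upstream release numbers with GNU sort -V.
# ASSUMPTION: plain final-release versions such as 6.2 or 8.0.4; -rc tags
# and Ubuntu package version strings are not handled by this sketch.
first_fixed=7.0.0

has_fix() {
    lowest=$(printf '%s\n' "$first_fixed" "$1" | sort -V | head -n 1)
    [ "$lowest" = "$first_fixed" ]
}

for v in 6.2 8.0.4 8.2.2; do
    if has_fix "$v"; then
        echo "QEMU $v: includes the fix"
    else
        echo "QEMU $v: predates the fix"
    fi
done
```

For full Ubuntu package versions, a tool that understands Debian version syntax would be needed instead of `sort -V`.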

@Björn, while it would be good to have reproduction steps for the bug, we can also rely on your help to verify the correctness of the fix (as Paride explained).

I'll work on preparing an upload for Jammy in the meantime.

Thanks.

Changed in qemu (Ubuntu Jammy):
assignee: nobody → Sergio Durigan Junior (sergiodj)
description: updated
Sergio Durigan Junior (sergiodj) wrote :

Hi again, Björn,

Could you please give the following PPA a try and tell me if the package works?

https://launchpad.net/~sergiodj/+archive/ubuntu/qemu

The QEMU version there is 1:6.2+dfsg-2ubuntu6.20~ppa1.

Thanks a lot.

Björn Hinz (bhinz83) wrote :

Hi,
thanks for your investigation.

@Sergio The RHEL bug description at https://bugzilla.redhat.com/show_bug.cgi?id=2074737 describes a test.
I also found the patch in QEMU: https://github.com/qemu/qemu/commit/5286c3662294119dc2dd1e9296757337211451f6
It seems that this issue has been patched since QEMU 7.0.0.

@Paride I saw this issue in our production environment with "Windows Server 2012 R2", "Windows Server 2019" and "Windows Server 2022", all with every available update installed. We have had this problem for years but only found this patch last month.

@Sergio I will try and test your ppa package.

Mauricio Faria de Oliveira (mfo) wrote :

Hi Sergio,

Thanks for the upload!

The bug and fix sound good and relatively simple to verify.

The SRU bug template is still missing Test Plan / Regression Potential, so I'll mark it as Incomplete for the time being (I see it may be coming soon, per "TBD").

I looked at the test case Björn mentioned in the RH BZ. It uses the `rdtsc.flat` kernel, which I couldn't find elsewhere. Do you know about it?
It's apparently a `rdtsc`/print loop we can replicate, if needed.

If I may, I'd suggest testing this in 3 ways:

1) unit test, with such an rdtsc/print loop (and confirm the tsc value decreases after system_reset)
2) functional test, booting Windows (e.g., downloaded from MSFT Evaluation Center) and changing TSC manually to a problematic value (> 2^54) before reset, with the QEMU monitor or GDB, if possible?
3) regression test, booting Ubuntu kernel/initrd pairs (installer's should be enough) from supported releases, and checking they boot/reach a prompt.

I realize that looks like too much for a simple fix, but this is QEMU on amd64.
I'd be quite willing to help with that if needed. :)

Thanks again!

Changed in qemu (Ubuntu Jammy):
status: Triaged → Incomplete
Björn Hinz (bhinz83) wrote :

@Sergio your package 1:6.2+dfsg-2ubuntu6.20~ppa1 works

@Mauricio As you can see, QEMU 6.2 is used in RHEL 8.5 too, and they integrated this patch two years ago. The QEMU project has had this patch since version 7.0. I think it is well tested.
The business impact of this bug on Windows VMs on OpenStack is huge. Using my own package works for me, but it isn't a good option for other Ubuntu customers. In my opinion, it is not good to wait another month.

Sergio Durigan Junior (sergiodj) wrote :

Hi Mauricio,

Apologies for taking so long to reply. As you know, I've been busy with other stuff.

Thank you very much for your considerations. It seems that I jumped the gun on this SRU and uploaded without finishing the SRU text; sorry. Also, I liked your suggestions for more testing. I was able to perform the first one using tsc.flat from this repo: https://gitlab.com/kvm-unit-tests/kvm-unit-tests.git. I'll post my results later.

I'm not entirely sure I'll be able to perform test (2) from your list, but I will try.

As for test (3), I can also work on it.

I'll update the SRU text accordingly. Thanks.

description: updated
Changed in qemu (Ubuntu Jammy):
status: Incomplete → In Progress
Changed in qemu (Ubuntu):
status: Incomplete → Invalid
status: Invalid → Fix Released
Sergio Durigan Junior (sergiodj) wrote :
description: updated
description: updated