4.4 kernel panics in kvm wake_up() handler

Bug #1908428 reported by Ioanna Alifieraki
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Invalid        Undecided    Unassigned
Xenial           Fix Released   Medium       Unassigned

Bug Description

[Description]

A user reported that 4.4 kernels are affected by the bug described in [1].

The bug presents itself with the following trace:

[219901.424329] CPU: 19 PID: 0 Comm: swapper/19 Tainted: G OE 4.4.0-133-generic #159~14.04.1-Ubuntu
[219901.441800] task: ffff885f62e63fc0 ti: ffff885f62e7c000 task.ti: ffff885f62e7c000
[219901.449408] RIP: 0010:[<ffffffffc09c8cfd>] [<ffffffffc09c8cfd>] wakeup_handler+0x6d/0xa0 [kvm_intel]
[219901.458791] RSP: 0018:ffff885f7c043f70 EFLAGS: 00010083
[219901.464217] RAX: ffff885f7c040000 RBX: dead0000000000b8 RCX: ffff885f7c0586c0
[219901.471480] RDX: dead000000000100 RSI: 0000000000000000 RDI: ffff885f7c0586b0
[219901.478741] RBP: ffff885f7c043f90 R08: 0000000000000000 R09: 0000c7ffc2ec9069
[219901.486003] R10: 0000000000000494 R11: ffff885f7c057370 R12: 00000000000186b0
[219901.493267] R13: 0000000000000013 R14: 00000000000186c0 R15: ffff885f62e7c000
[219901.500528] FS: 0000000000000000(0000) GS:ffff885f7c040000(0000) knlGS:0000000000000000
[219901.511738] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[219901.517597] CR2: 00007f6d57098000 CR3: 0000003183dfe000 CR4: 0000000000362670
[219901.524860] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[219901.532121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[219901.539384] Stack:
[219901.541509] 0000000000000000 0000000000000013 0000000000000000 0000000000000000
[219901.549120] ffff885f7c043fa8 ffffffff8102fa99 ffffffff81f40200 ffff885f62e7fe98
[219901.556747] ffffffff8182131f ffff885f62e7fde8 <EOI> ffff885f62e7c000 0000000000000000
[219901.565006] Call Trace:
[219901.567567] <IRQ>
[219901.569592] [<ffffffff8102fa99>] smp_kvm_posted_intr_wakeup_ipi+0x59/0x70
[219901.576795] [<ffffffff8182131f>] kvm_posted_intr_wakeup_ipi+0xbf/0xd0
[219901.583431] <EOI>
[219901.585456] [<ffffffff81037b30>] ? hard_disable_TSC+0x30/0x30
[219901.591621] [<ffffffff810645a6>] ? native_safe_halt+0x6/0x10
[219901.597479] [<ffffffff81037b4e>] default_idle+0x1e/0xe0
[219901.602900] [<ffffffff810386c5>] arch_cpu_idle+0x15/0x20
[219901.608416] [<ffffffff810c3e7a>] default_idle_call+0x2a/0x40
[219901.614270] [<ffffffff810c41d0>] cpu_startup_entry+0x2e0/0x350
[219901.620305] [<ffffffff81050c2c>] start_secondary+0x16c/0x190
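
The register dump itself hints at list corruption. As a rough check, the Python sketch below flags registers that hold addresses in the kernel's list-poison range. It assumes the x86_64 defaults from include/linux/poison.h for 4.4 (POISON_POINTER_DELTA = 0xdead000000000000, LIST_POISON1 at offset 0x100, LIST_POISON2 at offset 0x200); these values are not stated in this report, and the helper name poison_offsets is hypothetical.

```python
import re

# Assumed x86_64 values (not taken from this bug report).
POISON_POINTER_DELTA = 0xDEAD000000000000
LIST_POISON1 = POISON_POINTER_DELTA + 0x100
LIST_POISON2 = POISON_POINTER_DELTA + 0x200

# Matches general-purpose register fields such as "RDX: dead000000000100".
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def poison_offsets(oops_text, window=0x1000):
    """Return {register: offset} for values within `window` bytes above the poison base."""
    hits = {}
    for name, value in REG_RE.findall(oops_text):
        offset = int(value, 16) - POISON_POINTER_DELTA
        if 0 <= offset < window:
            hits[name] = offset
    return hits


# Two register lines copied from the trace above.
sample = (
    "[219901.464217] RAX: ffff885f7c040000 RBX: dead0000000000b8 RCX: ffff885f7c0586c0\n"
    "[219901.471480] RDX: dead000000000100 RSI: 0000000000000000 RDI: ffff885f7c0586b0\n"
)

for reg, off in poison_offsets(sample).items():
    print(f"{reg}: poison base + {off:#x}")
```

For these two lines, RBX and RDX are flagged. Under the stated assumptions RDX would equal LIST_POISON1, and RBX sits 0x48 below it, which would be consistent with a container_of()-style adjustment applied to a poisoned pointer. This report does not confirm that reading.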

The root cause is that the blocked_vcpu_on_cpu list is corrupted.
The bug is fixed by the patchset found in [2].
Only the first 3 of its 4 patches have made their way into the upstream kernel,
and those are the ones needed to fix the bug.
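
One of the commits listed for this bug is titled "KVM: VMX: avoid double list add with VT-d posted interrupts". To illustrate why adding the same entry to a list twice is harmful, here is a small user-space Python model of a circular doubly linked list. It mirrors the pointer updates of the kernel's list_add() and list_del() but is not kernel code. The Node class, the walk() helper, and the vcpu names are made up for the illustration, and the sequence is only one plausible way such corruption could surface, not a reconstruction of the actual race.

```python
LIST_POISON1 = object()  # stand-ins for the kernel's poison pointers
LIST_POISON2 = object()


class Node:
    def __init__(self, name):
        self.name = name
        self.next = self  # an empty list head points at itself
        self.prev = self


def list_add(new, head):
    """Insert `new` right after `head`, like the kernel's list_add()."""
    nxt = head.next
    nxt.prev = new
    new.next = nxt
    new.prev = head
    head.next = new


def list_del(entry):
    """Unlink `entry` and poison its pointers, like the kernel's list_del()."""
    entry.next.prev = entry.prev
    entry.prev.next = entry.next
    entry.next = LIST_POISON1
    entry.prev = LIST_POISON2


def walk(head, limit=8):
    """Follow next pointers; report whether the walk closes, loops, or hits poison."""
    names = []
    node = head.next
    for _ in range(limit):
        if node is head:
            return names, "closed"
        if node is LIST_POISON1:
            return names, "poison"
        names.append(node.name)
        node = node.next
    return names, "loop"


head, a, b = Node("head"), Node("vcpu_a"), Node("vcpu_b")
list_add(a, head)
list_add(b, head)
print(walk(head))  # healthy list: walk returns to head

list_add(a, head)  # second add of an entry that is already on the list
print(walk(head))  # a and b now point at each other; head is never reached

list_del(a)        # a single delete does not undo the double add
print(walk(head))  # b still points at a, whose next pointer is poisoned
```

In this model the second list_add() leaves a cycle that never returns to the head, and a later list_del() leaves a stale pointer to an entry whose next field is poisoned, so a list walker would end up dereferencing the poison value.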

[Test case]
It was not possible to reproduce this bug locally.
A test kernel with the fixing patches has been provided to the user and they confirmed that it resolves the issue.

[Regression Potential]

The patches were accepted upstream in 4.14 and so far there are no known regressions.
Backporting was necessary because the original patches modify the pi_pre/post_block functions, which are not present in 4.4.
These functions were introduced by upstream commit bc22512bb24c ("kvm: vmx: rename vmx_pre/post_block to pi_pre/post_block").
Appropriate changes were made so that the patches modify the vmx_pre/post_block functions instead, without changing the functionality of the patches.
Testing has not revealed any regressions.

[Other]

Only 4.4 kernels are affected.

[1] https://marc.info/?l=kvm&m=149559827906211&w=2
[2] https://<email address hidden>/

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1908428

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Xenial):
status: New → Incomplete
tags: added: xenial
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Xenial):
status: Incomplete → Confirmed
description: updated
Stefan Bader (smb)
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Stefan Bader (smb)
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
Changed in linux (Ubuntu Xenial):
status: Confirmed → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Kelsey Steele (kelsey-steele) wrote :

Hi Ioanna, could you please verify that the kernel in -proposed resolves this bug? Thank you!

Ioanna Alifieraki (joalif) wrote :

#VERIFICATION

The user who brought this bug to our attention tested a test kernel that included the relevant commits and confirmed that it addresses the bug.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-203.235

---------------
linux (4.4.0-203.235) xenial; urgency=medium

  * xenial/linux: 4.4.0-203.235 -proposed tracker (LP: #1914140)

  * Ubuntu 16.04 kernel 4.4.0-202 basic commands hanging (LP: #1913853)
    - SAUCE: Revert "mm: check that mm is still valid in madvise()"

linux (4.4.0-202.234) xenial; urgency=medium

  * xenial/linux: 4.4.0-202.234 -proposed tracker (LP: #1913086)

  * DMI entry syntax fix for Pegatron / ByteSpeed C15B (LP: #1910639)
    - Input: i8042 - unbreak Pegatron C15B

  * CVE-2020-29372
    - mm: check that mm is still valid in madvise()

  * errinjct open fails on IBM POWER LPAR (LP: #1908710)
    - powerpc/rtas: Fix typo of ibm, open-errinjct in RTAS filter

  * 4.4 kernel panics in kvm wake_up() handler (LP: #1908428)
    - kvm: vmx: rename vmx_pre/post_block to pi_pre/post_block
    - KVM: VMX: extract __pi_post_block
    - KVM: VMX: avoid double list add with VT-d posted interrupts

  * restore reverted commit "crypto: arm64/sha - avoid non-standard inline asm
    tricks" (LP: #1907489)
    - crypto: arm64/sha - avoid non-standard inline asm tricks

  * CVE-2020-29374
    - gup: document and work around "COW can break either way" issue

  * Xenial update: v4.4.249 upstream stable release (LP: #1910139)
    - spi: bcm2835aux: Fix use-after-free on unbind
    - spi: bcm2835aux: Restore err assignment in bcm2835aux_spi_probe
    - ARC: stack unwinding: don't assume non-current task is sleeping
    - platform/x86: acer-wmi: add automatic keyboard background light toggle key
      as KEY_LIGHTS_TOGGLE
    - Input: cm109 - do not stomp on control URB
    - Input: i8042 - add Acer laptops to the i8042 reset list
    - [Config] updateconfigs for SPI_DYNAMIC
    - spi: Prevent adding devices below an unregistering controller
    - net/mlx4_en: Avoid scheduling restart task if it is already running
    - tcp: fix cwnd-limited bug for TSO deferral where we send nothing
    - net: stmmac: delete the eee_ctrl_timer after napi disabled
    - net: bridge: vlan: fix error return code in __vlan_add()
    - USB: dummy-hcd: Fix uninitialized array use in init()
    - USB: add RESET_RESUME quirk for Snapscan 1212
    - ALSA: usb-audio: Fix potential out-of-bounds shift
    - ALSA: usb-audio: Fix control 'access overflow' errors from chmap
    - xhci: Give USB2 ports time to enter U3 in bus suspend
    - USB: sisusbvga: Make console support depend on BROKEN
    - [Config] updateconfigs for USB_SISUSBVGA_CON
    - ALSA: pcm: oss: Fix potential out-of-bounds shift
    - serial: 8250_omap: Avoid FIFO corruption caused by MDR1 access
    - USB: serial: cp210x: enable usb generic throttle/unthrottle
    - scsi: bnx2i: Requires MMU
    - can: softing: softing_netdev_open(): fix error handling
    - RDMA/cm: Fix an attempt to use non-valid pointer when cleaning timewait
    - dm table: Remove BUG_ON(in_interrupt())
    - soc/tegra: fuse: Fix index bug in get_process_id
    - USB: serial: option: add interface-number sanity check to flag handling
    - USB: gadget: f_rndis: fix bitrate for SuperSpeed and above
    - usb: chipidea: ci_hdrc_imx: Pass DISABLE_DEVICE_STREAMING flag to imx6ul
...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released