[Azure] [NVMe] cpu soft lockup issue when run fio against nvme disks

Bug #1995408 reported by John Cabaj
This bug affects 1 person
Affects               Status        Importance  Assigned to  Milestone
linux-azure (Ubuntu)  Invalid       Undecided   Unassigned
Kinetic               Fix Released  Medium      John Cabaj

Bug Description

SRU Justification

[Impact]

The hypervisor allocates interrupts so that they all happen on a single physical CPU, which can lead to CPU soft lockups when using PCI NVMe devices under heavy load.

Kinetic will get this patch via stable updates.
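
As a rough aid for confirming the symptom described under [Impact], the sketch below scans kernel log text for soft lockup reports and tallies them per CPU. It assumes the usual kernel watchdog message shape (`BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]`); the function names are hypothetical and not part of any Ubuntu tooling, and the exact log lines from the affected Azure VMs are not included in this report.

```python
import re
from collections import Counter

# Assumed message shape, as printed by the kernel soft lockup watchdog:
#   watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [fio:4321]
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return (cpu, seconds_stuck, task_name, pid) for each soft lockup line."""
    hits = []
    for line in log_text.splitlines():
        m = SOFT_LOCKUP.search(line)
        if m:
            hits.append((int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"])))
    return hits


def lockups_per_cpu(log_text):
    """Count soft lockup reports per CPU number."""
    return Counter(cpu for cpu, _, _, _ in find_soft_lockups(log_text))

# Hypothetical usage: pass the output of `dmesg` or `journalctl -k`, e.g.
#   lockups_per_cpu(open("dmesg.txt").read())
```

Many reports during an fio run against the NVMe disks would be consistent with the behaviour described here, but the tally alone does not prove the interrupt allocation is the cause.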

[Test Case]

Microsoft tested

[Where things could go wrong]

The hypervisor PCI driver may not load correctly. The patch did not apply cleanly, which could lead to merge conflicts once the upstream patch is pulled in.

[Other Info]

SF: #00346549

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1995408

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
John Cabaj (john-cabaj) wrote :
John Cabaj (john-cabaj)
description: updated
John Cabaj (john-cabaj)
affects: linux (Ubuntu) → linux-azure (Ubuntu)
John Cabaj (john-cabaj)
Changed in linux-azure (Ubuntu):
assignee: nobody → John Cabaj (john-cabaj)
Stefan Bader (smb)
Changed in linux-azure (Ubuntu):
status: Incomplete → Invalid
Changed in linux-azure (Ubuntu Kinetic):
assignee: nobody → John Cabaj (john-cabaj)
importance: Undecided → Medium
status: New → In Progress
Changed in linux-azure (Ubuntu):
assignee: John Cabaj (john-cabaj) → nobody
Tim Gardner (timg-tpi)
Changed in linux-azure (Ubuntu Kinetic):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/5.19.0-1014.15 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-kinetic' to 'verification-done-kinetic'. If the problem still exists, change the tag 'verification-needed-kinetic' to 'verification-failed-kinetic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-kinetic-linux-azure verification-needed-kinetic
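
When verifying, it can help to confirm that the running kernel is at least the build named above. The following is a minimal sketch, assuming `uname -r` on linux-azure reports a string of the form `5.19.0-1014-azure` (base version, ABI number, flavour); the helper names are hypothetical, and the check only covers the Kinetic 5.19.0 azure series mentioned in this report.

```python
import re

# First linux-azure ABI carrying the fix, taken from the
# linux-azure/5.19.0-1014.15 package named in this report.
FIXED_BASE = (5, 19, 0)
FIXED_ABI = 1014


def parse_release(release):
    """Split a `uname -r` string like '5.19.0-1014-azure' into its parts."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-(\d+)-([a-z0-9-]+)", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = (int(g) for g in m.groups()[:4])
    return major, minor, patch, abi, m.group(5)


def has_fix(release):
    """True only for 5.19.0 azure kernels at or above the fixed ABI.

    Other series and flavours are out of scope for this sketch and
    return False rather than guessing.
    """
    major, minor, patch, abi, flavour = parse_release(release)
    return (
        flavour == "azure"
        and (major, minor, patch) == FIXED_BASE
        and abi >= FIXED_ABI
    )

# Hypothetical usage on the VM under test:
#   import platform
#   print(has_fix(platform.release()))
```

A passing check only shows that the expected kernel is booted; the fio workload still has to be rerun to confirm the lockups are gone.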
Tim Gardner (timg-tpi) wrote :

Microsoft tested, marking verification done.

tags: added: verification-done-kinetic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 5.19.0-1016.17

---------------
linux-azure (5.19.0-1016.17) kinetic; urgency=medium

  * kinetic/linux-azure: 5.19.0-1016.17 -proposed tracker (LP: #1999735)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  [ Ubuntu: 5.19.0-28.29 ]

  * kinetic/linux: 5.19.0-28.29 -proposed tracker (LP: #1999746)
  * mm:vma05 in ubuntu_ltp fails with '[vdso] bug not patched' on kinetic/linux
    5.19.0-27.28 (LP: #1999094)
    - fix coredump breakage

linux-azure (5.19.0-1015.16) kinetic; urgency=medium

  * kinetic/linux-azure: 5.19.0-1015.16 -proposed tracker (LP: #1999417)

  * Azure: Jammy fio test hangs, swiotlb buffers exhausted (LP: #1998838)
    - SAUCE: scsi: storvsc: Fix swiotlb bounce buffer leak in confidential VM

  * Azure: MANA New Feature MANA XDP_Redirect Action (LP: #1998351)
    - net: mana: Add support of XDP_REDIRECT action

linux-azure (5.19.0-1014.15) kinetic; urgency=medium

  * kinetic/linux-azure: 5.19.0-1014.15 -proposed tracker (LP: #1997782)

  * Jammy/linux-azure: CONFIG_BLK_DEV_FD=n (LP: #1972017)
    - [Config] azure: CONFIG_BLK_DEV_FD=n

  * remove circular dep between linux-image and modules (LP: #1989334)
    - [Packaging] remove circular dep between modules and image

  * [Azure] [NVMe] cpu soft lockup issue when run fio against nvme disks
    (LP: #1995408)
    - PCI: hv: Only reuse existing IRTE allocation for Multi-MSI

  * [Azure][Arm64] Unable to detect all VF nics / Failing provisioning
    (LP: #1996117)
    - PCI: hv: Fix the definition of vector in hv_compose_msi_msg()

  * Kinetic update: v5.19.9 upstream stable release (LP: #1994068) // Kinetic
    update: v5.19.15 upstream stable release (LP: #1994078) // Kinetic update:
    v5.19.17 upstream stable release (LP: #1994179)
    - [Configs] azure: Updates after rebase

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/master)

  [ Ubuntu: 5.19.0-27.28 ]

  * kinetic/linux: 5.19.0-27.28 -proposed tracker (LP: #1997794)
  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2022.11.14)
  * selftests/.../nat6to4 breaks the selftests build (LP: #1996536)
    - [Config] Disable selftests/net/bpf/nat6to4
  * Expose built-in trusted and revoked certificates (LP: #1996892)
    - [Packaging] Expose built-in trusted and revoked certificates
  * support for same series backports versioning numbers (LP: #1993563)
    - [Packaging] sameport -- add support for sameport versioning
  * Add cs35l41 firmware loading support (LP: #1995957)
    - ASoC: cs35l41: Move cs35l41 exit hibernate function into shared code
    - ASoC: cs35l41: Add common cs35l41 enter hibernate function
    - ASoC: cs35l41: Do not print error when waking from hibernation
    - ALSA: hda: cs35l41: Don't dereference fwnode handle
    - ALSA: hda: cs35l41: Allow compilation test on non-ACPI configurations
    - ALSA: hda: cs35l41: Drop wrong use of ACPI_PTR()
    - ALSA: hda: cs35l41: Consolidate selections under SND_HDA_SCODEC_CS35L41
    - ALSA: hda: hda_cs_dsp_ctl: Add Library to support CS_DSP ALSA controls
    - ALSA: hda: hda_cs_dsp_c...

Changed in linux-azure (Ubuntu Kinetic):
status: Fix Committed → Fix Released