Soft-lockup caused by snd_hda_intel

Bug #2027848 reported by Executenor
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Confirmed
Undecided
Unassigned

Bug Description

I report this bug the the 6.5rc1 kernel.
Kernel before 6.5 hang even before there is any logging going on.
The problem is gone when i blacklist snd_hda_core.

jul 15 09:02:52 nick-HP-Laptop-15s-fq4xxx kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 78s! [kworker/1:1:90]
jul 15 09:02:52 nick-HP-Laptop-15s-fq4xxx kernel: Modules linked in: snd_seq_dummy snd_hrtimer rfcomm cmac algif_hash algif_skcipher af_alg bnep snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_>
jul 15 09:03:07 nick-HP-Laptop-15s-fq4xxx kernel: hp_wmi snd_timer videodev input_leds spi_nor rapl cec sparse_keymap cfg80211 intel_cstate wmi_bmof serio_raw platform_profile videobuf2_common snd mei_>
jul 15 09:03:07 nick-HP-Laptop-15s-fq4xxx kernel: CPU: 1 PID: 90 Comm: kworker/1:1 Tainted: G U W L 6.5.0-060500rc1-generic #202307092131
jul 15 09:03:07 nick-HP-Laptop-15s-fq4xxx kernel: Hardware name: HP HP Laptop 15s-fq4xxx/89BC, BIOS F.31 03/25/2023
jul 15 09:03:07 nick-HP-Laptop-15s-fq4xxx kernel: Workqueue: events azx_probe_work [snd_hda_intel]
jul 15 09:03:07 nick-HP-Laptop-15s-fq4xxx kernel: RIP: 0010:_raw_spin_unlock_irq+0x15/0x50
jul 15 09:03:07 nick-HP-Laptop-15s-fq4xxx kernel: Code: cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 00 fb 0f 1f 44 00 00 <65> ff 0d 5c 61 52 >
jul 15 09:03:07 nick-HP-Laptop-15s-fq4xxx kernel: RSP: 0018:ffffb450c05dbb30 EFLAGS: 00000246
jul 15 09:03:07 nick-HP-Laptop-15s-fq4xxx kernel: RAX: 0000000000000007 RBX: 0000000000000007 RCX: 0000000000000000

Revision history for this message
Executenor (nicktelindert) wrote :
Download full text (5.9 KiB)

[ 3.283014] i915 0000:00:02.0: drm_WARN_ON(val == 0xffffffff)
[ 3.283034] WARNING: CPU: 7 PID: 321 at drivers/gpu/drm/i915/display/intel_tc.c:702 tgl_tc_phy_init+0xe1/0x110 [i915]
[ 3.283338] Modules linked in: snd_pcm polyval_clmulni polyval_generic rtw88_8821ce snd_seq_midi rtw88_8821c kvm_intel rtw88_pci ghash_clmulni_intel snd_seq_midi_event sha512_ssse3 i915(+) binfmt_misc snd_rawmidi aesni_intel btusb rtw88_core crypto_simd btrtl kvm btbcm snd_seq uvcvideo btintel drm_buddy mei_pxp mei_hdcp irqbypass intel_rapl_msr cryptd videobuf2_vmalloc btmtk snd_seq_device mac80211 ttm nls_iso8859_1 uvc videobuf2_memops drm_display_helper bluetooth videobuf2_v4l2 cmdlinepart snd_timer videodev hp_wmi rapl input_leds cec sparse_keymap videobuf2_common ecdh_generic intel_cstate snd spi_nor processor_thermal_device_pci_legacy cfg80211 mc serio_raw platform_profile rc_core wmi_bmof mtd processor_thermal_device ee1004 ecc processor_thermal_rfim soundcore drm_kms_helper hid_multitouch mei_me libarc4 processor_thermal_mbox processor_thermal_rapl i2c_algo_bit mei intel_rapl_common igen6_edac intel_soc_dts_iosf int3400_thermal int3403_thermal dptf_pch_fivr acpi_thermal_rel int340x_thermal_zone acpi_pad mac_hid
[ 3.283408] wireless_hotkey sch_fq_codel msr parport_pc ppdev lp drm parport ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 nvme nvme_core nvme_common hid_generic crc32_pclmul spi_intel_pci i2c_i801 intel_lpss_pci spi_intel i2c_smbus intel_lpss idma64 xhci_pci i2c_hid_acpi vmd xhci_pci_renesas i2c_hid hid video wmi pinctrl_tigerlake
[ 3.283440] CPU: 7 PID: 321 Comm: systemd-udevd Tainted: G U W 6.5.0-060500rc1-generic #202307092131
[ 3.283443] Hardware name: HP HP Laptop 15s-fq4xxx/89BC, BIOS F.31 03/25/2023
[ 3.283444] RIP: 0010:tgl_tc_phy_init+0xe1/0x110 [i915]
[ 3.283679] Code: 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 b8 02 09 d9 48 c7 c1 38 49 4a c1 4c 89 e2 48 c7 c7 b1 5b 4c c1 48 89 c6 e8 ff 5b 68 d8 <0f> 0b 48 8b 03 8b 70 7c 48 8b 38 e8 8f d5 f9 ff 89 c1 41 89 c4 c1
[ 3.283681] RSP: 0018:ffff9ab5004bb6f0 EFLAGS: 00010246
[ 3.283683] RAX: 0000000000000000 RBX: ffff8a21078e9800 RCX: 0000000000000000
[ 3.283685] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 3.283687] RBP: ffff9ab5004bb708 R08: 0000000000000000 R09: 0000000000000000
[ 3.283688] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a2101b26cc0
[ 3.283689] R13: ffff8a210b3d8000 R14: ffff8a21078e9810 R15: 0000000fffffffe0
[ 3.283691] FS: 00007f739cea98c0(0000) GS:ffff8a2277bc0000(0000) knlGS:0000000000000000
[ 3.283693] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.283695] CR2: 00007f5c26b0ee24 CR3: 000000011893a002 CR4: 0000000000770ee0
[ 3.283697] PKRU: 55555554
[ 3.283698] Call Trace:
[ 3.283699] <TASK>
[ 3.283701] ? show_regs+0x6d/0x80
[ 3.283707] ? __warn+0x89/0x160
[ 3.283711] ? tgl_tc_phy_init+0xe1/0x110 [i915]
[ 3.283937] ? report_bug+0x17e/0x1b0
[ 3.283943] ? handle_bug+0x46/0x90
[ 3.283946] ? exc_invalid_op+0x18/0x80
[ 3.283949] ? asm_exc_invalid_op+0x1b/0x20
[ 3.283952] ? tgl_tc_phy_init+0xe1/0x110 [i915]
...

Read more...

Revision history for this message
Executenor (nicktelindert) wrote :

Another interesting fact is that the system works fine if i use the liquorix 6.4 kernel and switch the audio driver to legacy mode with dsp_driver=1 together with setting the model to dell-headset-multi.

Revision history for this message
brett hassall (brett-hassall) wrote :

i am also seeing soft lockups related to snd_hda-core - also on a HP 15s-fq4xxx. My trace is different but it would seem too much of a coincidence for the issues not to related.

This has been a day 0 problem for me. Changing kernels does not seem to help. I have found that it occurs every time after a restart. If I do a coldstart (ie poweroff then poweron), the problem has not occuirred in over 3 months.

Jul 16 07:02:11 jammy kernel: watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [kworker/6:2:308]
...
Jul 16 07:02:11 jammy kernel: CPU: 6 PID: 308 Comm: kworker/6:2 Tainted: G W 5.15.0-76-generic #83-Ubuntu
Jul 16 07:02:11 jammy kernel: Hardware name: HP HP Laptop 15s-fq4xxx/89BC, BIOS F.26 10/18/2022
Jul 16 07:02:11 jammy kernel: Workqueue: events sof_probe_work [snd_sof]
Jul 16 07:02:11 jammy kernel: RIP: 0010:snd_hdac_bus_send_cmd+0xdf/0x130 [snd_hda_core]
Jul 16 07:02:11 jammy kernel: Code: 1f 44 00 00 31 c0 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 4c 89 f7 c6 07 00 0f 1f 40 00 fb 66 0f 1f 44 00 00 <b8> f5 ff ff ff eb d5 4c 89 f7 c6 07 00 0f 1f 40 00 fb 66 0f 1f 44
...
Jul 16 07:02:11 jammy kernel: Call Trace:
Jul 16 07:02:11 jammy kernel: <TASK>
Jul 16 07:02:11 jammy kernel: snd_hdac_bus_exec_verb_unlocked+0x78/0x160 [snd_hda_core]
Jul 16 07:02:11 jammy kernel: snd_hdac_bus_exec_verb+0x3e/0x60 [snd_hda_core]
Jul 16 07:02:11 jammy kernel: snd_hdac_exec_verb+0x38/0x40 [snd_hda_core]
Jul 16 07:02:11 jammy kernel: hda_reg_read+0x1d3/0x270 [snd_hda_core]
Jul 16 07:02:11 jammy kernel: __snd_hdac_regmap_read_raw+0x7e/0xe0 [snd_hda_core]
Jul 16 07:02:11 jammy kernel: snd_hdac_regmap_read_raw+0x10/0x20 [snd_hda_core]
Jul 16 07:02:11 jammy kernel: snd_hdac_device_init+0x21f/0x440 [snd_hda_core]
Jul 16 07:02:11 jammy kernel: ? __kmalloc_node_track_caller+0x16f/0x3a0
Jul 16 07:02:11 jammy kernel: snd_hdac_ext_bus_device_init+0x66/0xc0 [snd_hda_ext_core]
Jul 16 07:02:11 jammy kernel: hda_codec_probe+0xe3/0x230 [snd_sof_intel_hda]
Jul 16 07:02:11 jammy kernel: hda_codec_probe_bus+0x4c/0x70 [snd_sof_intel_hda]
Jul 16 07:02:11 jammy kernel: hda_init_caps+0x20a/0x27a [snd_sof_intel_hda_common]
Jul 16 07:02:11 jammy kernel: ? pci_read_config_dword+0x27/0x40
Jul 16 07:02:11 jammy kernel: ? snd_sof_pci_update_bits+0x83/0x130 [snd_sof]
Jul 16 07:02:11 jammy kernel: ? synchronize_irq+0x3e/0xb0
Jul 16 07:02:11 jammy kernel: hda_dsp_probe+0x47b/0x580 [snd_sof_intel_hda_common]
Jul 16 07:02:11 jammy kernel: sof_probe_continue+0x27/0x1d0 [snd_sof]
Jul 16 07:02:11 jammy kernel: sof_probe_work+0x19/0x30 [snd_sof]
...
Jul 16 07:02:11 jammy kernel: </TASK>

Revision history for this message
brett hassall (brett-hassall) wrote :

Full trace for previous comment

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Executenor (nicktelindert) wrote :

I can also confirm that this problem indeed happens on a warm boot. A cold boot seems to boot fine. I forgot to mention that. When i use the 5.15 kernel i had the same snd_hda_core call trace Brett posted. So yes we are talking about the same issue but 6.5 handles it somewhat different.

Revision history for this message
brett hassall (brett-hassall) wrote :

I happen to have Lunar partition available (on an external SSD).

I booted Lunar on the same HP laptop and then did a restart/warmboot. The trace is slightly different again so I've uploaded in case it helps.

Revision history for this message
brett hassall (brett-hassall) wrote :

Apologies for hijacking this bug:

Executenor, would you mind taking a look at bug 2025040 and letting me know if you've seen the same thing on your HP.

No pressure, tks.

Revision history for this message
Executenor (nicktelindert) wrote :

No apologies neccesary, this is exactly how a community should work. I will take a look at the issue you posted.

Revision history for this message
brett hassall (brett-hassall) wrote :

i tried setting snd_intel_dspcfg.dsp_driver to 1 (legacy) as a workaround without success but it did change the lockup output:

watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kworker/7:3:326]
Sep 01 13:03:32 jammy kernel: Modules linked in: bridge xt_pkttype xt_tcpudp xt_state stp xt_conntrack >
Sep 01 13:03:32 jammy kernel: processor_thermal_rfim cfg80211 nls_iso8859_1 i2c_algo_bit snd_timer pro>
Sep 01 13:03:32 jammy kernel: CPU: 7 PID: 326 Comm: kworker/7:3 Tainted: G W 5.15.0-82-g>
Sep 01 13:03:32 jammy kernel: Hardware name: HP HP Laptop 15s-fq4xxx/89BC, BIOS F.32 07/04/2023
Sep 01 13:03:32 jammy kernel: Workqueue: events azx_probe_work [snd_hda_intel]
Sep 01 13:03:32 jammy kernel: RIP: 0010:snd_hdac_bus_send_cmd+0xdf/0x130 [snd_hda_core]
Sep 01 13:03:32 jammy kernel: Code: 1f 44 00 00 31 c0 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc c>
Sep 01 13:03:32 jammy kernel: RSP: 0018:ffffaeec80433bc0 EFLAGS: 00000246
Sep 01 13:03:32 jammy kernel: RAX: 0000000000000007 RBX: 0000000000000007 RCX: ffffaeec80433c14
Sep 01 13:03:32 jammy kernel: RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff900a80ffb4c0
Sep 01 13:03:32 jammy kernel: RBP: ffffaeec80433bf0 R08: 0000000000000000 R09: ffff900a80ffb428
Sep 01 13:03:32 jammy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff900a80ffb028
Sep 01 13:03:32 jammy kernel: R13: 00000000401f0005 R14: ffff900a80ffb4c0 R15: ffff900a80ffb038
Sep 01 13:03:32 jammy kernel: FS: 0000000000000000(0000) GS:ffff900bf7bc0000(0000) knlGS:0000000000000>
Sep 01 13:03:32 jammy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 01 13:03:32 jammy kernel: CR2: 00007fff6fc8ebac CR3: 0000000100da6006 CR4: 0000000000770ee0
Sep 01 13:03:32 jammy kernel: PKRU: 55555554
Sep 01 13:03:32 jammy kernel: Call Trace:
Sep 01 13:03:32 jammy kernel: <TASK>
Sep 01 13:03:32 jammy kernel: azx_send_cmd+0x2d/0x40 [snd_hda_codec]
Sep 01 13:03:32 jammy kernel: snd_hdac_bus_exec_verb_unlocked+0x78/0x160 [snd_hda_core]
Sep 01 13:03:32 jammy kernel: snd_hdac_bus_exec_verb+0x3e/0x60 [snd_hda_core]
Sep 01 13:03:32 jammy kernel: snd_hdac_exec_verb+0x38/0x40 [snd_hda_core]
Sep 01 13:03:32 jammy kernel: hda_reg_read+0x1d3/0x270 [snd_hda_core]
Sep 01 13:03:32 jammy kernel: __snd_hdac_regmap_read_raw+0x7e/0xe0 [snd_hda_core]
Sep 01 13:03:32 jammy kernel: snd_hdac_regmap_read_raw+0x10/0x20 [snd_hda_core]
Sep 01 13:03:32 jammy kernel: snd_hdac_device_init+0x21f/0x440 [snd_hda_core]
Sep 01 13:03:32 jammy kernel: snd_hda_codec_new+0x93/0x130 [snd_hda_codec]
Sep 01 13:03:32 jammy kernel: azx_probe_codecs+0x15b/0x1f0 [snd_hda_codec]
Sep 01 13:03:32 jammy kernel: azx_probe_continue+0x36c/0x450 [snd_hda_intel]
Sep 01 13:03:32 jammy kernel: azx_probe_work+0x15/0x20 [snd_hda_intel]
Sep 01 13:03:32 jammy kernel: process_one_work+0x228/0x3d0
Sep 01 13:03:32 jammy kernel: worker_thread+0x53/0x420
Sep 01 13:03:32 jammy kernel: ? process_one_work+0x3d0/0x3d0
Sep 01 13:03:32 jammy kernel: kthread+0x127/0x150
Sep 01 13:03:32 jammy kernel: ? set_kthread_struct+0x50/0x50
Sep 01 13:03:32 jammy kernel: ret_from_fork+0x1f/0x30
Sep 01 13:03:32 jammy kernel: </TASK>

Revision history for this message
Executenor (nicktelindert) wrote :

The legacy flag only works when you also set a model, for me the model dell-headset-multi works. You have to shutdown the system completely after setting these options. When you have done that you should be able to reboot without locking up. The logs will still contain errors though.

Revision history for this message
brett hassall (brett-hassall) wrote :

I tested before rebooting by using the following commands:
  sudo rmmod snd_sof_pci_intel_tgl
  sudo rmmod snd_sof_intel_hda_common
  sudo rmmod snd_hda_intel
  sudo rmmod snd_intel_dspcfg
  sudo modprobe snd_intel_dspcfg dsp_driver=1
  sudo modprobe snd_sof_intel_hda_common
  sudo modprobe snd_sof_pci_intel_tgl
  sudo modprobe snd_hda_intel
After un-muting the output, a video played ok in vlc without any other changes.

To persist the change I created a conf file in /etc/modprobe.d and ran sudo update-initramfs -k all -u.
After a cold boot, a video played ok in vlc. After a warm boot, I saw lockups.

This was using 5.15.0-82.

Revision history for this message
brett hassall (brett-hassall) wrote :

Using Ubuntu 6.5.0-2.2 kernel as it is most likely to write the journal after a lockup, I turned on some audio debug and saw this:

cold boot (all entries below have the same timestamp)
  jammy kernel: snd_sof_intel_hda:hda_codec_probe: sof-audio-pci-intel-tgl 0000:00:1f.3: HDA codec #0 probed OK: response: 10ec0236
  jammy kernel: snd_sof_intel_hda:request_codec_module: hdaudio ehdaudio0D0: loading codec module: hdaudio:v10EC0236r00100002a01
  jammy kernel: snd_sof_intel_hda:hda_codec_probe: sof-audio-pci-intel-tgl 0000:00:1f.3: HDA codec #2 probed OK: response: 80862812
  jammy kernel: snd_sof_intel_hda:request_codec_module: hdaudio ehdaudio0D2: loading codec module: hdaudio:v80862812r00100000a01

warm boot
  jammy kernel: snd_sof_intel_hda:hda_codec_probe: sof-audio-pci-intel-tgl 0000:00:1f.3: HDA codec #0 probed OK: response: fe05
  <then nothing for 26 sec until>
  jammy kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/2:1:92]

So it seems the HDA codec probe should return a PCI ID and for some reason after a warm boot it does not.

Revision history for this message
brett hassall (brett-hassall) wrote :

Upstream bug raised: https://bugzilla.kernel.org/show_bug.cgi?id=217867.

The output from #13 is the same on upstream v6.5

Revision history for this message
Executenor (nicktelindert) wrote :

The issue is caused by this change: https://github.com/torvalds/linux/commit/5aec989
I created a new repository with this fix included: https://github.com/nicktelindert/linux

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.