net:veth.sh in ubuntu_kernel_selftests hang with J-intel-iotg (BUG: unable to handle page fault)

Bug #2008085 reported by Po-Hsu Lin
This bug affects 1 person
Affects                    Status        Importance  Assigned to  Milestone
HWE Next                   Invalid       Undecided   Unassigned
ubuntu-kernel-tests        New           Undecided   Unassigned
linux-intel-iotg (Ubuntu)  Invalid       Undecided   Unassigned
  Jammy                    Fix Released  Medium      Unassigned

Bug Description

Issue found on node "onibi" with J-intel-iotg 5.15.0-1026.31 this cycle.

The veth.sh test in the net category hangs and times out, leaving the test report incomplete.

When running the test manually, I can see a trace in dmesg.

ubuntu@onibi:~/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/net$ sudo ./veth.sh
default - gro flag ok
        - peer gro flag ok
        - tso flag ok
        - peer tso flag ok
        - aggregation ok
        - aggregation with TSO off ok
with gro on - gro flag ok
        - peer gro flag ok
        - tso flag ok
        - peer tso flag ok
        - aggregation with TSO off ok
default channels ok
with gro enabled on link down - gro flag ok
        - peer gro flag ok
        - tso flag ok
        - peer tso flag ok
        - aggregation with TSO off ok
setting tx channels ok
setting both rx and tx channels ok
bad setting: combined channels ok
setting invalid channels nr fail rx:3:3 tx:3:5 combined:n/a:n/a
bad setting: XDP with RX nr less than TX ok
(hangs here)

dmesg output:
[ 547.520923] BUG: unable to handle page fault for address: ffffb73800000001
[ 547.520999] #PF: supervisor write access in kernel mode
[ 547.521045] #PF: error_code(0x0002) - not-present page
[ 547.521089] PGD 100000067 P4D 100000067 PUD 0
[ 547.521133] Oops: 0002 [#1] SMP PTI
[ 547.521168] CPU: 1 PID: 1559 Comm: ip Not tainted 5.15.0-1026-intel-iotg #31-Ubuntu
[ 547.521233] Hardware name: Dell Inc. PowerEdge R310/05XKKK, BIOS 1.8.2 08/17/2011
[ 547.521293] RIP: 0010:veth_xdp+0x18f/0x1e0 [veth]
[ 547.521342] Code: ff 41 89 9d 1c 01 00 00 49 21 85 e8 00 00 00 e9 74 ff ff ff 48 c7 c7 80 e3 b0 c0 e8 2b 3b 06 c1 b8 e4 ff ff ff 4d 85 ff 74 85 <49> c7 07 80 e3 b0 c0 e9 79 ff ff ff 48 c7 c7 20 e4 b0 c0 e8 09 3b
[ 547.521488] RSP: 0018:ffffb738c254f420 EFLAGS: 00010282
[ 547.521535] RAX: 00000000ffffffe4 RBX: 0000000000000db2 RCX: ffffb738c254fb20
[ 547.521594] RDX: ffffffffc0b0bf90 RSI: ffffb738c254f468 RDI: ffffffffc0b0e380
[ 547.521653] RBP: ffffb738c254f450 R08: 0000000000000001 R09: ffffb738c0081000
[ 547.521711] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c65ced90000
[ 547.521769] R13: ffff8c65c12f6000 R14: 0000000000000000 R15: ffffb73800000001
[ 547.521828] FS: 00007faa028b3b80(0000) GS:ffff8c66f7640000(0000) knlGS:0000000000000000
[ 547.521895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 547.521943] CR2: ffffb73800000001 CR3: 000000010f068000 CR4: 00000000000006e0
[ 547.522004] Call Trace:
[ 547.522029] <TASK>
[ 547.522052] ? veth_open+0x90/0x90 [veth]
[ 547.522094] dev_xdp_install+0x66/0xf0
[ 547.522135] dev_xdp_attach+0x1fc/0x590
[ 547.522171] ? __bpf_prog_get+0x1f/0xe0
[ 547.522212] dev_change_xdp_fd+0x200/0x240
[ 547.522252] do_setlink+0xba2/0xc70
[ 547.522288] ? dev_get_alias+0x35/0x50
[ 547.522326] __rtnl_newlink+0x61e/0xa20
[ 547.522363] ? security_sock_rcv_skb+0x2f/0x50
[ 547.522406] ? skb_queue_tail+0x48/0x60
[ 547.522444] ? sock_def_readable+0x4b/0x80
[ 547.522485] ? __netlink_sendskb+0x62/0x80
[ 547.522528] ? netlink_unicast+0x2fb/0x340
[ 547.522566] ? rtnl_getlink+0x398/0x420
[ 547.522611] ? kmem_cache_alloc_trace+0x17e/0x2a0
[ 547.522657] rtnl_newlink+0x49/0x70
[ 547.522692] rtnetlink_rcv_msg+0x15d/0x400
[ 547.522731] ? rtnl_calcit.isra.0+0x130/0x130
[ 547.524524] netlink_rcv_skb+0x56/0x100
[ 547.526314] rtnetlink_rcv+0x15/0x20
[ 547.528102] netlink_unicast+0x223/0x340
[ 547.529837] netlink_sendmsg+0x24b/0x4c0
[ 547.531505] sock_sendmsg+0x69/0x70
[ 547.533114] ____sys_sendmsg+0x252/0x290
[ 547.534667] ? import_iovec+0x31/0x40
[ 547.536164] ? sendmsg_copy_msghdr+0x7f/0xa0
[ 547.537609] ___sys_sendmsg+0x81/0xc0
[ 547.539024] ? rseq_ip_fixup+0x72/0x170
[ 547.540420] ? __rseq_handle_notify_resume+0x2d/0xc0
[ 547.541824] ? exit_to_user_mode_loop+0x10d/0x160
[ 547.543227] ? exit_to_user_mode_prepare+0x37/0xb0
[ 547.544623] ? syscall_exit_to_user_mode+0x27/0x50
[ 547.545993] ? __x64_sys_close+0x11/0x50
[ 547.547334] __sys_sendmsg+0x62/0xc0
[ 547.548650] __x64_sys_sendmsg+0x1d/0x30
[ 547.549918] do_syscall_64+0x5c/0xc0
[ 547.551134] ? exc_page_fault+0x89/0x170
[ 547.552322] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 547.553511] RIP: 0033:0x7faa02a07b17
[ 547.554680] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 547.557206] RSP: 002b:00007ffdbbca3678 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 547.558517] RAX: ffffffffffffffda RBX: 0000000063f5ffbc RCX: 00007faa02a07b17
[ 547.559818] RDX: 0000000000000000 RSI: 00007ffdbbca36e0 RDI: 0000000000000003
[ 547.561110] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000555ea5465830
[ 547.562386] R10: 00007faa02afa340 R11: 0000000000000246 R12: 0000000000000001
[ 547.563647] R13: 00007ffdbbca3790 R14: 0000000000000000 R15: 0000555ea4edb040
[ 547.564924] </TASK>
[ 547.566184] Modules linked in: algif_hash af_alg veth intel_powerclamp ipmi_ssif coretemp joydev input_leds binfmt_misc kvm_intel ipmi_si kvm dcdbas ipmi_devintf ipmi_msghandler intel_cstate mac_hid acpi_power_meter i7core_edac sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mgag200 i2c_algo_bit hid_generic gpio_ich drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mpt3sas cec rc_core usbhid raid_class drm pata_acpi hid lpc_ich bnx2 scsi_transport_sas
[ 547.575407] CR2: ffffb73800000001
[ 547.577070] ---[ end trace 3ebb9a2cada35096 ]---
[ 547.586349] RIP: 0010:veth_xdp+0x18f/0x1e0 [veth]
[ 547.588039] Code: ff 41 89 9d 1c 01 00 00 49 21 85 e8 00 00 00 e9 74 ff ff ff 48 c7 c7 80 e3 b0 c0 e8 2b 3b 06 c1 b8 e4 ff ff ff 4d 85 ff 74 85 <49> c7 07 80 e3 b0 c0 e9 79 ff ff ff 48 c7 c7 20 e4 b0 c0 e8 09 3b
[ 547.591612] RSP: 0018:ffffb738c254f420 EFLAGS: 00010282
[ 547.593432] RAX: 00000000ffffffe4 RBX: 0000000000000db2 RCX: ffffb738c254fb20
[ 547.595282] RDX: ffffffffc0b0bf90 RSI: ffffb738c254f468 RDI: ffffffffc0b0e380
[ 547.597143] RBP: ffffb738c254f450 R08: 0000000000000001 R09: ffffb738c0081000
[ 547.599012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c65ced90000
[ 547.600888] R13: ffff8c65c12f6000 R14: 0000000000000000 R15: ffffb73800000001
[ 547.602764] FS: 00007faa028b3b80(0000) GS:ffff8c66f7640000(0000) knlGS:0000000000000000
[ 547.604675] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 547.606593] CR2: ffffb73800000001 CR3: 000000010f068000 CR4: 00000000000006e0
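
A rough sketch for reading two of the register values in the oops above. It only does arithmetic on numbers copied from the trace; the helper name `to_signed32` is made up for illustration. RAX decodes to -28, and errno 28 is ENOSPC on Linux. CR2 (the faulting address) is identical to R15, which suggests, but does not prove, that the faulting write went through the pointer held in R15.

```shell
#!/usr/bin/env bash
# Hypothetical helper: interpret the low 32 bits of a register value
# (hex digits without the 0x prefix) as a signed integer.
to_signed32() {
    local v=$(( 0x$1 & 0xffffffff ))
    if (( v >= 0x80000000 )); then
        v=$(( v - 0x100000000 ))
    fi
    printf '%d\n' "$v"
}

# Values copied verbatim from the oops.
rax=00000000ffffffe4
r15=ffffb73800000001
cr2=ffffb73800000001

echo "RAX as signed 32-bit: $(to_signed32 "$rax")"
if [ "$r15" = "$cr2" ]; then
    echo "faulting address (CR2) equals R15"
fi
```

This is only a reading aid for the trace, not a root-cause analysis.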

This node did not run this test in the previous cycle, so it is not yet known whether this is a regression.

The next step is to test the kernel in -updates on this node.


Po-Hsu Lin (cypressyew)
description: updated
Po-Hsu Lin (cypressyew) wrote :

This issue can be reproduced with -1025 as well, so it's not a regression.

Looking back through the test history, I can see onibi was tested with 5.15.0-1015.20, but the test did not actually run at that time because the xdp_dummy helper had not been built:

 Running 'make run_tests -C net TEST_PROGS=veth.sh TEST_GEN_PROGS='' TEST_CUSTOM_PROGS='''
 make: Entering directory '/home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/net'
 make --no-builtin-rules ARCH=x86 -C ../../../.. headers_install
 make[1]: Entering directory '/home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux'
   INSTALL ./usr/include
 make[1]: Leaving directory '/home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux'
 TAP version 13
 1..1
 # selftests: net: veth.sh
 # Missing xdp_dummy helper. Build bpf selftest first
 not ok 1 selftests: net: veth.sh # exit=1
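
When going through older results like the one above, it can help to separate "the test never ran because the helper was missing" from a real failure. A minimal sketch, assuming the log lines look like the TAP output quoted here; the function name `classify_veth_log` and its labels are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a kselftest TAP log for veth.sh read from stdin.
# Prints one of: not-built, fail, pass, unknown
classify_veth_log() {
    local log
    log=$(cat)
    if grep -q 'Missing xdp_dummy helper' <<<"$log"; then
        # The script bailed out before exercising veth at all.
        echo "not-built"
    elif grep -Eq '^ *not ok [0-9]+ selftests: net: veth\.sh' <<<"$log"; then
        echo "fail"
    elif grep -Eq '^ *ok [0-9]+ selftests: net: veth\.sh' <<<"$log"; then
        echo "pass"
    else
        echo "unknown"
    fi
}
```

For the 5.15.0-1015.20 log above this would report `not-built`, so that run says nothing about whether the page fault was already present.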

summary: - net:veth.sh in ubuntu_kernel_selftests hang with J-intel-iotg
+ net:veth.sh in ubuntu_kernel_selftests hang with J-intel-iotg (BUG:
+ unable to handle page fault)
Philip Cox (philcox) wrote :

This looks like the same issue as this LP bug: https://bugs.launchpad.net/bugs/1943697

This has been hinted on many of the jammy kernels. It is likely safe to mark this as a duplicate of https://bugs.launchpad.net/bugs/1943697 and use that to track the progress. I won't mark it as a duplicate in case someone wants to track this separately.

Po-Hsu Lin (cypressyew) wrote :

Hi Philip,
I don't think veth bug 1943697 causes a system hang like this. Also, this bug can be triggered when the test is run directly (isolated from the other tests), though not every time.
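
Since the hang does not show up on every run, one way to chase it is to loop the test under a time limit. A minimal sketch, assuming GNU coreutils `timeout` is available; the function name `repeat_until_hang` is made up, and the run count and time limit are arbitrary:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times, each under a time limit
# (seconds), and stop at the first run that times out.
# Returns 124 if a run timed out, 0 otherwise.
repeat_until_hang() {
    local runs=$1 limit=$2
    shift 2
    local i rc
    for (( i = 1; i <= runs; i++ )); do
        # -k 5: send SIGKILL if the command is still alive 5s after SIGTERM.
        timeout -k 5 "$limit" "$@" >/dev/null 2>&1
        rc=$?
        # timeout exits with 124 on expiry, 137 if it had to send SIGKILL.
        if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
            echo "run $i: timed out after ${limit}s"
            return 124
        fi
        echo "run $i: exit status $rc"
    done
    return 0
}
```

Run as root from the selftests net directory, something like `repeat_until_hang 20 300 ./veth.sh` would stop at the first hanging run. Note that a process stuck in the kernel after an oops may not respond to signals, so the node may still need a reboot afterwards.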

Philip Cox (philcox) wrote :

Hi Po-Hsu,

Thanks for the note. I will try to look over this some more tomorrow.

Stefan Bader (smb)
Changed in linux-intel-iotg (Ubuntu):
status: New → Invalid
Changed in linux-intel-iotg (Ubuntu Jammy):
importance: Undecided → Medium
status: New → In Progress
tags: added: lookout-canyon oem-priority originate-from-2011522
Changed in hwe-next:
status: New → Invalid
Po-Hsu Lin (cypressyew) wrote :

The issue is still visible in this cycle; I confirmed with Jian Hui that this is expected.

tags: added: sru-20230227
tags: added: originate-from-1943687
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-intel-iotg/5.15.0-1028.33 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-intel-iotg verification-needed-jammy
tags: added: verification-done-jammy
removed: verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-intel-iotg - 5.15.0-1028.33

---------------
linux-intel-iotg (5.15.0-1028.33) jammy; urgency=medium

  * jammy/linux-intel-iotg: 5.15.0-1028.33 -proposed tracker (LP: #2011906)

  * Fail to output sound to external monitor which connects via docking station
    (LP: #2009024)
    - [Config] Enable CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM

  * [adl-n][rpl] support intel power capping framework (LP: #2013032)
    - powercap: RAPL: Add Power Limit4 support for RaptorLake
    - powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P
    - powercap: intel_rapl: add support for ALDERLAKE_N

  * [ADL-PS] Audio is malfunction (LP: #2012584)
    - ASoC: SOF: Intel: pci-tgl: add ADL-PS support
    - ASoC: Intel: soc-acpi: Add entry for rt711-sdca-sdw in ADL match table

  * net:veth.sh in ubuntu_kernel_selftests hang with J-intel-iotg (BUG: unable
    to handle page fault) (LP: #2008085)
    - Revert "rtnetlink: Add return value check"
    - Revert "rtnetlink: Fix unchecked return value of dev_xdp_query_md_btf()"
    - Revert "igc: Enable HW RX Timestamp for AF_XDP ZC"
    - Revert "igc: Add BTF based metadata for XDP"
    - Revert "net/core: XDP metadata BTF netlink API"

  * Bluetooth: btusb: Add module firmware information for MT7622 and MT7961
    (LP: #2011520)
    - SAUCE: (no-up) Bluetooth: btusb: Add module firmware information for MT7622
      and MT7961

  [ Ubuntu: 5.15.0-70.77 ]

  * jammy/linux: 5.15.0-70.77 -proposed tracker (LP: #2011918)
  * CVE-2023-26545
    - net: mpls: fix stale pointer if allocation fails during device rename
  * CVE-2023-1281
    - net/sched: tcindex: update imperfect hash filters respecting rcu
  * [SRU][Ubuntu 22.04.1] mpi3mr: Add management application interface(BSG)
    support (LP: #1971151)
    - scsi: mpi3mr: Add bsg device support
    - scsi: mpi3mr: Add support for driver commands
    - scsi: mpi3mr: Move data structures/definitions from MPI headers to uapi
      header
    - scsi: mpi3mr: Add support for MPT commands
    - scsi: mpi3mr: Add support for PEL commands
    - scsi: mpi3mr: Expose adapter state to sysfs
    - scsi: mpi3mr: Add support for NVMe passthrough
    - scsi: mpi3mr: Update driver version to 8.0.0.69.0
    - scsi: mpi3mr: Increase I/O timeout value to 60s
    - scsi: mpi3mr: Hidden drives not removed during soft reset
    - scsi: mpi3mr: Return I/Os to an unrecoverable HBA with DID_ERROR
    - scsi: mpi3mr: Fix a NULL vs IS_ERR() bug in mpi3mr_bsg_init()
    - scsi: mpi3mr: Return error if dma_alloc_coherent() fails
    - scsi: mpi3mr: Add shost related sysfs attributes
    - scsi: mpi3mr: Add target device related sysfs attributes
    - scsi: mpi3mr: Rework mrioc->bsg_device model to fix warnings
    - scsi: mpi3mr: Fix kernel-doc
  * cpufreq: intel_pstate: Update Balance performance EPP for Sapphire Rapids
    (LP: #2008519)
    - cpufreq: intel_pstate: Update EPP for AlderLake mobile
    - cpufreq: intel_pstate: Adjust balance_performance EPP for Sapphire Rapids
  * Fail to output sound to external monitor which connects via docking station
    (LP: #2009024)
    - [Config] Enable CONFIG_SND_HDA_INTEL_HDMI_SILENT_STREAM
  *...

Changed in linux-intel-iotg (Ubuntu Jammy):
status: In Progress → Fix Released
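
For test nodes, a quick check of whether a running Jammy linux-intel-iotg kernel is at or past the fixed upload (5.15.0-1028.33) could look like the sketch below. It assumes the release string has the `5.15.0-<ABI>-intel-iotg` form seen in the oops above; the function name and its output labels are made up. "predates-fix" only means the kernel is older than 1028; this report confirms the crash on 1025 and 1026 only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the ABI number of a `uname -r` style release
# string against 1028, the Jammy linux-intel-iotg ABI that carries the reverts.
iotg_fix_status() {
    local release=$1 abi
    case $release in
        5.15.0-*-intel-iotg) ;;
        *) echo "unknown"; return 1 ;;
    esac
    abi=${release#5.15.0-}      # strip the leading "5.15.0-"
    abi=${abi%-intel-iotg}      # strip the trailing flavour name
    case $abi in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;
    esac
    if [ "$abi" -ge 1028 ]; then
        echo "has-fix"
    else
        echo "predates-fix"
    fi
}
```

On a node this would be called as `iotg_fix_status "$(uname -r)"`. The 1028 threshold applies to the Jammy package only.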
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-intel-iotg-5.15/5.15.0-1029.34~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-focal-linux-intel-iotg-5.15 verification-needed-focal
Jian Hui Lee (jianhuilee) wrote :

Version 5.15.0-1030.35~20.04.1 passes the regression test.

tags: added: verification-done-focal
removed: verification-needed-focal