Comment 15 for bug 1970074

Juan Jesús García de Soria (skandalfo) wrote :

I get this too on my Mini ITX board with 12 SATA ports (in addition to another chipset SATA port):

[ 7.633320] ================================================================================
[ 7.633467] UBSAN: array-index-out-of-bounds in /build/linux-WLUive/linux-5.15.0/drivers/ata/libahci.c:968:41
[ 7.633632] index 15 is out of range for type 'ahci_em_priv [8]'
[ 7.633733] CPU: 1 PID: 226 Comm: scsi_eh_9 Not tainted 5.15.0-40-generic #43-Ubuntu
[ 7.633741] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 12/24/2018
[ 7.633745] Call Trace:
[ 7.633750] <TASK>
[ 7.633755] show_stack+0x52/0x58
[ 7.633767] dump_stack_lvl+0x4a/0x5f
[ 7.633779] dump_stack+0x10/0x12
[ 7.633787] ubsan_epilogue+0x9/0x45
[ 7.633795] __ubsan_handle_out_of_bounds.cold+0x44/0x49
[ 7.633804] ahci_qc_issue+0x166/0x170 [libahci]
[ 7.633820] ata_qc_issue+0x135/0x240
[ 7.633829] ata_exec_internal_sg+0x2c4/0x580
[ 7.633837] ? vprintk_default+0x1d/0x20
[ 7.633847] ata_exec_internal+0x67/0xa0
[ 7.633856] sata_pmp_read+0x8d/0xc0
[ 7.633865] sata_pmp_read_gscr+0x3c/0x90
[ 7.633873] sata_pmp_attach+0x8b/0x310
[ 7.633882] ata_eh_revalidate_and_attach+0x28c/0x4b0
[ 7.633890] ata_eh_recover+0x6b6/0xb30
[ 7.633897] ? ahci_do_hardreset+0x180/0x180 [libahci]
[ 7.633910] ? ahci_stop_engine+0xb0/0xb0 [libahci]
[ 7.633921] ? ahci_do_softreset+0x290/0x290 [libahci]
[ 7.633933] ? trace_event_raw_event_ata_eh_link_autopsy_qc+0xe0/0xe0
[ 7.633944] sata_pmp_eh_recover.isra.0+0x214/0x560
[ 7.633952] ? asm_sysvec_call_function_single+0x12/0x20
[ 7.633964] sata_pmp_error_handler+0x23/0x40
[ 7.633972] ahci_error_handler+0x43/0x80 [libahci]
[ 7.633985] ata_scsi_port_error_handler+0x2b1/0x600
[ 7.633993] ata_scsi_error+0x9c/0xd0
[ 7.634000] scsi_error_handler+0xa1/0x180
[ 7.634007] ? scsi_unjam_host+0x1c0/0x1c0
[ 7.634014] kthread+0x12a/0x150
[ 7.634023] ? set_kthread_struct+0x50/0x50
[ 7.634031] ret_from_fork+0x22/0x30
[ 7.634043] </TASK>
[ 7.634045] ================================================================================

The ports that trigger the warning are apparently behind a SATA Port Multiplier (hence the "pmp" functions in the trace).

Here's my lspci output:

00:00.0 Host bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx Series SoC Transaction Register (rev 11)
00:02.0 VGA compatible controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Graphics & Display (rev 11)
00:13.0 SATA controller: Intel Corporation Atom Processor E3800 Series SATA AHCI Controller (rev 11)
00:14.0 USB controller: Intel Corporation Atom Processor Z36xxx/Z37xxx, Celeron N2000 Series USB xHCI (rev 11)
00:1a.0 Encryption controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Trusted Execution Engine (rev 11)
00:1b.0 Audio device: Intel Corporation Atom Processor Z36xxx/Z37xxx Series High Definition Audio Controller (rev 11)
00:1c.0 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 1 (rev 11)
00:1c.1 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 2 (rev 11)
00:1c.2 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 3 (rev 11)
00:1c.3 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 4 (rev 11)
00:1f.0 ISA bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Power Control Unit (rev 11)
00:1f.3 SMBus: Intel Corporation Atom Processor E3800/CE2700 Series SMBus Controller (rev 11)
01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller (rev 11)
02:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller (rev 11)
03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
04:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)

Note that index "15" is the "address" the port multiplier itself takes on a SATA link participating in a port multiplier setup.
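
To make the mismatch concrete, here is a minimal C sketch of the situation, assuming an 8-entry array (the size is taken from the "ahci_em_priv [8]" in the UBSAN message) that is indexed directly by the link's PMP number. All `sketch_`/`SKETCH_` names are made up for this illustration and are not the kernel's identifiers; this is not the actual libahci code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 8 slots, as in "type 'ahci_em_priv [8]'" from the UBSAN report. */
#define SKETCH_EM_SLOTS 8
/* The address the port multiplier itself takes, as seen in "ata10.15". */
#define SKETCH_PMP_CTRL_PORT 15

/* Hypothetical stand-in for the per-slot private data. */
struct sketch_em_priv {
    unsigned long activity;
};

/* Hypothetical stand-in for the per-port private data holding the array. */
struct sketch_port_priv {
    struct sketch_em_priv em_priv[SKETCH_EM_SLOTS];
};

/* Returns true when a link's PMP number is a valid index into em_priv. */
bool sketch_pmp_in_bounds(unsigned int pmp)
{
    return pmp < SKETCH_EM_SLOTS;
}

/* Bounds-checked lookup: returns NULL instead of reading past the array. */
struct sketch_em_priv *sketch_em_slot(struct sketch_port_priv *pp,
                                      unsigned int pmp)
{
    if (!sketch_pmp_in_bounds(pmp))
        return NULL;
    return &pp->em_priv[pmp];
}
```

With these definitions, `sketch_em_slot(&pp, 4)` returns a valid slot, while `sketch_em_slot(&pp, SKETCH_PMP_CTRL_PORT)` returns NULL. An unchecked `pp->em_priv[15]` is exactly the kind of access UBSAN flags.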

From my dmesg, directly following this:

[ 7.635154] ata10.15: Port Multiplier 1.2, 0x1b4b:0x9705 r160, 5 ports, feat 0x1/0x1f
[ 7.638227] ahci 0000:02:00.0: FBS is enabled
[ 7.639009] ata10.00: hard resetting link
[ 7.954863] ata10.00: SATA link down (SStatus 0 SControl 330)
[ 7.954951] ata10.01: hard resetting link
[ 8.266940] ata10.01: SATA link down (SStatus 0 SControl 330)
[ 8.267159] ata10.02: hard resetting link
[ 8.582416] ata10.02: SATA link down (SStatus 0 SControl 330)
[ 8.582508] ata10.03: hard resetting link
[ 8.897169] ata10.03: SATA link down (SStatus 0 SControl 330)
[ 8.897261] ata10.04: hard resetting link
[ 9.214027] ata10.04: SATA link down (SStatus 0 SControl 330)
[ 9.214186] ata10: EH complete

I don't have anything connected to the port multiplier sublinks on ata10 (ata10.00 - ata10.04). Given the vendor/product IDs reported (0x1b4b:0x9705), my port multiplier is most likely a Marvell 88SM9705.
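
For anyone who wants to compare IDs across the other reports, here is a small C sketch that pulls the vendor and product IDs out of a "Port Multiplier" dmesg line. It assumes the line has the same layout as the one in my log above; `sketch_parse_pmp_ids` is a hypothetical helper written for this comment, not an existing tool or kernel function.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extracts the two hex IDs from a dmesg line such as
 *   "ata10.15: Port Multiplier 1.2, 0x1b4b:0x9705 r160, 5 ports, ..."
 * Returns 1 on success and 0 if the line does not match that layout.
 */
int sketch_parse_pmp_ids(const char *line, unsigned int *vendor,
                         unsigned int *device)
{
    const char *p = strstr(line, "Port Multiplier");

    if (p == NULL)
        return 0;

    /* The IDs follow the first comma after the PMP spec version. */
    p = strchr(p, ',');
    if (p == NULL)
        return 0;

    return sscanf(p, ", 0x%x:0x%x", vendor, device) == 2;
}
```

Fed the line from my log, this yields vendor 0x1b4b and product 0x9705; a line without "Port Multiplier" in it is rejected.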

So if the other reports involve other brands of port multipliers, this doesn't look like a device-specific issue.

One possibility is that PMP support has always had this flaw (an "innocent" read past the end of an array) and that Ubuntu enabling UBSAN merely turned it into a visible warning. The reported range of affected kernel versions seems consistent with Canonical enabling UBSAN around 5.14:

https://lists.ubuntu.com/archives/kernel-team/2021-September/123830.html

In my affected system (on 5.15):

$ cat /boot/config-$(uname -r) | grep UBSAN=
CONFIG_UBSAN=y

That is to be expected, since it's a UBSAN error that's being reported.

@marmai checked that kernel 5.8 doesn't have this problem; it might make sense to check that UBSAN isn't enabled in that kernel.