[Zesty] mlx5_core Kernel oops with bonding mode 1 and 6

Bug #1676786 reported by Talat Batheesh
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Tim Gardner
Zesty            Fix Released   Undecided    Tim Gardner

Bug Description

We get a kernel oops when we set up a bond interface with two Mellanox mlx5 NICs and then try to unload the bonding module.

scenario:
1. network interfaces configuration
# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

 # The loopback network interface
 auto lo
 iface lo inet loopback

 # The primary network interface
 auto eno1
 iface eno1 inet dhcp

 #ens1f0
 auto ens1f0
 iface ens1f0 inet manual
 bond-master bond1

 auto ens1f1
 iface ens1f1 inet manual
 bond-master bond1

 auto bond1
 iface bond1 inet static
 address 27.65.194.1
 netmask 255.255.255.0
 bond-slaves ens1f0 ens1f1
 bond-mode 1
 bond-primary ens1f0
 bond-miimon 100
 iface bond1 inet6 static
 address 907c:c828:4d05:5bf8:0000:0000:0000:0002/127

# cat /etc/modprobe.d/bonding.conf
options bonding mode=1

2. ifup bond1
3. modprobe -r bonding

4. OOPS
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 540.443796] Oops: 0000 [#1] SMP
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 540.444686] Modules linked in: mlx5_ib mlx5_core bonding mlx4_ib ib_core mlx4_en mlx4_core nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat libcrc32c nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ipmi_ssif intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate joydev input_leds intel_rapl_perf serio_raw lpc_ich hpilo ipmi_si ioatdma ipmi_devintf dca ipmi_msghandler shpchp mac_hid acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 540.469445] x_tables autofs4 hid_generic psmouse usbhid hid pata_acpi tg3 hpsa ptp scsi_transport_sas devlink pps_core wmi fjes [last unloaded: mlx5_core]
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 540.473672] CPU: 23 PID: 4846 Comm: ifenslave Not tainted 4.10.0-9-generic #11-Ubuntu
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 540.475894] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 540.478038] task: ffff9b8394e31680 task.stack: ffffb2ed054f4000
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 540.533408] RIP: 0010:mlx5_lag_netdev_event+0x1e6/0x230 [mlx5_core]
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 540.590069] RSP: 0018:ffffb2ed054f7bd0 EFLAGS: 00010202
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 540.646302] RAX: 0000000000000002 RBX: ffff9b7f825f6000 RCX: 0000000000000000
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 540.701966] RDX: 0000000000000000 RSI: 0000000400000400 RDI: ffff9b7f840a00b0
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 540.756395] RBP: ffffb2ed054f7c18 R08: ffffffffc02fb000 R09: ffff9b7fa3117ea8
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 540.810250] R10: 0000000000000000 R11: 000000000051a84e R12: 0000000000000001
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 540.863569] R13: 0000000000000004 R14: ffff9b7fa3117ea8 R15: ffffffff8992b108
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 540.916725] FS: 00007fc6cca0e700(0000) GS:ffff9b83af0c0000(0000) knlGS:0000000000000000
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.020509] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.072342] CR2: 0000000000000002 CR3: 0000000817013000 CR4: 00000000001406e0
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.127206] Call Trace:
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.180602] notifier_call_chain+0x4a/0x70
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.235310] raw_notifier_call_chain+0x16/0x20
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.287923] call_netdevice_notifiers_info+0x35/0x60
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.342951] netdev_upper_dev_unlink+0x72/0xb0
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.395322] bond_upper_dev_unlink.isra.42+0x18/0x40 [bonding]
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.446520] __bond_release_one+0x170/0x550 [bonding]
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.499303] ? netdev_info+0x6c/0x90
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.550677] bond_release+0x10/0x20 [bonding]
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.602044] bond_option_slaves_set+0xe6/0x130 [bonding]
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.653333] __bond_opt_set+0xe2/0x3a0 [bonding]
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.703257] bond_opt_tryset_rtnl+0x56/0xa0 [bonding]
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.751799] bonding_sysfs_store_option+0x35/0x70 [bonding]
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.799933] dev_attr_store+0x18/0x30
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.846332] sysfs_kf_write+0x37/0x40
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.890905] kernfs_fop_write+0x11d/0x1b0
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.935415] __vfs_write+0x18/0x40
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 541.976508] vfs_write+0xb5/0x1a0
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 542.018692] SyS_write+0x55/0xc0
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 542.056645] entry_SYSCALL_64_fastpath+0x1e/0xad
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 542.096526] RIP: 0033:0x7fc6cc52bd20
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 542.136659] RSP: 002b:00007ffc13c78d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 542.219432] RAX: ffffffffffffffda RBX: 00007fc6cc7f5b58 RCX: 00007fc6cc52bd20
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 542.262617] RDX: 0000000000000008 RSI: 000056541f282ea0 RDI: 0000000000000001
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 542.306153] RBP: 00007fc6cc7f5b00 R08: 00007fc6cc7f5c78 R09: 000056541f04b8a8
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 542.349657] R10: 000056541f282ea0 R11: 0000000000000246 R12: 00007fc6cc7f5b58
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 542.392097] R13: 0000000000002010 R14: 00007fc6cc7f5b58 R15: 000000000000270f
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 542.431871] Code: 39 be 68 ff ff ff 74 5b 4d 39 be 78 ff ff ff 74 48 83 45 c0 01 eb cb 8b 45 c4 85 c0 0f 84 42 ff ff ff 48 8b 45 b8 48 85 c0 74 03 <44> 8b 28 83 7d c0 02 75 21 83 7d c4 03 75 1b 41 8d 45 fc 83 f8
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 542.558999] RIP: mlx5_lag_netdev_event+0x1e6/0x230 [mlx5_core] RSP: ffffb2ed054f7bd0
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 542.645369] CR2: 0000000000000002
Mar 12 16:44:32 qa-h-vrt-038 kernel: [ 542.687127] ---[ end trace 92901adbd279c621 ]---
Mar 12 16:45:13 qa-h-vrt-038 systemd[1]: Reloading.

We have already fixed the issue and are going to send the fix upstream. I will also send it to the Canonical kernel team mailing list.

Thanks,
Talat

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1676786

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Talat Batheesh (talat-b87) wrote :

This commit fixes the issue:
commit e497ec680c4cd51e76bfcdd49363d9ab8d32a757
Author: Talat Batheesh <email address hidden>
Date: Tue Mar 28 16:13:41 2017 +0300

    net/mlx5: Avoid dereferencing uninitialized pointer

    In NETDEV_CHANGEUPPER event the upper_info field is valid
    only when linking is true. Otherwise it should be ignored.

    Fixes: 7907f23adc18 (net/mlx5: Implement RoCE LAG feature)
    Signed-off-by: Talat Batheesh <email address hidden>
    Reviewed-by: Aviv Heller <email address hidden>
    Reviewed-by: Moni Shoua <email address hidden>
    Signed-off-by: Saeed Mahameed <email address hidden>
    Signed-off-by: David S. Miller <email address hidden>
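
The rule stated in the commit message above can be illustrated with a minimal user-space sketch. This is not the mlx5 driver code: the struct, enum, and function names below are hypothetical stand-ins, and the only point is the pattern of reading upper_info solely when linking is true. In the oops above, CR2 and RAX are both 0x2, which is consistent with a non-NULL garbage pointer being dereferenced on the unlink path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum tx_type {
    TX_TYPE_UNKNOWN = 0,
    TX_TYPE_ACTIVEBACKUP = 1,
    TX_TYPE_HASH = 2
};

struct lag_upper_info {
    enum tx_type tx_type;
};

struct changeupper_info {
    bool linking;      /* true when a lower device is being attached */
    void *upper_info;  /* only meaningful when linking is true */
};

/*
 * Return the bond's tx type for a CHANGEUPPER-style event.
 * upper_info is read only on the linking path; on unlink it may hold
 * garbage, so it is ignored and TX_TYPE_UNKNOWN is returned instead.
 */
enum tx_type get_tx_type(const struct changeupper_info *info)
{
    const struct lag_upper_info *lag = NULL;

    if (info->linking)
        lag = info->upper_info;

    return lag ? lag->tx_type : TX_TYPE_UNKNOWN;
}
```

With this guard, an unlink event whose upper_info field contains a stale value such as 0x2 is never dereferenced, while a link event still reports the tx type of the bond.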

Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Zesty):
assignee: nobody → Tim Gardner (timg-tpi)
status: Incomplete → Fix Committed
Launchpad Janitor (janitor) wrote :
This bug was fixed in the package linux - 4.10.0-19.21

---------------
linux (4.10.0-19.21) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1680535

  * ADT regressions caused by "audit: fix auditd/kernel connection state
    tracking" (LP: #1680532)
    - SAUCE: Revert "audit: fix auditd/kernel connection state tracking"

  * Miscellaneous Ubuntu changes
    - [Config] updateconfigs to update CONFIG_GENERIC_CSUM for ppc64el
      This cleans up behind a Kconfig change that went undetected.

linux (4.10.0-18.20) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1680168

  * smartpqi driver needed in initram disk and installer (LP: #1680156)
    - UBUNU: [Config] Add smartpqi to d-i

linux (4.10.0-17.19) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1679718

  * Fix CVE-2017-7308 (LP: #1678009)
    - net/packet: fix overflow in check for priv area size
    - net/packet: fix overflow in check for tp_frame_nr
    - net/packet: fix overflow in check for tp_reserve

  * apparmor: oops on boot if parameters set on grub command line (LP: #1678048)
    - SAUCE: apparmor: fix parameters so that the permission test is bypassed at boot

  * apparmor: does not provide a way to detect policy updataes (LP: #1678032)
    - SAUCE: apparmor: add policy revision file interface

  * apparmor does not make support of query data visible (LP: #1678023)
    - SAUCE: apparmor: add label data availability to the feature set

  * apparmor query interface does not make supported query info available
    (LP: #1678030)
    - SAUCE: apparmor: add information about the query inteface to the feature set

  * change_profile incorrect when using namespaces with a compound stack
    (LP: #1677959)
    - SAUCE: apparmor: fix label parse for stacked labels

  * Zesty update to v4.10.8 stable release (LP: #1678930)
    - xfrm: policy: init locks early
    - xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
    - xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder
    - KVM: nVMX: Fix nested VPID vmx exec control
    - KVM: x86: cleanup the page tracking SRCU instance
    - virtio_balloon: init 1st buffer in stats vq
    - pinctrl: qcom: Don't clear status bit on irq_unmask
    - c6x/ptrace: Remove useless PTRACE_SETREGSET implementation
    - h8300/ptrace: Fix incorrect register transfer count
    - mips/ptrace: Preserve previous registers for short regset write
    - sparc/ptrace: Preserve previous registers for short regset write
    - metag/ptrace: Preserve previous registers for short regset write
    - metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS
    - metag/ptrace: Reject partial NT_METAG_RPIPE writes
    - qla2xxx: Allow vref count to timeout on vport delete.
    - sched/rt: Add a missing rescheduling point
    - usb: musb: fix possible spinlock deadlock
    - Linux 4.10.8

  * [Hyper-V] pci-hyperv: Use device serial number as PCI domain (LP: #1667527)
    - net/mlx4_core: Use cq quota in SRIOV when creating completion EQs
    - PCI: hv: Use device serial number as PCI domain

  * Miscellaneous Ubuntu changes
    - [Config] flash-kernel should be a...

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released