Comment 18 for bug 1824687

Heikki Hannikainen (hessu) wrote :

I have hit this crash, with the ip6_expire_frag_queue stack trace, more than 18 times since 2019-04-16, on more than 10 different servers in 8 different countries. There have been further crashes, but for the 18 counted below the panic dump made it out to a remote syslog server, where it is easy to grep. Crash count by kernel version (these are on both trusty and xenial):

2 crashes: 4.4.0-144-generic #170~14.04.1-Ubuntu
8 crashes: 4.4.0-145-generic #171-Ubuntu
8 crashes: 4.4.0-146-generic #172-Ubuntu
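
For anyone who wants to produce a similar per-kernel count from their own remote syslog, here is a minimal sketch in Python. It is not necessarily how the numbers above were obtained. It assumes traditional syslog lines ("Mon DD HH:MM:SS host kernel: ...") and that each panic dump has a "CPU: ... PID: ... Comm: ..." header line carrying the kernel version, followed by at least one line naming ip6_expire_frag_queue. The function name count_crashes_by_kernel is made up for illustration.

```python
import re
from collections import Counter

# Assumed syslog prefix: "Apr 16 12:00:01 <host> kernel: ..."
HOST_RE = re.compile(r"^\S+\s+\S+\s+\S+\s+(\S+)\s+kernel:")
# Assumed oops header: "CPU: 3 PID: 0 Comm: swapper/3 Not tainted <version> #<build>"
KERNEL_RE = re.compile(r"CPU: \d+ PID: \d+ Comm: .*?(\d+\.\d+\.\d+-\d+-\w+ #\S+)")
SYMBOL = "ip6_expire_frag_queue"


def count_crashes_by_kernel(lines):
    """Count panic dumps mentioning SYMBOL, grouped by kernel version.

    A dump is counted once: at the first SYMBOL line seen after the
    host's most recent "CPU: ... PID: ..." header line.
    """
    pending = {}        # host -> kernel version of a dump not yet counted
    counts = Counter()  # kernel version -> number of crashes
    for line in lines:
        host_match = HOST_RE.match(line)
        if not host_match:
            continue
        host = host_match.group(1)
        kernel_match = KERNEL_RE.search(line)
        if kernel_match:
            pending[host] = kernel_match.group(1)
        elif SYMBOL in line and host in pending:
            counts[pending.pop(host)] += 1
    return counts
```

The function accepts any iterable of lines, so an open log file can be passed directly. Dumps that arrive truncated (no header line, or no line naming the function) are not counted.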

Downgrading to 4.4.0-143 now, as that build does not seem to have the "ipv6: frags: rewrite ip6_expire_frag_queue()" change; it first appears in the 4.4.0-144-generic image. It should be clear by tomorrow whether that kernel is stable, as we are currently seeing multiple crashes per day (the last one was 50 minutes ago).

These are routers running NAT & firewall & some applications, with substantial IPv6 traffic.

Interestingly, the crashes only happen on bare hardware. We have a much larger number of VMs doing the same thing, most of them now running 4.4.0-146, and none of them has crashed like this. The hardware instances do have more CPU cores, though; the VMs have only 2 or 4.

I am also seeing crashes on the 4.15.0-48-generic HWE kernel running on xenial, but I have no stack trace to show yet.

Attaching a kernel stack trace file containing several crashes from various servers (hessu-ipv6_expire_frag_queue-crashes.txt).