gcc-10 breaks on armhf (flaky): internal compiler error: Segmentation fault

Bug #1890435 reported by Christian Ehrhardt 
This bug affects 2 people
Affects          Status       Importance  Assigned to  Milestone
gcc              In Progress  Medium
gcc-10 (Ubuntu)  Confirmed    Medium      Unassigned

Bug Description

Hi,
this could be the same as bug 1887557, but as I don't have enough data I'm filing it as an individual issue for now.

I have only seen this happening on armhf so far.
In 2 of 5 groovy builds of qemu 5.0 this week I have hit the issue, but it is flaky.

Flakiness:
1. different file
first occurrence
/<<PKGBUILDDIR>>/target/s390x/excp_helper.c:544:1: internal compiler error: Segmentation fault
second occurrence
/<<PKGBUILDDIR>>/linux-user/syscall.c:12479:1: internal compiler error: Segmentation fault

With it being so unreliable I can't provide much more yet.
I filed it mostly for awareness and so that I can be dup'ed onto the right bug if there is a better one.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I've today seen this on DPDK
https://launchpadlibrarian.net/497142982/buildlog_ubuntu-groovy-armhf.dpdk_20.08-1ubuntu1~ppa1_BUILDING.txt.gz

And recently also on qemu again (but that was in the main archive, and I could not hold back from hitting retry, on which it worked).

Is there anything in the pipeline that could address this and make it worth running a few re-compiles?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

There was another one in Groovy as of yesterday.
https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4263/+packages
https://launchpadlibrarian.net/497497840/buildlog_ubuntu-groovy-armhf.qemu_1%3A5.0-5ubuntu8~ppa1_BUILDING.txt.gz

...
qapi/qapi-visit-block-core.c: In function ‘visit_type_q_obj_BlockdevOptions_base_members’:
qapi/qapi-visit-block-core.c:6570:1: internal compiler error: Segmentation fault
 6570 | }
      | ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
...
The bug is not reproducible, so it is likely a hardware or OS problem.

So the compiler itself is recognizing that it isn't the source code (alone) but some awkwardness that is flaky.

It seems qemu builds in groovy hit this in ~1/3 of the builds we do on armhf - not sure if that is enough for debugging for you?

Revision history for this message
Matthias Klose (doko) wrote :

no, try a local build until you have a reproducer. When DEB_BUILD_OPTIONS is set, the compiler driver retries up to three times to see if it's reproducible.
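
The retry idea described above can be illustrated with a small shell sketch. This is not the gcc driver's code, only the same classification applied to an arbitrary command; classify_failure is a hypothetical name, and the number of attempts is passed in rather than taken from DEB_BUILD_OPTIONS.

```shell
# Hypothetical helper: run a command up to $1 times and classify the outcome.
# Prints "ok" if the first attempt succeeds, "flaky" if a later attempt
# succeeds, and "reproducible" if every attempt fails.
classify_failure() {
    max=$1; shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@" >/dev/null 2>&1; then
            if [ "$attempt" -eq 1 ]; then echo ok; else echo flaky; fi
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo reproducible
    return 1
}
```

Usage would be something like "classify_failure 3 cc -c foo.c -o foo.o" (foo.c being a placeholder); a "flaky" result corresponds to the "not reproducible" message in the build logs.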

description: updated
Revision history for this message
Balint Reczey (rbalint) wrote :

Found it again in glibc 2.32-0ubuntu3 build.

vfscanf-internal.c: In function ‘__vfscanf_internal’:
vfscanf-internal.c:3057:1: internal compiler error: Segmentation fault
 3057 | }
      | ^

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I'm building qemu (known to be able to trigger it) in a Canonistack armhf LXD container in an arm64 VM (the setup that should be closest to the failing builders).
I also installed whoopsie and apport to catch even a single crash.

But I've been building for quite some hours now and nothing has happened.

I'll let it run for the rest of the day in a loop, but if it doesn't trigger again we need a better approach to corner this bug.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I have been compiling for almost 24h now, it just won't crash :-/
Not sure what else I could do to make reproducing this more likely ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Another breakage at
https://launchpad.net/ubuntu/+source/qemu/1:5.0-5ubuntu9/+build/19958575
I had to retry it, we will see if it works on retry as before

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

And again on the same :-/

cc -iquote /<<PKGBUILDDIR>>/b/qemu/linux-user/s390x -iquote linux-user/s390x -iquote /<<PKGBUILDDIR>>/tcg/arm -isystem /<<PKGBUILDDIR>>/linux-headers -isystem /<<PKGBUILDDIR>>/b/qemu/linux-headers -iquote . -iquote /<<PKGBUILDDIR>> -iquote /<<PKGBUILDDIR>>/accel/tcg -iquote /<<PKGBUILDDIR>>/include -iquote /<<PKGBUILDDIR>>/disas/libvixl -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -fPIE -DPIE -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/include/p11-kit-1 -DSTRUCT_IOVEC_DEFINED -I/usr/include/libpng16 -I/<<PKGBUILDDIR>>/capstone/include -isystem ../linux-headers -iquote .. -iquote /<<PKGBUILDDIR>>/target/s390x -DNEED_CPU_H -iquote /<<PKGBUILDDIR>>/include -I/<<PKGBUILDDIR>>/linux-user/s390x -I/<<PKGBUILDDIR>>/linux-user/host/arm -I/<<PKGBUILDDIR>>/linux-user -Ilinux-user/s390x -MMD -MP -MT linux-user/s390x/signal.o -MF linux-user/s390x/signal.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o linux-user/s390x/signal.o /<<PKGBUILDDIR>>/linux-user/s390x/signal.c
The bug is not reproducible, so it is likely a hardware or OS problem.

There seems to be no pattern to it (e.g. on which source file it breaks), just a chance that probably increases with source size. But I wonder what else I could do on top of the Canonistack build that I have tried - maybe concurrency?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

cc -iquote /<<PKGBUILDDIR>>/b/qemu/accel/tcg -iquote accel/tcg -iquote /<<PKGBUILDDIR>>/tcg/arm -isystem /<<PKGBUILDDIR>>/linux-headers -isystem /<<PKGBUILDDIR>>/b/qemu/linux-headers -iquote . -iquote /<<PKGBUILDDIR>> -iquote /<<PKGBUILDDIR>>/accel/tcg -iquote /<<PKGBUILDDIR>>/include -iquote /<<PKGBUILDDIR>>/disas/libvixl -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -fPIE -DPIE -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/include/p11-kit-1 -DSTRUCT_IOVEC_DEFINED -I/usr/include/libpng16 -I/<<PKGBUILDDIR>>/capstone/include -isystem ../linux-headers -iquote .. -iquote /<<PKGBUILDDIR>>/target/lm32 -DNEED_CPU_H -iquote /<<PKGBUILDDIR>>/include -MMD -MP -MT accel/tcg/translate-all.o -MF accel/tcg/translate-all.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o accel/tcg/translate-all.o /<<PKGBUILDDIR>>/accel/tcg/translate-all.c
during RTL pass: reload
/<<PKGBUILDDIR>>/tcg/tcg-op-gvec.c: In function ‘tcg_gen_gvec_shlv’:
/<<PKGBUILDDIR>>/tcg/tcg-op-gvec.c:2936:1: internal compiler error: Segmentation fault
 2936 | }
      | ^
Please submit a full bug report,
with preprocessed source if appropriate.

Now hit on 3/3 retries, which is exactly what we were afraid might happen ...

Changed in gcc-10 (Ubuntu):
importance: Undecided → Critical
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Bumping the priority since, as we were afraid, this is starting to become a service problem (what if we can't rebuild anymore?)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I reduced the CPU/Mem of the canonistack system that I'm trying to recreate this on (to be more similar).
I also now run with DEB_BUILD_OPTIONS=parallel=4, like the real build.
/me hopes this might help to finally trigger it in a debuggable environment.

P.S: I'm now at 4/4 retries that failed for the real build ... :-/ Thankfully it worked on the fifth retry
P.P.S: Note to myself: 4cpu/8G memory is the real size used (I have 4/4 atm since I set it up before I could reach anyone)

Revision history for this message
Seth Forshee (sforshee) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I got the crash in the repro env.
dmesg holds no OOM which is good - also no other dmesg/journal entry that would be related.

It might depend on concurrent execution, as this was the primary change compared to last time.
And I had not set up apport/whoopsie to catch the crash :-/
I've installed them now and run the formerly breaking command in a loop.

For the sake of "just eating cpu cycles" I have spawned some cpu hogs in the background.
But with all that in place it ran the compile 300 times without a crash :-/

It seems I have to re-run in the build env and hope that apport will catch it into /var/crash this time :-/

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Finally:
cc -iquote /root/qemu-5.0/b/user-static/linux-user -iquote linux-user -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/user-static/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -DLEGACY_RDMA_REG_MR -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/arm -DNEED_CPU_H -iquote /root/qemu-5.0/include -I/root/qemu-5.0/linux-user/aarch64 -I/root/qemu-5.0/linux-user/host/arm -I/root/qemu-5.0/linux-user -Ilinux-user/aarch64 -MMD -MP -MT linux-user/syscall.o -MF linux-user/syscall.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o linux-user/syscall.o /root/qemu-5.0/linux-user/syscall.c
during RTL pass: reload
/root/qemu-5.0/linux-user/syscall.c: In function ‘do_syscall1.constprop’:
/root/qemu-5.0/linux-user/syscall.c:12479:1: internal compiler error: Segmentation fault
12479 | }
      | ^
...
The bug is not reproducible, so it is likely a hardware or OS problem.
make[2]: *** [/root/qemu-5.0/rules.mak:69: linux-user/syscall.o] Error 1
make[2]: Leaving directory '/root/qemu-5.0/b/user-static/i386-linux-user'
make[1]: *** [Makefile:527: i386-linux-user/all] Error 2
make[1]: *** Waiting for unfinished jobs....

Still nothing in /var/crash to report :-/
Why is that - I have apport/whoopsie installed, the kernel is set up
$ sysctl -a | grep core_patt
  kernel.core_pattern = |/usr/share/apport/apport %p %s %c %d %P %E
Also I have set
$ cat ~/.config/apport/settings
[main]
unpackaged=true

This is armhf lxd on arm64 host - maybe apport has a guest/host problem here?

@Doko - do you happen to know if there are any extra hoops to jump through to get a crash report from gcc when it crashes in debuild?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Ok, apport through the stack of LXD is ... not working.
I have used a more mundane core pattern and a C test program to ensure I will get crash dumps.

$ cat /proc/sys/kernel/core_pattern
/var/crash/core.%e.%p.%h.%t

$ gcc test.c ; ./a.out ; ll /var/crash/
Segmentation fault (core dumped)
total 3
drwxrwsrwt 2 root whoopsie 3 Sep 24 05:48 ./
drwxr-xr-x 13 root root 15 Sep 23 09:40 ../
-rw------- 1 root whoopsie 208896 Sep 24 05:48 core.a.out.189131.groovy-gccfail.1600926486

Trying to run into the real gcc crash again with this ensured ...
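
The two core_pattern setups tried above differ in who receives the dump. A minimal sketch for checking which kind is active, assuming a Linux procfs; core_pattern_kind and check_core_setup are hypothetical helper names, not part of apport or the kernel tooling:

```shell
# Hypothetical helper: classify a kernel.core_pattern value.
# A leading '|' means the kernel pipes the core to a handler program
# (apport in the earlier setup); anything else is a file name template.
core_pattern_kind() {
    case "$1" in
        '|'*) echo pipe ;;
        *)    echo file ;;
    esac
}

# Report the current settings of the running system (Linux only).
check_core_setup() {
    echo "core_pattern kind: $(core_pattern_kind "$(cat /proc/sys/kernel/core_pattern)")"
    echo "core size limit:   $(ulimit -c)"
}
```

With a file pattern, a core size limit of 0 in the build's shell would also prevent core files from being written, so it is worth checking both values inside the container.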

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Three reruns later I got

cc -iquote /root/qemu-5.0/b/qemu/accel/tcg -iquote accel/tcg -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/qemu/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -fPIE -DPIE -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/include/p11-kit-1 -DSTRUCT_IOVEC_DEFINED -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/ppc -DNEED_CPU_H -iquote /root/qemu-5.0/include -I/root/qemu-5.0/linux-user/ppc -I/root/qemu-5.0/linux-user/host/arm -I/root/qemu-5.0/linux-user -Ilinux-user/ppc -MMD -MP -MT accel/tcg/tcg-runtime-gvec.o -MF accel/tcg/tcg-runtime-gvec.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o accel/tcg/tcg-runtime-gvec.o /root/qemu-5.0/accel/tcg/tcg-runtime-gvec.c
during RTL pass: reload
/root/qemu-5.0/linux-user/syscall.c: In function ‘do_syscall1.constprop’:
/root/qemu-5.0/linux-user/syscall.c:12479:1: internal compiler error: Segmentation fault
12479 | }
      | ^
Please submit a full bug report,
with preprocessed source if appropriate.
The bug is not reproducible, so it is likely a hardware or OS problem.

Again no crash dump of gcc to be found; how is it disabling that ... ?!?

I was reading through /usr/share/doc/gcc-10/README.Bugs which gets mentioned in the error messages but there isn't a better hint either.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Interim Summary:
- hits armhf compiles of various large source projects; chances are it is completely random
  and just hits those more often because they compile more
- the build system auto-retries the compiles and they eventually work on retry, reported as "The bug
  is not reproducible, so it is likely a hardware or OS problem."
- The bug always occurs on different source files; retrying a failed one works hundreds of
  times, so it seems to be sort of random when it hits and not tied to the source.
- It seems we need concurrency to trigger it, but again that might just have increased the
  likelihood
- I can trigger it reliably now in ~2-8h of compile time on Canonistack when building qemu
  in an armhf LXD container on an arm64 host (same as the builders)
- Despite my attempts I'm unable to gather a crash dump of the gcc segfault and would be happy
  about a hint/advice on that.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Not sure if it is entirely random; it hit for the second time on
  /<<PKGBUILDDIR>>/linux-user/syscall.c:12479:1: internal compiler error: Segmentation fault
which makes 2 of the 8 hits I've had so far. Given how much code gets built, that is unlikely to be an accident.
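
To check for such repeats across many build logs, the error locations can be tallied. A rough sketch, assuming the logs use the same path prefix for the same file (the logs in this bug mix /<<PKGBUILDDIR>> and /root/qemu-5.0, which would have to be normalized first); count_ice_locations is a hypothetical name:

```shell
# Hypothetical helper: read build log text on stdin and print how often each
# file:line location was reported with an internal compiler error,
# most frequent first. Paths containing spaces are not handled.
count_ice_locations() {
    grep 'internal compiler error' \
        | sed 's/^\(.*\):[0-9][0-9]*: internal compiler error.*$/\1/' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}
```

Usage would be along the lines of "cat buildlog1.txt buildlog2.txt | count_ice_locations", with the file names being placeholders.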

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I tried to isolate what was running concurrently and found 7 gcc calls.
I have set them up to run concurrently in endless loops each.
That way they reached a lot of iterations without triggering the issue :-/
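
For reference, the kind of setup described here could look like the sketch below. It is not the exact script that was used: run_concurrent_loops is a hypothetical name, the command lines are passed as strings, and the loops are bounded instead of endless so that the function returns.

```shell
# Hypothetical helper: run each command line given as an argument in its own
# background loop, at most $1 iterations each. Returns non-zero if any
# command failed in any iteration.
run_concurrent_loops() {
    iterations=$1; shift
    pids=""
    for cmd in "$@"; do
        (
            i=0
            while [ "$i" -lt "$iterations" ]; do
                sh -c "$cmd" >/dev/null 2>&1 || exit 1
                i=$((i + 1))
            done
        ) &
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}
```

Each argument would be one of the isolated gcc command lines, e.g. "run_concurrent_loops 100 'cc ... -c a.c' 'cc ... -c b.c'" with the elided parts filled in from the build log.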

I don't know how to continue :-/
But I can share a login to this system and show how to trigger the bug.

The following will get you there and trigger the bug usually in 1-2 loops (~4h on average)
$ ssh ubuntu@10.48.130.69
$ lxc exec groovy-gccfail bash
# cd qemu-5.0/
# i=1; export DEB_BUILD_OPTIONS=parallel=4; while debuild -i -us -uc -b -d; do echo "try $((i++)) complete" >> ~/build.log; done

@Doko could you take over from here as I'd hope you know how to force gcc to give you a dump?
I imported your key to the system mentioned above.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

It was brought up with foundations last week in our sync and mentioned that someone will look into it for further guidance on the case. Since nothing happened I'll add the rls-gg-incoming tag to make sure it is re-visited in your bug meetings.

I beg your pardon, I know it is your tag and please feel free to remove it if it really is incorrect here - but I just want a response (more or less any) on this from someone able to decide if this is actually critical (or not) and how to go on.

tags: added: rls-gg-incoming
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

There is a new gcc-10 version from two days ago in groovy now.
I was talking with doko and we wanted to try different gcc-10 versions in general trying to corner the issue to when it started to appear.

https://launchpad.net/ubuntu/+source/gcc-10/10-20200425-1ubuntu2 - WIP
https://launchpad.net/ubuntu/+source/gcc-10/10.1.0-6ubuntu1 - WIP
https://launchpad.net/ubuntu/+source/gcc-10/10.2.0-9ubuntu2 - fails
https://launchpad.net/ubuntu/+source/gcc-10/10.2.0-11ubuntu1 - WIP

I usually had the crash in 1-2 runs, so I will consider 4 good runs as the issue not being present. Although there is some raciness to this, I just can't wait much longer without growing out of a day for a single test :-/
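
How much confidence a number of good runs gives can be estimated roughly. The sketch below assumes every run fails independently with the same probability p, which is a simplification; false_good_probability is a hypothetical name and the 0.5 used in the example is an illustrative guess derived from "crash in 1-2 runs", not a measured rate.

```shell
# Assumption: every run fails independently with the same probability p ($1).
# Prints the chance that n ($2) consecutive runs all pass although the bug
# is still present.
false_good_probability() {
    LC_ALL=C awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", (1 - p) ^ n }'
}
```

For example "false_good_probability 0.5 4" gives the chance of wrongly calling a version good after 4 clean runs under that assumption; a lower per-run failure rate makes that chance considerably higher.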

I'll update once I have more results

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Downloaded the other two as well and running on https://launchpad.net/ubuntu/+source/gcc-10/10.1.0-6ubuntu1 now

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI: This has passed two good runs by now, but that isn't enough. I need to have it running overnight to be sure about 10.1

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

https://launchpad.net/ubuntu/+source/gcc-9/9.3.0-18ubuntu1 ran 6 complete runs over night and can be considered good.

So the breakage was between 9.3.0-18ubuntu1 and 10-20200425-1ubuntu2

How do we continue from here? Will you throw me PPA builds and/or do you still have debs anywhere that I should try?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Trying gcc-snapshot 1:20200917-1ubuntu1 now

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

gcc-snapshot 1:20200917-1ubuntu1 fails in other places.

/root/qemu-5.0/linux-user/m68k/signal.c:44:1: internal compiler error: 'verify_type' failed
0xf0afc3 internal_error(char const*, ...)
 ???:0
0x8fa705 verify_type(tree_node const*)
 ???:0
0x5f644b rest_of_type_compilation(tree_node*, int)
 ???:0
0x1f61c7 finish_struct(unsigned int, tree_node*, tree_node*, tree_node*, c_struct_parse_info*)
 ???:0
0x246ef9 c_parser_declspecs(c_parser*, c_declspecs*, bool, bool, bool, bool, bool, bool, bool, c_lookahead_kind)
 ???:0
0x254d81 c_parse_file()
 ???:0
0x2a3305 c_common_parse_file()
 ???:0

So gcc-snapshot is no good to try this :-/

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Doko passed me gcc-10 - 10.2.0-14ubuntu0.1 from https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/test/+packages.
Still building on armhf, but I'll give those a try once complete.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

As expected the non-strip removed the dbgsym:

The following packages will be REMOVED:
  gcc-10-dbgsym
The following packages will be upgraded:
  cpp-10 g++-10 gcc-10 gcc-10-base gcc-10-multilib libasan6 libatomic1 libcc1-0 libgcc-10-dev libgcc-s1 libgomp1 libsfasan6 libsfatomic1 libsfgcc-10-dev libsfgcc-s1 libsfgomp1 libsfubsan1
  libstdc++-10-dev libstdc++-10-pic libstdc++6 libubsan1
21 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.

This is now running and likely to crash later today.
But since I failed to get a crash dump before, how to get one will be the remaining issue we need to solve.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

With this build the crash still does not leave a .crash file, but the output is more verbose

cc -iquote /root/qemu-5.0/b/user-static/linux-user -iquote linux-user -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/user-static/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -DLEGACY_RDMA_REG_MR -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/arm -DNEED_CPU_H -iquote /root/qemu-5.0/include -I/root/qemu-5.0/linux-user/aarch64 -I/root/qemu-5.0/linux-user/host/arm -I/root/qemu-5.0/linux-user -Ilinux-user/aarch64 -MMD -MP -MT linux-user/syscall.o -MF linux-user/syscall.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o linux-user/syscall.o /root/qemu-5.0/linux-user/syscall.c
during RTL pass: reload
/root/qemu-5.0/linux-user/syscall.c: In function ‘do_syscall1.constprop’:
/root/qemu-5.0/linux-user/syscall.c:12479:1: internal compiler error: Segmentation fault
12479 | }
      | ^
0x532d6b crash_signal
 ../../src/gcc/toplev.c:328
0x523a5b avoid_constant_pool_reference(rtx_def*)
 ../../src/gcc/simplify-rtx.c:237
0x4f6f9d commutative_operand_precedence(rtx_def*)
 ../../src/gcc/rtlanal.c:3482
0x4f705b swap_commutative_operands_p(rtx_def*, rtx_def*)
 ../../src/gcc/rtlanal.c:3543
0x51deb3 simplify_binary_operation(rtx_code, machine_mode, rtx_def*, rtx_def*)
 ../../src/gcc/simplify-rtx.c:2333
0x51df01 simplify_gen_binary(rtx_code, machine_mode, rtx_def*, rtx_def*)
 ../../src/gcc/simplify-rtx.c:189
0x42c191 lra_constraints(bool)
 ../../src/gcc/lra-constraints.c:4966
0x41f483 lra(_IO_FILE*)
 ../../src/gcc/lra.c:2443
0x3f0915 do_reload
 ../../src/gcc/ira.c:5527
0x3f0915 execute
 ../../src/gcc/ira.c:5713

Does this help you in any way?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I'll re-run and dump a few of them just to help you to get to the root cause:

cc -iquote /root/qemu-5.0/b/qemu/block -iquote block -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/qemu/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -fPIE -DPIE -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/include/p11-kit-1 -DSTRUCT_IOVEC_DEFINED -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -I/root/qemu-5.0/tests -I/root/qemu-5.0/tests/qtest -MMD -MP -MT block/qcow2-snapshot.o -MF block/qcow2-snapshot.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o block/qcow2-snapshot.o /root/qemu-5.0/block/qcow2-snapshot.c
0x532d6b crash_signal
 ../../src/gcc/toplev.c:328
0x41d0c7 add_regs_to_insn_regno_info
 ../../src/gcc/lra.c:1515
0x41d1c9 add_regs_to_insn_regno_info
 ../../src/gcc/lra.c:1537
0x41d1c9 add_regs_to_insn_regno_info
 ../../src/gcc/lra.c:1537
0x41e28f lra_update_insn_regno_info(rtx_insn*)
 ../../src/gcc/lra.c:1630
0x41e3d5 lra_update_insn_regno_info(rtx_insn*)
 ../../src/gcc/lra.c:1623
0x41e3d5 lra_push_insn_1
 ../../src/gcc/lra.c:1780
0x436bb5 spill_pseudos
 ../../src/gcc/lra-spills.c:542
0x436bb5 lra_spill()
 ../../src/gcc/lra-spills.c:655
0x41f4ef lra(_IO_FILE*)
 ../../src/gcc/lra.c:2560
0x3f0915 do_reload
 ../../src/gcc/ira.c:5527
0x3f0915 execute
 ../../src/gcc/ira.c:5713

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

cc -iquote /root/qemu-5.0/b/qemu/accel/stubs -iquote accel/stubs -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/qemu/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -fPIE -DPIE -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/include/p11-kit-1 -DSTRUCT_IOVEC_DEFINED -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/xtensa -DNEED_CPU_H -iquote /root/qemu-5.0/include -MMD -MP -MT accel/stubs/hax-stub.o -MF accel/stubs/hax-stub.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o accel/stubs/hax-stub.o /root/qemu-5.0/accel/stubs/hax-stub.c
0x532d6b crash_signal
 ../../src/gcc/toplev.c:328
0x71769f thumb2_legitimate_address_p
 ../../src/gcc/config/arm/arm.c:8500
0x717c15 arm_legitimate_address_p(machine_mode, rtx_def*, bool)
 ../../src/gcc/config/arm/arm.c:8917
0x717c15 arm_legitimate_address_p(machine_mode, rtx_def*, bool)
 ../../src/gcc/config/arm/arm.c:8912
0x427eef valid_address_p
 ../../src/gcc/lra-constraints.c:331
0x427eef simplify_operand_subreg
 ../../src/gcc/lra-constraints.c:1514
0x4287ed curr_insn_transform
 ../../src/gcc/lra-constraints.c:3946
0x42c133 lra_constraints(bool)
 ../../src/gcc/lra-constraints.c:5031
0x41f483 lra(_IO_FILE*)
 ../../src/gcc/lra.c:2443
0x3f0915 do_reload
 ../../src/gcc/ira.c:5527
0x3f0915 execute
 ../../src/gcc/ira.c:5713
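
The three backtraces above fault in different functions (avoid_constant_pool_reference, add_regs_to_insn_regno_info, thumb2_legitimate_address_p), but all of them come through lra() from do_reload. To compare a larger number of such traces, the frame right below crash_signal can be pulled out with a small filter. This is only a sketch against the trace format shown here; faulting_frames is a hypothetical name.

```shell
# Hypothetical helper: read GCC ICE backtraces on stdin and print, for each
# one, the frame directly below crash_signal, i.e. the function that was
# executing when the signal arrived.
faulting_frames() {
    awk '
        /^0x[0-9a-f]+ crash_signal/ { want = 1; next }
        want && /^0x[0-9a-f]+ /     { sub(/^0x[0-9a-f]+ /, ""); print; want = 0 }
    '
}
```

Feeding the output through "sort | uniq -c" would then show whether any function dominates.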

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

gcc-snapshot still has various issues - but not the crash

/root/qemu-5.0/linux-user/m68k/signal.c:44:1: error: 'TYPE_CANONICAL' is not compatible
   44 | };
      | ^
...
/root/qemu-5.0/linux-user/m68k/signal.c:44:1: internal compiler error: 'verify_type' failed

Can't continue with gcc-snapshot due to those (even with the newer version).

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Defaults:
# gcc -Q --help=target | grep -e '-marm' -e '-mthumb'
  -marm [disabled]
  -mthumb [enabled]
  -mthumb-interwork [enabled]

Doko suggested changing that by using -marm.
This has been running for a while, but needs some more time to trigger ...
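
To double check which instruction set a given set of flags ends up with, the output of the query above can be reduced to a single word. A small sketch, with default_arm_isa being a hypothetical name and the parsing based only on the output format shown in this comment:

```shell
# Hypothetical helper: read `gcc -Q --help=target` output on stdin and print
# "thumb" or "arm" depending on which of -mthumb / -marm is marked [enabled],
# or "unknown" if neither line is found.
default_arm_isa() {
    awk '
        $1 == "-mthumb" && $2 == "[enabled]" { isa = "thumb" }
        $1 == "-marm"   && $2 == "[enabled]" { isa = "arm" }
        END { print (isa == "" ? "unknown" : isa) }
    '
}
```

Usage would be "gcc -Q --help=target | default_arm_isa"; putting -marm before -Q on the gcc command line should, as far as I know, make the report reflect that flag.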

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

@Doko - I can confirm that with -marm the issue is gone.
I have had 6 full runs yesterday and overnight.

We can conclude that -mthumb is required to trigger the issue.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I spoke too soon: after ~7.5 runs I got the following with -marm:

cc -iquote /root/qemu-5.0/b/user-static/target/arm -iquote target/arm -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/user-static/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -marm -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -DLEGACY_RDMA_REG_MR -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/arm -DNEED_CPU_H -iquote /root/qemu-5.0/include -I/root/qemu-5.0/linux-user/aarch64 -I/root/qemu-5.0/linux-user/host/arm -I/root/qemu-5.0/linux-user -Ilinux-user/aarch64 -MMD -MP -MT target/arm/helper.o -MF target/arm/helper.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o target/arm/helper.o /root/qemu-5.0/target/arm/helper.c
during RTL pass: reload
/root/qemu-5.0/linux-user/syscall.c: In function ‘do_syscall’:
/root/qemu-5.0/linux-user/syscall.c:12519:1: internal compiler error: Segmentation fault
12519 | }
      | ^
cc -iquote /root/qemu-5.0/b/user-static/target/arm -iquote target/arm -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/user-static/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -marm -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -DLEGACY_RDMA_REG_MR -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/arm -DNEED_CPU_H -iquote /root/qemu-5.0/include -I/root/qemu-5.0/linux-user/aarch64 -I/root/qemu-5.0/linux-user/host/arm -I/root/qemu-5.0/linux-user -Ilinux-user/aarch64 -MMD -MP -MT target/arm/translate-sve.o -MF target/arm/translate-sve.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_S...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI now Testing 10.2.0-14ubuntu0.2 from https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/test/+sourcepub/11647665/+listing-archive-extra

I've stopped setting -marm to trigger the issue "faster", please let me know if you want me to continue to use -marm for those tests.

Changed in groovy:
importance: Unknown → Medium
status: Unknown → New
Changed in gcc-10 (Ubuntu):
status: New → Confirmed
Changed in groovy:
status: New → Confirmed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Failed on #17
during RTL pass: reload

/root/qemu-5.0/fpu/softfloat.c: In function ‘soft_f64_muladd’:
/root/qemu-5.0/fpu/softfloat.c:1535:1: internal compiler error: Segmentation fault
 1535 | }
      | ^
cc -iquote /root/qemu-5.0/b/qemu/target/mips -iquote target/mips -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/qemu/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -fPIE -DPIE -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/include/p11-kit-1 -DSTRUCT_IOVEC_DEFINED -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/mips -DNEED_CPU_H -iquote /root/qemu-5.0/include -MMD -MP -MT target/mips/helper.o -MF target/mips/helper.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o target/mips/helper.o /root/qemu-5.0/target/mips/helper.c
0x527c2f crash_signal
 ../../gcc/gcc/toplev.c:328
0x4147bf add_regs_to_insn_regno_info
 ../../gcc/gcc/lra.c:1509
0x4148b3 add_regs_to_insn_regno_info
 ../../gcc/gcc/lra.c:1531
0x4148b3 add_regs_to_insn_regno_info
 ../../gcc/gcc/lra.c:1531
0x4158d9 lra_update_insn_regno_info(rtx_insn*)
 ../../gcc/gcc/lra.c:1624
0x415a29 lra_update_insn_regno_info(rtx_insn*)
 ../../gcc/gcc/lra.c:1617
0x415a29 lra_push_insn_1
 ../../gcc/gcc/lra.c:1774
0x42dd53 spill_pseudos
 ../../gcc/gcc/lra-spills.c:523
0x42dd53 lra_spill()
 ../../gcc/gcc/lra-spills.c:636
0x416b1b lra(_IO_FILE*)
 ../../gcc/gcc/lra.c:2554
0x3e84d1 do_reload
 ../../gcc/gcc/ira.c:5523
0x3e84d1 execute
 ../../gcc/gcc/ira.c:5709

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I'm not yet sure what we should learn from that - do we need 30 runs of each step to be somewhat sure? That makes an already slow bisect even slower ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI - another 8 runs without a crash on r10-7093.
My current working theory is that the root cause of the crash might have been added as early as r10-4054, but one or more later changes have increased the chance for the issue to trigger (think of widening the race window or similar).
If that assumption is true, then with the current testcase it is nearly impossible to properly bisect the "original root cause". At the same time it is still hard to find the change that widened the race window, since whether or not a crash shows up early does not reliably tell us whether we are in the high- or low-chance area.

We've had many runs with the base versions so that one is really good.
But any other good result we've had so far could - in theory - be challenged and needs ~30 good runs to be somewhat sure (puh that will be a lot of time).
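
To put a rough number on the "~30 good runs" estimate: if a bad revision crashes independently with probability p per run, the chance of n clean runs in a row is (1 - p)^n. The sketch below is only an illustration under that independence assumption. The function name and the 5% threshold are arbitrary choices, and 2/19 is the rate reported for r10-7093 further down in this report.

```python
import math


def clean_runs_needed(fail_rate: float, max_miss_prob: float = 0.05) -> int:
    """Smallest n with (1 - fail_rate)**n <= max_miss_prob.

    fail_rate:     assumed per-run crash probability of a bad revision
    max_miss_prob: accepted risk of calling a bad revision "good"
    """
    if not 0.0 < fail_rate < 1.0:
        raise ValueError("fail_rate must be strictly between 0 and 1")
    if not 0.0 < max_miss_prob < 1.0:
        raise ValueError("max_miss_prob must be strictly between 0 and 1")
    return math.ceil(math.log(max_miss_prob) / math.log(1.0 - fail_rate))
```

With a rate of 2/19 this gives 27 runs, which is in line with the ~30 guessed above. With a rate nearer 3% it would be roughly 100 runs.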

I'm marking the old runs that are debatable with good?<count-of-good-runs>.

Also we might want to look for just the "new" crash signature.

20190425 good
r10-1014
r10-2027 good?4
r10-2533
r10-3040 good?4
r10-3220
r10-3400 good?4
r10-3450
r10-3475
r10-3478
r10-3593
r10-3622
r10-3657 good?5
r10-3727 good?3
r10-4054 other kind of bad - signature different, and rare?
r10-6080 good?10
r10-7093 bad, but slow to trigger
20200507 bad bad bad

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 extract_plus_operands (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)

Of course it could be that the same root cause surfaces as two different signatures - but it could just as well be a multitude of issues. Therefore - for now - "add_regs_to_insn_regno_info (lra)" is what I'll continue to hunt for.
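
The signatures listed above appear to be the first backtrace frame below crash_signal (that matches the traces posted in this report). A minimal sketch of pulling that frame out of a log follows. The regular expression and the function name are assumptions for illustration, not part of any GCC tooling.

```python
import re
from typing import Optional

# A frame line looks like "0x4147bf add_regs_to_insn_regno_info"; the line
# after it holds the source location and does not match this pattern.
FRAME_RE = re.compile(r"^\s*0x[0-9a-f]+\s+(\S.*)$")


def crash_signature(log_text: str) -> Optional[str]:
    """Return the first frame after crash_signal, or None if there is none."""
    seen_crash_signal = False
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if not match:
            continue
        func = match.group(1).strip()
        if func == "crash_signal":
            seen_crash_signal = True
        elif seen_crash_signal:
            return func
    return None
```

Grouping failed runs by this value would make it easier to keep the "add_regs_to_insn_regno_info" crashes apart from the rarer ones.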

With some luck (do we have any in this?) the 10 runs on 6080 are sufficient.
Let us try r10-6586 next and plan for 15-30 runs to be sure it is good.
If hitting the issue I'll still re-run it so we can compare multiple signatures.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Since this seems to be turning into a reproducibility-fest, I've spawned and prepared two more workers using the same setup as the one we used before. That should allow for more runs per day and increase the rate at which we can process it - given the new insight into its unreliability.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-6586 - passed 27 good runs, no fails

Updated Result Overview:
20190425 good
r10-1014
r10-2027 good?4
r10-2533
r10-3040 good?4
r10-3220
r10-3400 good?4
r10-3450
r10-3475
r10-3478
r10-3593
r10-3622
r10-3657 good?5
r10-3727 good?3
r10-4054 other kind of bad - signature different, and rare?
r10-6080 good?10
r10-6586 good?27
r10-7093 bad, but slow to trigger (2 of 19)
20200507 bad bad bad

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 extract_plus_operands (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)

Next I'll run r10-7093 in this new setup.
@Doko - It would be great to have ~6760 be built for the likely next step.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Add another 1/3 fails to r10-7093

Now I am on the next two
- r10-6760
- r10-6839

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

2/7 runs of r10-6839 failed with
r10-6839 add_regs_to_insn_regno_info (lra)

Next will be r10-6760

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Updated Result Overview:
20190425 good
r10-1014
r10-2027 good?4
r10-2533
r10-3040 good?4
r10-3220
r10-3400 good?4
r10-3450
r10-3475
r10-3478
r10-3593
r10-3622
r10-3657 good?5
r10-3727 good?3
r10-4054 other kind of bad - signature different, and rare?
r10-6080 good?10
r10-6586 good?27
r10-6760 next
r10-6839 bad (2 of 9)
r10-7093 bad, but slow to trigger (2 of 19)
20200507 bad bad bad

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 extract_plus_operands (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

We'll need more runs to be sure, but so far r10-6760 seems good.
In preparation - could I request builds between r10-6760 and r10-6839, please?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Ok, r10-6760 reached 20 good runs and is considered good.
Doko was so kind to build 6779 6799 6819 for me - of which 6799 will be next.

Note: I've aligned the comments to all have the same style and dropped the untested revisions.

Updated Result Overview:
20190425 good 0 of 13
r10-2027 good 0 of 4
r10-3040 good 0 of 4
r10-3400 good 0 of 4
r10-3657 good 0 of 5
r10-3727 good 0 of 3
r10-4054 other kind of bad 1 of 18 (signature different)
r10-6080 good 0 of 10
r10-6586 good 0 of 27
r10-6760 good 0 of 20
r10-6779 untested
r10-6799 next
r10-6819 untested
r10-6839 bad 2 of 9
r10-7093 bad 2 of 19
20200507 bad 3 of 7

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 extract_plus_operands (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
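
Since the overview now uses a uniform "<revision> <verdict> N of M" layout, it can be processed mechanically. A minimal sketch, assuming rows are listed oldest to newest as above: parse_overview, bisect_window and the ignore parameter are hypothetical names for illustration, and rows without counts (untested, next) are skipped.

```python
import re
from typing import FrozenSet, List, NamedTuple, Optional, Tuple


class Result(NamedTuple):
    revision: str
    fails: int
    runs: int


# First token is the revision, then free text, then "<fails> of <runs>".
ROW_RE = re.compile(r"^(\S+)\s+.*?(\d+) of (\d+)")


def parse_overview(text: str) -> List[Result]:
    """Parse overview rows that carry an 'N of M' count."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            rows.append(Result(match.group(1), int(match.group(2)), int(match.group(3))))
    return rows


def bisect_window(
    rows: List[Result], ignore: FrozenSet[str] = frozenset()
) -> Tuple[Optional[str], Optional[str]]:
    """Return (last clean revision, first failing revision), skipping `ignore`."""
    last_good = None
    for row in rows:
        if row.revision in ignore:
            continue
        if row.fails > 0:
            return last_good, row.revision
        last_good = row.revision
    return last_good, None
```

Passing r10-4054 in ignore mirrors the decision above to treat its different signature as a separate problem.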

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI: r10-6799 had 14 good runs so far, I'll let it run for a bit longer to be sure.
Then - later today - if nothing changes r10-6819 will be next.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Completed 20 good runs on r10-6799, continuing with r10-6819 as planned.

Updated Result Overview:
20190425 good 0 of 13
r10-2027 good 0 of 4
r10-3040 good 0 of 4
r10-3400 good 0 of 4
r10-3657 good 0 of 5
r10-3727 good 0 of 3
r10-4054 other kind of bad 1 of 18 (signature different)
r10-6080 good 0 of 10
r10-6586 good 0 of 27
r10-6760 good 0 of 20
r10-6799 good 0 of 20
r10-6819 next
r10-6839 bad 2 of 9
r10-7093 bad 2 of 19
20200507 bad 3 of 7

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 extract_plus_operands (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-6819 had 22 good runs.
r10-6829 will be the next to try.

Updated Result Overview:
20190425 good 0 of 13
r10-2027 good 0 of 4
r10-3040 good 0 of 4
r10-3400 good 0 of 4
r10-3657 good 0 of 5
r10-3727 good 0 of 3
r10-4054 other kind of bad 1 of 18 (signature different)
r10-6080 good 0 of 10
r10-6586 good 0 of 27
r10-6760 good 0 of 20
r10-6799 good 0 of 20
r10-6819 good 0 of 22
r10-6829 next
r10-6839 bad 2 of 9
r10-7093 bad 2 of 19
20200507 bad 3 of 7

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 add_regs_to_insn_regno_info (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 extract_plus_operands (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-6829 has 2 fails in 35 runs
Signature matches, both are: add_regs_to_insn_regno_info (lra)
r10-6824 = next

Updated Result Overview:
20190425 good 0 of 13
r10-2027 good 0 of 4
r10-3040 good 0 of 4
r10-3400 good 0 of 4
r10-3657 good 0 of 5
r10-3727 good 0 of 3
r10-4054 other kind of bad 1 of 18 (signature different)
r10-6080 good 0 of 10
r10-6586 good 0 of 27
r10-6760 good 0 of 20
r10-6799 good 0 of 20
r10-6819 good 0 of 22
r10-6824 next
r10-6829 bad 2 of 35
r10-6839 bad 2 of 9
r10-7093 bad 2 of 19
20200507 bad 3 of 7

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6829 add_regs_to_insn_regno_info (lra)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 add_regs_to_insn_regno_info (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 extract_plus_operands (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-6824 bad 1 of 24, signature matches
We have only a few steps to go and need to increase the number of runs to be sure, so I'll let it run for a while longer.
Also - eventually - I'll re-run what we consider to be the last good, quite a few times to be sure.

Most likely I'll later today switch and test r10-6822 next.
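
Letting a revision "run for a while longer" boils down to repeating the same build and counting ICEs. A rough sketch of such a loop follows; it is not the script actually used here. The build command is supplied by the caller and is an assumption (for example a wrapper that resets the tree and rebuilds qemu).

```python
import subprocess
import sys
from typing import List, Sequence

ICE_MARKER = "internal compiler error"


def ice_runs(build_cmd: Sequence[str], runs: int) -> List[int]:
    """Run build_cmd `runs` times; return the 1-based indexes of runs with an ICE."""
    bad = []
    for i in range(1, runs + 1):
        proc = subprocess.run(
            build_cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        # A failed build without the marker is some other breakage, not this bug.
        if proc.returncode != 0 and ICE_MARKER in proc.stdout:
            bad.append(i)
            print(f"run {i}: ICE", file=sys.stderr)
    return bad
```

The length of the returned list over the number of runs is the "bad N of M" figure used in the overview.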

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Updated Result Overview:
20190425 good 0 of 13
r10-2027 good 0 of 4
r10-3040 good 0 of 4
r10-3400 good 0 of 4
r10-3657 good 0 of 5
r10-3727 good 0 of 3
r10-4054 other kind of bad 1 of 18 (signature different)
r10-6080 good 0 of 10
r10-6586 good 0 of 27
r10-6760 good 0 of 20
r10-6799 good 0 of 20
r10-6819 good 0 of 22
r10-6822 next
r10-6824 bad 1 of 33
r10-6829 bad 2 of 35
r10-6839 bad 2 of 9
r10-7093 bad 2 of 19
20200507 bad 3 of 7

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6824 add_regs_to_insn_regno_info (lra)
r10-6829 add_regs_to_insn_regno_info (lra)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 add_regs_to_insn_regno_info (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 extract_plus_operands (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-6822 so far has 0 of 20, but I'll let it run another ~24h

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-6822 seems good.

Updated Result Overview:
20190425 good 0 of 13
r10-2027 good 0 of 4
r10-3040 good 0 of 4
r10-3400 good 0 of 4
r10-3657 good 0 of 5
r10-3727 good 0 of 3
r10-4054 other kind of bad 1 of 18 (signature different)
r10-6080 good 0 of 10
r10-6586 good 0 of 27
r10-6760 good 0 of 20
r10-6799 good 0 of 20
r10-6819 good 0 of 22
r10-6822 good 0 of 37
r10-6823 next
r10-6824 bad 1 of 33
r10-6829 bad 2 of 35
r10-6839 bad 2 of 9
r10-7093 bad 2 of 19
20200507 bad 3 of 7

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6824 add_regs_to_insn_regno_info (lra)
r10-6829 add_regs_to_insn_regno_info (lra)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 add_regs_to_insn_regno_info (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 extract_plus_operands (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-6823 bad 1 of 28

during RTL pass: reload
/root/qemu-5.0/migration/ram.c: In function ‘ram_load_postcopy’:
/root/qemu-5.0/migration/ram.c:3298:1: internal compiler error: Segmentation fault
 3298 | }
      | ^
0x524cf3 crash_signal
        ../../gcc/gcc/toplev.c:328
0x411e07 add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1509
0x411efb add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1531
0x411efb add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1531
0x411efb add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1531
0x412f21 lra_update_insn_regno_info(rtx_insn*)
        ../../gcc/gcc/lra.c:1624
0x413071 lra_update_insn_regno_info(rtx_insn*)
        ../../gcc/gcc/lra.c:1617
0x413071 lra_push_insn_1
        ../../gcc/gcc/lra.c:1774
0x42b373 spill_pseudos
        ../../gcc/gcc/lra-spills.c:523
0x42b373 lra_spill()
        ../../gcc/gcc/lra-spills.c:636
0x414163 lra(_IO_FILE*)
        ../../gcc/gcc/lra.c:2554
0x3e5b9d do_reload
        ../../gcc/gcc/ira.c:5523
0x3e5b9d execute
        ../../gcc/gcc/ira.c:5709
Please submit a full bug report,
with preprocessed source if appropriate

I'll give the hopefully good r10-6822 another few chances to fail, because - as it is obvious by now - it seems we can't rely much on these bisect results.

Afterwards I'll give 10.2.1-1 in Hirsute a try (requested by Doko)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Updated Result Overview:
20190425 good 0 of 13
r10-2027 good 0 of 4
r10-3040 good 0 of 4
r10-3400 good 0 of 4
r10-3657 good 0 of 5
r10-3727 good 0 of 3
r10-4054 other kind of bad 1 of 18 (signature different)
r10-6080 good 0 of 10
r10-6586 good 0 of 27
r10-6760 good 0 of 20
r10-6799 good 0 of 20
r10-6819 good 0 of 22
r10-6822 good 0 of 37 <- giving this more runs now
r10-6823 bad 1 of 28
r10-6824 bad 1 of 33
r10-6829 bad 2 of 35
r10-6839 bad 2 of 9
r10-7093 bad 2 of 19
20200507 bad 3 of 7

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6823 add_regs_to_insn_regno_info (lra)
r10-6824 add_regs_to_insn_regno_info (lra)
r10-6829 add_regs_to_insn_regno_info (lra)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 add_regs_to_insn_regno_info (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 extract_plus_operands (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

As mentioned before - I didn't trust this result.
And with "likeliness" of this being so low we all know that results are unreliable.
Due to that now r10-6822 is

r10-6822 - bad 2 of 67

The signature was the same "add_regs_to_insn_regno_info (lra)" as before on (again) different places tcg/tcg.c:2180 and fpu/softfloat.c:7133.
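
That r10-6822 looked good for 37 runs and then ended at 2 of 67 is about what one would expect at such a low rate. A rough check, assuming independent runs and taking 2/67 as the true per-run failure rate (both are assumptions, not measurements):

```python
def prob_all_clean(fail_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass at the given fail rate."""
    if not 0.0 <= fail_rate <= 1.0:
        raise ValueError("fail_rate must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must not be negative")
    return (1.0 - fail_rate) ** runs
```

At that rate a revision still passes 37 runs in a row about a third of the time, and 20 runs in a row more than half the time.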

What to do from here ...
We could bisect again between 20190425 and r10-6822 and use at least 100 runs each.
But that would be a last resort, as I'm at ~1 run/h, which means ~4 days per step.

I have a few "maybe we are lucky" things to try first:
- 10.2.1-1 in hirsute
- trunk gcc-r11-5879.tar.xz
- Doing a run with -O1

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

"just retry the build" is our solution to this issue. It's a bit a waste of time hunting this all down at this point, unfortunately.

maybe we can try reproducing this on some publicly available hardware, i.e. graviton2 on aws. But also not sure how much value there is in doing this.

tags: added: rls-gg-notfixing
removed: rls-gg-incoming
Changed in gcc-10 (Ubuntu):
status: Confirmed → Won't Fix
affects: groovy → gcc
Revision history for this message
Christian Ehrhardt  (paelzer) wrote : Re: [Bug 1890435] Re: gcc-10 breaks on armhf (flaky): internal compiler error: Segmentation fault

On Thu, Dec 10, 2020 at 5:31 PM Dimitri John Ledkov
<email address hidden> wrote:
>
> "just retry the build" is our solution to this issue.

It is not - in hirsute the builds of the actual package on LP hit 100%
fail-rate.
Unfortunately not in the repro, but due to the above the workaround
currently is to build with gcc-9 on armhf.
But that is not a long term solution.

Therefore also this IMHO can't be won't fix

Changed in gcc-10 (Ubuntu):
status: Won't Fix → New
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I'll give things a try in current Hirsute (gcc on 10.2.1, qemu on 5.2) building with gcc-10.
If we are back at a level where retries work I'm ok to lower severity.
I'll let you know about these results in a few days.

But since we have had the case of it reaching 100% breakage (and then would be e.g. un-serviceable) I'm unsure if we should - even then - fully close it.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

In the test env (not LP build infra, but canonistack) I've got 30 good runs on 10.2.1 which gives me some hope ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Indeed, gcc-10.2.1 with qemu 5.2 no longer breaks 100% of the time.
Here is a good build log:
https://launchpadlibrarian.net/510811599/buildlog_ubuntu-hirsute-armhf.qemu_1%3A5.2+dfsg-2ubuntu1~ppa2_BUILDING.txt.gz

I'll need a few more builds anyway and will let you know.
As mentioned before that does lower severity, but not close the bug.

Changed in gcc-10 (Ubuntu):
status: New → Confirmed
importance: Critical → Medium
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r11-5879 - bad 8 of 10

So we know:
a) the bug has not been fixed yet
b) as we've seen with later GCC-10 runs, the chances to trigger further increased

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I left r11-5879 running over the weekend and it concluded with 37 of 75 runs failing.
That is ~50%
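
To get a feel for how precise "~50%" is after 75 runs, a Wilson score interval can be computed. This is a standard textbook formula, swapped in here for illustration; it assumes independent runs, and z = 1.96 corresponds to a roughly 95% interval.

```python
import math
from typing import Tuple


def wilson_interval(fails: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a failure rate observed as fails/runs."""
    if runs <= 0 or not 0 <= fails <= runs:
        raise ValueError("need runs > 0 and 0 <= fails <= runs")
    p = fails / runs
    denom = 1.0 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

For 37 of 75 this gives roughly 38% to 60%. For the 2 of 67 seen on r10-6822 it gives roughly 1% to 10%, so the two ranges do not overlap, which supports the impression that the trigger rate went up on r11-5879.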

I'll look at -O1 next

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Fails with -O1 as well, although I have to admit that the different -O levels are deeply integrated in qemu's build system, so it is hard to override "all of them". Therefore - while I set -O1 and that affected some builds - it does not imply that all compiler calls used -O1.
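
One way to check how far an -O1 override actually reached would be to look at the compiler command lines in the build log. GCC documents that when several -O options are given, the last one is the effective one, and the command lines quoted earlier in this report carry -O2 twice. The sketch below is only an illustration. It assumes command lines start with "cc " as in the logs here, and the function names are made up.

```python
import shlex
from collections import Counter
from typing import Iterable, Optional


def effective_opt_level(command: str) -> Optional[str]:
    """Return the last -O option on a compiler command line, or None."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Truncated log lines may have unbalanced quotes; fall back to a plain split.
        tokens = command.split()
    last = None
    for token in tokens:
        # Case-sensitive on purpose: "-o" names the output file.
        if token.startswith("-O"):
            last = token
    return last


def tally_opt_levels(log_lines: Iterable[str], compiler: str = "cc") -> Counter:
    """Count the effective -O level over all compiler invocations in a log."""
    counts: Counter = Counter()
    for line in log_lines:
        if line.startswith(compiler + " "):
            counts[effective_opt_level(line) or "(none)"] += 1
    return counts
```

If the tally still shows mostly -O2, the -O1 run says little about whether the optimization level matters.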

I know dannf has made some bare-metal tests and so far none of those have failed.
Unfortunately our builders are VM based, so that isn't very helpful anyway.
Nevertheless I've transported my test container over to a box to build there.

Trying to maas-deploy a few more chip types didn't work out, but maybe it will eventually with some help by the HWE team.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I was unable to trigger the issue on my rpi4 yet, but as you'd imagine it is rather slow.
But (thanks Dannf) I got access to an X-gene - and carrying my known bad setup there (LXD container export FTW) I was able to recreate this on bare-metal as well.

(Host) Kernel: 5.4.0-58-generic
Model: X-Gene - 8 cores
The guest is Hirsute building qemu 5.0 with r11-5879

I got two known bug signatures - once the common one we see most and once a different one (that we've seen before with 20200507).

This happened on the first two runs; once it has run for some hours I'll post the rate of success-vs-fails as well.

--- ---

during RTL pass: reload
/root/qemu-5.0/fpu/softfloat.c: In function ‘soft_f64_muladd’:
/root/qemu-5.0/fpu/softfloat.c:1535:1: internal compiler error: Segmentation fault
 1535 | }
      | ^
0x56715f crash_signal
        ../../gcc/gcc/toplev.c:327
0x4599ad add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1445
0x459ab9 add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1537
0x459ab9 add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1537
0x45abc7 lra_update_insn_regno_info(rtx_insn*)
        ../../gcc/gcc/lra.c:1630
0x468985 lra_constraints(bool)
        ../../gcc/gcc/lra-constraints.c:5077
0x45bc15 lra(_IO_FILE*)
        ../../gcc/gcc/lra.c:2329
0x42d463 do_reload
        ../../gcc/gcc/ira.c:5802
0x42d463 execute
        ../../gcc/gcc/ira.c:5988
Please submit a full bug report,

--- ---

during RTL pass: reload
/root/qemu-5.0/linux-user/syscall.c: In function ‘do_syscall1.constprop’:
/root/qemu-5.0/linux-user/syscall.c:12479:1: internal compiler error: Segmentation fault
12479 | }
      | ^
0x56715f crash_signal
        ../../gcc/gcc/toplev.c:327
0x527e35 extract_plus_operands
        ../../gcc/gcc/rtlanal.c:6320
0x52d84b extract_plus_operands
        ../../gcc/gcc/rtlanal.c:6324
0x52d84b decompose_normal_address
        ../../gcc/gcc/rtlanal.c:6373
0x52d84b decompose_address(address_info*, rtx_def**, machine_mode, unsigned char, rtx_code)
        ../../gcc/gcc/rtlanal.c:6474
0x52dbc3 decompose_mem_address(address_info*, rtx_def*)
        ../../gcc/gcc/rtlanal.c:6493
0x463551 process_address_1
        ../../gcc/gcc/lra-constraints.c:3460
0x464c47 process_address
        ../../gcc/gcc/lra-constraints.c:3734
0x464c47 curr_insn_transform
        ../../gcc/gcc/lra-constraints.c:4049
0x468913 lra_constraints(bool)
        ../../gcc/gcc/lra-constraints.c:5138
0x45bc15 lra(_IO_FILE*)
        ../../gcc/gcc/lra.c:2329
0x42d463 do_reload
        ../../gcc/gcc/ira.c:5802
0x42d463 execute
        ../../gcc/gcc/ira.c:5988
Please submit a full bug report,

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The canonistack machines I used to crash it (and likely the LP builders) are X-Gene as well.
So we might have a chance to lock this in on specific HW if there are other chip types I could use.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

So far 2 of 4 runs of r11-5879 failed on X-Gene bare metal.

Doko asked me to try if I could get these to fail with -j1 as well (in the past I was unable to do so, but it is worth a try).

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

On BareMetal now also triggered with -j1 (but there were multiple LXD containers each running -j1 to increase the chance to find it).

/root/qemu-5.0/memory.c: In function ‘memory_region_write_accessor’:
/root/qemu-5.0/memory.c:485:1: internal compiler error: Segmentation fault
  485 | }
      | ^
0x56715f crash_signal
        ../../gcc/gcc/toplev.c:327
0x4599ad add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1445
0x459ab9 add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1537
0x459ab9 add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1537
0x45abc7 lra_update_insn_regno_info(rtx_insn*)
        ../../gcc/gcc/lra.c:1630
0x468985 lra_constraints(bool)
        ../../gcc/gcc/lra-constraints.c:5077
0x45bc15 lra(_IO_FILE*)
        ../../gcc/gcc/lra.c:2329
0x42d463 do_reload
        ../../gcc/gcc/ira.c:5802
0x42d463 execute
        ../../gcc/gcc/ira.c:5988
Please submit a full bug report,
with preprocessed source if appropriate.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Just FYI - as we were afraid of - this now starts to break SRUs and other service actions to qemu in Groovy. https://launchpad.net/ubuntu/+source/qemu/1:5.0-5ubuntu9.7/+build/21361775 just failed.
And without a better solution I'll need to trigger retry with fingers crossed.

Revision history for this message
In , Rguenth (rguenth) wrote :

GCC 10.3 is being released, retargeting bugs to GCC 10.4.

Changed in gcc:
status: Confirmed → In Progress
Revision history for this message
Oibaf (oibaf) wrote :

Is this still an issue? I was only able to reproduce it on groovy, which is now EoL.

Revision history for this message
In , Jakub-gcc (jakub-gcc) wrote :

GCC 10.4 is being released, retargeting bugs to GCC 10.5.

Revision history for this message
In , Rguenth (rguenth) wrote :

GCC 10 branch is being closed.

Revision history for this message
In , Pinskia (pinskia) wrote :

*** Bug 112791 has been marked as a duplicate of this bug. ***
