gcc-10 breaks on armhf (flaky): internal compiler error: Segmentation fault

Bug #1890435 reported by Christian Ehrhardt 
This bug affects 2 people
Affects           Status        Importance   Assigned to   Milestone
gcc               In Progress   Medium
gcc-10 (Ubuntu)   Confirmed     Medium       Unassigned

Bug Description

Hi,
This could be the same as bug 1887557, but as I don't have enough data I'm filing it as an individual issue for now.

I have only seen this happening on armhf so far.
In 2 of 5 groovy builds of qemu 5.0 this week I have hit the issue, but it is flaky.

Flakiness:
1. different file
first occurrence
/<<PKGBUILDDIR>>/target/s390x/excp_helper.c:544:1: internal compiler error: Segmentation fault
second occurrence
/<<PKGBUILDDIR>>/linux-user/syscall.c:12479:1: internal compiler error: Segmentation fault

Since it is so unreliable I can't provide much more yet.
I filed it mostly for awareness, and so that it can be dup'ed onto the right bug if there is a better one.

Christian Ehrhardt  (paelzer) wrote :

I've today seen this on DPDK
https://launchpadlibrarian.net/497142982/buildlog_ubuntu-groovy-armhf.dpdk_20.08-1ubuntu1~ppa1_BUILDING.txt.gz

And recently also on qemu again (but that was in the main archive and I could not hold back from hitting retry, after which it worked).

Is there anything in the pipeline that could address this and make it worth running a few re-compiles?

Christian Ehrhardt  (paelzer) wrote :

There was another one in Groovy as of yesterday.
https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4263/+packages
https://launchpadlibrarian.net/497497840/buildlog_ubuntu-groovy-armhf.qemu_1%3A5.0-5ubuntu8~ppa1_BUILDING.txt.gz

...
qapi/qapi-visit-block-core.c: In function ‘visit_type_q_obj_BlockdevOptions_base_members’:
qapi/qapi-visit-block-core.c:6570:1: internal compiler error: Segmentation fault
 6570 | }
      | ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
...
The bug is not reproducible, so it is likely a hardware or OS problem.

So the compiler itself is recognizing that it isn't the source code (alone) but some awkwardness that is flaky.

It seems qemu builds in groovy hit this in ~1/3 of the builds we do on armhf - not sure if that is enough for debugging for you?

Matthias Klose (doko) wrote :

No, try a local build until you have a reproducer. When DEB_BUILD_OPTIONS is set, the compiler driver retries up to three times to see if it's reproducible.

description: updated
Balint Reczey (rbalint) wrote :

Found it again in glibc 2.32-0ubuntu3 build.

vfscanf-internal.c: In function ‘__vfscanf_internal’:
vfscanf-internal.c:3057:1: internal compiler error: Segmentation fault
 3057 | }
      | ^

Christian Ehrhardt  (paelzer) wrote :

I'm building qemu (known to be able to trigger it) on Canonistack armhf LXD container in an arm64 VM (the setup that should be closest to the failing builders).
I also installed whoopsie and apport to catch even a single crash.

But I'm building for quite some hours by now and nothing happened.

I'll let it run the rest of the day in a loop, but if it won't trigger again we need a better approach to corner this bug.

Christian Ehrhardt  (paelzer) wrote :

I compiled for almost 24h now, it just won't crash :-/
Not sure what else I could do to more likely reproduce this ...

Christian Ehrhardt  (paelzer) wrote :

Another breakage at
https://launchpad.net/ubuntu/+source/qemu/1:5.0-5ubuntu9/+build/19958575
I had to retry it; we will see if it works on retry as before.

Christian Ehrhardt  (paelzer) wrote :

And again on the same :-/

cc -iquote /<<PKGBUILDDIR>>/b/qemu/linux-user/s390x -iquote linux-user/s390x -iquote /<<PKGBUILDDIR>>/tcg/arm -isystem /<<PKGBUILDDIR>>/linux-headers -isystem /<<PKGBUILDDIR>>/b/qemu/linux-headers -iquote . -iquote /<<PKGBUILDDIR>> -iquote /<<PKGBUILDDIR>>/accel/tcg -iquote /<<PKGBUILDDIR>>/include -iquote /<<PKGBUILDDIR>>/disas/libvixl -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -fPIE -DPIE -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/include/p11-kit-1 -DSTRUCT_IOVEC_DEFINED -I/usr/include/libpng16 -I/<<PKGBUILDDIR>>/capstone/include -isystem ../linux-headers -iquote .. -iquote /<<PKGBUILDDIR>>/target/s390x -DNEED_CPU_H -iquote /<<PKGBUILDDIR>>/include -I/<<PKGBUILDDIR>>/linux-user/s390x -I/<<PKGBUILDDIR>>/linux-user/host/arm -I/<<PKGBUILDDIR>>/linux-user -Ilinux-user/s390x -MMD -MP -MT linux-user/s390x/signal.o -MF linux-user/s390x/signal.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o linux-user/s390x/signal.o /<<PKGBUILDDIR>>/linux-user/s390x/signal.c
The bug is not reproducible, so it is likely a hardware or OS problem.

There seems to be no pattern to it (e.g. on which source file it breaks), just a chance that probably increases with source size. But I wonder what else I could do on top of the canonistack build that I have tried - maybe concurrency?

Christian Ehrhardt  (paelzer) wrote :

cc -iquote /<<PKGBUILDDIR>>/b/qemu/accel/tcg -iquote accel/tcg -iquote /<<PKGBUILDDIR>>/tcg/arm -isystem /<<PKGBUILDDIR>>/linux-headers -isystem /<<PKGBUILDDIR>>/b/qemu/linux-headers -iquote . -iquote /<<PKGBUILDDIR>> -iquote /<<PKGBUILDDIR>>/accel/tcg -iquote /<<PKGBUILDDIR>>/include -iquote /<<PKGBUILDDIR>>/disas/libvixl -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -fPIE -DPIE -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/include/p11-kit-1 -DSTRUCT_IOVEC_DEFINED -I/usr/include/libpng16 -I/<<PKGBUILDDIR>>/capstone/include -isystem ../linux-headers -iquote .. -iquote /<<PKGBUILDDIR>>/target/lm32 -DNEED_CPU_H -iquote /<<PKGBUILDDIR>>/include -MMD -MP -MT accel/tcg/translate-all.o -MF accel/tcg/translate-all.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o accel/tcg/translate-all.o /<<PKGBUILDDIR>>/accel/tcg/translate-all.c
during RTL pass: reload
/<<PKGBUILDDIR>>/tcg/tcg-op-gvec.c: In function ‘tcg_gen_gvec_shlv’:
/<<PKGBUILDDIR>>/tcg/tcg-op-gvec.c:2936:1: internal compiler error: Segmentation fault
 2936 | }
      | ^
Please submit a full bug report,
with preprocessed source if appropriate.

It has now hit on 3/3 retries, which is exactly what we were afraid might happen ...

Changed in gcc-10 (Ubuntu):
importance: Undecided → Critical
Christian Ehrhardt  (paelzer) wrote :

Bumping the priority since, as we were afraid, this is starting to become a service problem (what if we can't rebuild anymore?)

Christian Ehrhardt  (paelzer) wrote :

I reduced the CPU/Mem of my canonistack system that I try to recreate on (to be more similar).
Also I now do run with DEB_BUILD_OPTIONS=parallel=4 as the real build.
/me hopes this might help to finally trigger it in a debuggable environment.

P.S: I'm now at 4/4 retries that failed for the real build ... :-/ Thankfully it worked on the fifth retry.
P.P.S: Note to myself 4cpu/8G Memory is the real size used (I have 4/4 atm since I set it up before I could reach anyone)

Seth Forshee (sforshee) wrote :
Christian Ehrhardt  (paelzer) wrote :

I got the crash in the repro env.
dmesg holds no OOM which is good - also no other dmesg/journal entry that would be related.

It might depend on concurrent execution, as this was the primary change compared to last time.
And I had not set up apport/whoopsie to catch the crash :-/
I've installed them now and run the formerly breaking command in a loop.

For the sake of "just eating cpu cycles" I have spawned some cpu hogs in the background.
But with all that in place it ran the compile 300 times without a crash :-/

It seems I have to re-run in the build env and hope that apport will catch it into /var/crash this time :-/

Christian Ehrhardt  (paelzer) wrote :

Finally:
cc -iquote /root/qemu-5.0/b/user-static/linux-user -iquote linux-user -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/user-static/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -DLEGACY_RDMA_REG_MR -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/arm -DNEED_CPU_H -iquote /root/qemu-5.0/include -I/root/qemu-5.0/linux-user/aarch64 -I/root/qemu-5.0/linux-user/host/arm -I/root/qemu-5.0/linux-user -Ilinux-user/aarch64 -MMD -MP -MT linux-user/syscall.o -MF linux-user/syscall.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o linux-user/syscall.o /root/qemu-5.0/linux-user/syscall.c
during RTL pass: reload
/root/qemu-5.0/linux-user/syscall.c: In function ‘do_syscall1.constprop’:
/root/qemu-5.0/linux-user/syscall.c:12479:1: internal compiler error: Segmentation fault
12479 | }
      | ^
...
The bug is not reproducible, so it is likely a hardware or OS problem.
make[2]: *** [/root/qemu-5.0/rules.mak:69: linux-user/syscall.o] Error 1
make[2]: Leaving directory '/root/qemu-5.0/b/user-static/i386-linux-user'
make[1]: *** [Makefile:527: i386-linux-user/all] Error 2
make[1]: *** Waiting for unfinished jobs....

Still nothing in /var/crash to report :-/
Why is that - I have apport/whoopsie installed, the kernel is set up
$ sysctl -a | grep core_patt
  kernel.core_pattern = |/usr/share/apport/apport %p %s %c %d %P %E
Also I have set
$ cat ~/.config/apport/settings
[main]
unpackaged=true

This is armhf lxd on arm64 host - maybe apport has a guest/host problem here?

@Doko - do you happen to know if there are any extra hoops to jump through to get a crash report from gcc when it crashes in debuild?

Christian Ehrhardt  (paelzer) wrote :

Ok, apport through the stack of LXD is ... not working.
I have used a more mundane core pattern and a C test program to ensure I will get crash dumps.

$ cat /proc/sys/kernel/core_pattern
/var/crash/core.%e.%p.%h.%t

$ gcc test.c ; ./a.out ; ll /var/crash/
Segmentation fault (core dumped)
total 3
drwxrwsrwt 2 root whoopsie 3 Sep 24 05:48 ./
drwxr-xr-x 13 root root 15 Sep 23 09:40 ../
-rw------- 1 root whoopsie 208896 Sep 24 05:48 core.a.out.189131.groovy-gccfail.1600926486

Trying to run into the real gcc crash again with this ensured ...
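
Whether the kernel hands a core to a helper (such as apport) or writes it to a file depends on the first character of core_pattern: a leading `|` means "pipe to this program". The two patterns quoted above can be told apart with a small check like the following. This is only a sketch; `core_pattern_kind` is a hypothetical helper name, not an existing tool.

```shell
# Classify a core_pattern string the way the kernel interprets its prefix.
# core_pattern_kind is an illustrative helper, not part of apport or procps.
core_pattern_kind() {
    case "$1" in
        '|'*) echo "pipe" ;;           # core is piped to a user-space program
        /*)   echo "file" ;;           # core is written to an absolute path
        *)    echo "relative-file" ;;  # core is written relative to the crashing process's cwd
    esac
}
```

It could be used as `core_pattern_kind "$(cat /proc/sys/kernel/core_pattern)"`. For the file variants the core size limit (`ulimit -c`) also has to be non-zero in the crashing process.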

Christian Ehrhardt  (paelzer) wrote :

Three reruns later I got

cc -iquote /root/qemu-5.0/b/qemu/accel/tcg -iquote accel/tcg -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/qemu/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -fPIE -DPIE -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/include/p11-kit-1 -DSTRUCT_IOVEC_DEFINED -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/ppc -DNEED_CPU_H -iquote /root/qemu-5.0/include -I/root/qemu-5.0/linux-user/ppc -I/root/qemu-5.0/linux-user/host/arm -I/root/qemu-5.0/linux-user -Ilinux-user/ppc -MMD -MP -MT accel/tcg/tcg-runtime-gvec.o -MF accel/tcg/tcg-runtime-gvec.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o accel/tcg/tcg-runtime-gvec.o /root/qemu-5.0/accel/tcg/tcg-runtime-gvec.c
during RTL pass: reload
/root/qemu-5.0/linux-user/syscall.c: In function ‘do_syscall1.constprop’:
/root/qemu-5.0/linux-user/syscall.c:12479:1: internal compiler error: Segmentation fault
12479 | }
      | ^
Please submit a full bug report,
with preprocessed source if appropriate.
The bug is not reproducible, so it is likely a hardware or OS problem.

Again there is no gcc crash dump to find; how is it disabling that ... ?!?

I was reading through /usr/share/doc/gcc-10/README.Bugs which gets mentioned in the error messages but there isn't a better hint either.

Christian Ehrhardt  (paelzer) wrote :

Interim Summary:
- hits armhf compiles of various large source projects; chances are it is completely random
  and just hits those more often because they compile more
- the build system auto-retries the compiles and they eventually work on retry, reported as
  "The bug is not reproducible, so it is likely a hardware or OS problem."
- The bug always occurs on different source files; retrying a failed one works hundreds of
  times, so it seems to be more or less random when it hits and not tied to the source.
- It seems we need concurrency to trigger it, but again it might just have increased the
  likelihood
- I can trigger it reliably now in ~2-8h of compile time on Canonistack when building qemu
  in an armhf LXD container on an arm64 host (same as the builders)
- Despite my attempts I'm unable to gather a crash dump of the gcc segfault and would be happy
  about a hint/advice on that.

Christian Ehrhardt  (paelzer) wrote :

Not sure if it is entirely random; it has now hit a second time on
  /<<PKGBUILDDIR>>/linux-user/syscall.c:12479:1: internal compiler error: Segmentation fault
That is 2 of the 8 hits I've had so far. Given how much code it builds, that is unlikely to be an accident.
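
To check whether some locations recur more often than others, the ICE lines from saved build logs can be tallied by file and line. A minimal sketch, assuming the logs are plain text and use the message format quoted in this bug; `ice_sites` is a hypothetical helper name.

```shell
# Count "internal compiler error: Segmentation fault" sites in build logs read from stdin.
# ice_sites is an illustrative helper; it prints "<count> <file:line:col>", most frequent first.
ice_sites() {
    grep -F 'internal compiler error: Segmentation fault' \
        | sed -E 's/^(.*:[0-9]+:[0-9]+): internal compiler error.*/\1/' \
        | sort | uniq -c | sort -rn
}
```

Usage would be something like `zcat buildlog_*.txt.gz | ice_sites`, where the log file names are placeholders.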

Christian Ehrhardt  (paelzer) wrote :

I tried to isolate what was running concurrently and found 7 gcc calls.
I have set them up to run concurrently in endless loops each.
That way they reached a lot of iterations without triggering the issue :-/

I don't know how to continue :-/
But I can share a login to this system and show how to trigger the bug.

The following will get you there and trigger the bug usually in 1-2 loops (~4h on average)
$ ssh ubuntu@10.48.130.69
$ lxc exec groovy-gccfail bash
# cd qemu-5.0/
# i=1; export DEB_BUILD_OPTIONS=parallel=4; while debuild -i -us -uc -b -d; do echo "try $((i++)) complete" >> ~/build.log; done
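
The loop above stops at the first failing build but keeps neither the output of the failing attempt nor an upper bound on the number of runs. A sketch of a variant that does both; `repeat_until_fail` and the per-attempt log names are hypothetical, and the log directory must already exist.

```shell
# Run a command repeatedly until it fails or max_runs attempts have passed.
# Each attempt's stdout/stderr goes to its own file so the failing run is preserved.
# repeat_until_fail and the try-N.log naming are illustrative assumptions.
repeat_until_fail() {
    max_runs=$1
    logdir=$2
    shift 2
    i=1
    while [ "$i" -le "$max_runs" ]; do
        if ! "$@" > "$logdir/try-$i.log" 2>&1; then
            echo "try $i failed, see $logdir/try-$i.log"
            return 1
        fi
        echo "try $i complete"
        i=$((i + 1))
    done
    return 0
}
```

With `DEB_BUILD_OPTIONS=parallel=4` exported as before, it could be called as `repeat_until_fail 8 ~/logs debuild -i -us -uc -b -d`.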

@Doko could you take over from here as I'd hope you know how to force gcc to give you a dump?
I imported your key to the system mentioned above.

Christian Ehrhardt  (paelzer) wrote :

It was brought up with foundations last week in our sync and mentioned that someone will look into it for further guidance on the case. Since nothing happened I'll add the rls-gg-incoming tag to make sure it is re-visited in your bug meetings.

I beg your pardon, I know it is your tag and please feel free to remove it if it really is incorrect here - but I just want a response (more or less any) on this from someone able to decide if this is actually critical (or not) and how to go on.

tags: added: rls-gg-incoming
Christian Ehrhardt  (paelzer) wrote :

There is a new gcc-10 version from two days ago in groovy now.
I was talking with doko and we wanted to try different gcc-10 versions in general trying to corner the issue to when it started to appear.

https://launchpad.net/ubuntu/+source/gcc-10/10-20200425-1ubuntu2 - WIP
https://launchpad.net/ubuntu/+source/gcc-10/10.1.0-6ubuntu1 - WIP
https://launchpad.net/ubuntu/+source/gcc-10/10.2.0-9ubuntu2 - fails
https://launchpad.net/ubuntu/+source/gcc-10/10.2.0-11ubuntu1 - WIP

I usually had the crash in 1-2 runs, so I will consider 4 good runs as the issue not being present. Although there is some raciness to this, I just can't wait much longer without a single test taking more than a day :-/
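
How much a "4 good runs" threshold can be trusted is easy to estimate. If each run hit the crash independently with probability p, then n clean runs in a row would still occur with probability (1 - p)^n while the bug is present. A sketch of that arithmetic follows; the independence of runs is an assumption this bug does not establish, and `false_negative_pct` is a hypothetical helper name.

```shell
# Percentage chance of seeing n clean runs in a row although the bug is present,
# given a per-run crash probability p (assumes runs are independent).
# false_negative_pct is an illustrative helper name.
false_negative_pct() {
    LC_ALL=C awk -v p="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 100 * (1 - p) ^ n }'
}
```

For example, taking p = 0.5 as a rough stand-in for "a crash in 1-2 runs", `false_negative_pct 0.5 4` prints 6.25, i.e. about one such test in sixteen would wrongly look good.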

I'll update once I have more results.

Christian Ehrhardt  (paelzer) wrote :

Downloaded the other two as well and running on https://launchpad.net/ubuntu/+source/gcc-10/10.1.0-6ubuntu1 now

Christian Ehrhardt  (paelzer) wrote :

FYI: This has passed two good runs by now, but that isn't enough. I need to have it running overnight to be sure about 10.1.

Christian Ehrhardt  (paelzer) wrote :

https://launchpad.net/ubuntu/+source/gcc-9/9.3.0-18ubuntu1 ran 6 complete runs over night and can be considered good.

So the breakage was between 9.3.0-18ubuntu1 and 10-20200425-1ubuntu2

How to continue from here, will you throw me PPA builds and/or do you still have debs anywhere I should try?

Christian Ehrhardt  (paelzer) wrote :

Trying gcc-snapshot 1:20200917-1ubuntu1 now

Christian Ehrhardt  (paelzer) wrote :

gcc-snapshot 1:20200917-1ubuntu1 fails in other places.

/root/qemu-5.0/linux-user/m68k/signal.c:44:1: internal compiler error: 'verify_type' failed
0xf0afc3 internal_error(char const*, ...)
 ???:0
0x8fa705 verify_type(tree_node const*)
 ???:0
0x5f644b rest_of_type_compilation(tree_node*, int)
 ???:0
0x1f61c7 finish_struct(unsigned int, tree_node*, tree_node*, tree_node*, c_struct_parse_info*)
 ???:0
0x246ef9 c_parser_declspecs(c_parser*, c_declspecs*, bool, bool, bool, bool, bool, bool, bool, c_lookahead_kind)
 ???:0
0x254d81 c_parse_file()
 ???:0
0x2a3305 c_common_parse_file()
 ???:0

So gcc-snapshot is no good to try this :-/

Christian Ehrhardt  (paelzer) wrote :

Doko passed me gcc-10 - 10.2.0-14ubuntu0.1 from https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/test/+packages.
Still building on armhf, but I'll give those a try once complete.

Christian Ehrhardt  (paelzer) wrote :

As expected, installing the non-stripped build removed the dbgsym package:

The following packages will be REMOVED:
  gcc-10-dbgsym
The following packages will be upgraded:
  cpp-10 g++-10 gcc-10 gcc-10-base gcc-10-multilib libasan6 libatomic1 libcc1-0 libgcc-10-dev libgcc-s1 libgomp1 libsfasan6 libsfatomic1 libsfgcc-10-dev libsfgcc-s1 libsfgomp1 libsfubsan1
  libstdc++-10-dev libstdc++-10-pic libstdc++6 libubsan1
21 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.

This is now running and likely to crash later today.
But since I failed to get a crash dump before, that (how to get one) will be the remaining issue we need to solve.

Christian Ehrhardt  (paelzer) wrote :

With this build the crash still does not leave a .crash file, but the output is more verbose:

cc -iquote /root/qemu-5.0/b/user-static/linux-user -iquote linux-user -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/user-static/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -DLEGACY_RDMA_REG_MR -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/arm -DNEED_CPU_H -iquote /root/qemu-5.0/include -I/root/qemu-5.0/linux-user/aarch64 -I/root/qemu-5.0/linux-user/host/arm -I/root/qemu-5.0/linux-user -Ilinux-user/aarch64 -MMD -MP -MT linux-user/syscall.o -MF linux-user/syscall.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o linux-user/syscall.o /root/qemu-5.0/linux-user/syscall.c
during RTL pass: reload
/root/qemu-5.0/linux-user/syscall.c: In function ‘do_syscall1.constprop’:
/root/qemu-5.0/linux-user/syscall.c:12479:1: internal compiler error: Segmentation fault
12479 | }
      | ^
0x532d6b crash_signal
 ../../src/gcc/toplev.c:328
0x523a5b avoid_constant_pool_reference(rtx_def*)
 ../../src/gcc/simplify-rtx.c:237
0x4f6f9d commutative_operand_precedence(rtx_def*)
 ../../src/gcc/rtlanal.c:3482
0x4f705b swap_commutative_operands_p(rtx_def*, rtx_def*)
 ../../src/gcc/rtlanal.c:3543
0x51deb3 simplify_binary_operation(rtx_code, machine_mode, rtx_def*, rtx_def*)
 ../../src/gcc/simplify-rtx.c:2333
0x51df01 simplify_gen_binary(rtx_code, machine_mode, rtx_def*, rtx_def*)
 ../../src/gcc/simplify-rtx.c:189
0x42c191 lra_constraints(bool)
 ../../src/gcc/lra-constraints.c:4966
0x41f483 lra(_IO_FILE*)
 ../../src/gcc/lra.c:2443
0x3f0915 do_reload
 ../../src/gcc/ira.c:5527
0x3f0915 execute
 ../../src/gcc/ira.c:5713

Does this help you in any way?

Christian Ehrhardt  (paelzer) wrote :

I'll re-run and dump a few of them just to help you to get to the root cause:

cc -iquote /root/qemu-5.0/b/qemu/block -iquote block -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/qemu/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -fPIE -DPIE -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/include/p11-kit-1 -DSTRUCT_IOVEC_DEFINED -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -I/root/qemu-5.0/tests -I/root/qemu-5.0/tests/qtest -MMD -MP -MT block/qcow2-snapshot.o -MF block/qcow2-snapshot.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o block/qcow2-snapshot.o /root/qemu-5.0/block/qcow2-snapshot.c
0x532d6b crash_signal
 ../../src/gcc/toplev.c:328
0x41d0c7 add_regs_to_insn_regno_info
 ../../src/gcc/lra.c:1515
0x41d1c9 add_regs_to_insn_regno_info
 ../../src/gcc/lra.c:1537
0x41d1c9 add_regs_to_insn_regno_info
 ../../src/gcc/lra.c:1537
0x41e28f lra_update_insn_regno_info(rtx_insn*)
 ../../src/gcc/lra.c:1630
0x41e3d5 lra_update_insn_regno_info(rtx_insn*)
 ../../src/gcc/lra.c:1623
0x41e3d5 lra_push_insn_1
 ../../src/gcc/lra.c:1780
0x436bb5 spill_pseudos
 ../../src/gcc/lra-spills.c:542
0x436bb5 lra_spill()
 ../../src/gcc/lra-spills.c:655
0x41f4ef lra(_IO_FILE*)
 ../../src/gcc/lra.c:2560
0x3f0915 do_reload
 ../../src/gcc/ira.c:5527
0x3f0915 execute
 ../../src/gcc/ira.c:5713

Christian Ehrhardt  (paelzer) wrote :

cc -iquote /root/qemu-5.0/b/qemu/accel/stubs -iquote accel/stubs -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/qemu/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -fPIE -DPIE -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/include/p11-kit-1 -DSTRUCT_IOVEC_DEFINED -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/xtensa -DNEED_CPU_H -iquote /root/qemu-5.0/include -MMD -MP -MT accel/stubs/hax-stub.o -MF accel/stubs/hax-stub.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o accel/stubs/hax-stub.o /root/qemu-5.0/accel/stubs/hax-stub.c
0x532d6b crash_signal
 ../../src/gcc/toplev.c:328
0x71769f thumb2_legitimate_address_p
 ../../src/gcc/config/arm/arm.c:8500
0x717c15 arm_legitimate_address_p(machine_mode, rtx_def*, bool)
 ../../src/gcc/config/arm/arm.c:8917
0x717c15 arm_legitimate_address_p(machine_mode, rtx_def*, bool)
 ../../src/gcc/config/arm/arm.c:8912
0x427eef valid_address_p
 ../../src/gcc/lra-constraints.c:331
0x427eef simplify_operand_subreg
 ../../src/gcc/lra-constraints.c:1514
0x4287ed curr_insn_transform
 ../../src/gcc/lra-constraints.c:3946
0x42c133 lra_constraints(bool)
 ../../src/gcc/lra-constraints.c:5031
0x41f483 lra(_IO_FILE*)
 ../../src/gcc/lra.c:2443
0x3f0915 do_reload
 ../../src/gcc/ira.c:5527
0x3f0915 execute
 ../../src/gcc/ira.c:5713
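
The three backtraces above all run through lra() called from do_reload, but they fault in different leaf functions. To compare a larger number of traces, the first frame below crash_signal can be extracted mechanically. A minimal sketch, assuming the format shown above (frame lines start with `0x`, the source location sits on the following indented line); `crash_frame` is a hypothetical helper name.

```shell
# Print the function name of the first frame below crash_signal
# from a GCC ICE backtrace read on stdin.
# crash_frame is an illustrative helper, not part of GCC.
crash_frame() {
    awk '/^0x/ {
        if (seen) {
            name = $2
            sub(/\(.*/, "", name)   # drop the argument list, keep the bare name
            print name
            exit
        }
        if ($2 == "crash_signal") seen = 1
    }'
}
```

Fed the first backtrace it prints `avoid_constant_pool_reference`; collecting this value per crash gives a quick view of how scattered the fault sites are.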

Christian Ehrhardt  (paelzer) wrote :

gcc-snapshot still has various issues - but not the crash

/root/qemu-5.0/linux-user/m68k/signal.c:44:1: error: 'TYPE_CANONICAL' is not compatible
   44 | };
      | ^
...
/root/qemu-5.0/linux-user/m68k/signal.c:44:1: internal compiler error: 'verify_type' failed

Can't continue with gcc-snapshot due to those (even with the newer version).

Christian Ehrhardt  (paelzer) wrote :

Defaults:
# gcc -Q --help=target | grep -e '-marm' -e '-mthumb'
  -marm [disabled]
  -mthumb [enabled]
  -mthumb-interwork [enabled]

Doko suggested to change that by using -marm.
This has been running for a while, but needs some more time to trigger ...
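
To note which instruction set the compiler defaults to before each test run, the `gcc -Q --help=target` output shown above can be reduced to a single word. A sketch, assuming the two-column output format shown above; `default_isa` is a hypothetical helper name.

```shell
# Reduce `gcc -Q --help=target` output (on stdin) to "arm" or "thumb".
# Matches the option name exactly, so -mthumb-interwork is ignored.
# default_isa is an illustrative helper name.
default_isa() {
    awk '$1 == "-marm"   && $2 == "[enabled]" { print "arm" }
         $1 == "-mthumb" && $2 == "[enabled]" { print "thumb" }'
}
```

It would be used as `gcc -Q --help=target | default_isa`, or with `-marm` added to the gcc invocation to confirm that the override takes effect.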

Christian Ehrhardt  (paelzer) wrote :

@Doko - I can confirm that with -marm the issue is gone.
I have had 6 full runs yesterday and overnight.

We can conclude, -mthumb is a requirement to trigger the issue.

Christian Ehrhardt  (paelzer) wrote :

I spoke too soon: after ~7.5 runs I got the following with -marm:

cc -iquote /root/qemu-5.0/b/user-static/target/arm -iquote target/arm -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/user-static/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -marm -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -DLEGACY_RDMA_REG_MR -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/arm -DNEED_CPU_H -iquote /root/qemu-5.0/include -I/root/qemu-5.0/linux-user/aarch64 -I/root/qemu-5.0/linux-user/host/arm -I/root/qemu-5.0/linux-user -Ilinux-user/aarch64 -MMD -MP -MT target/arm/helper.o -MF target/arm/helper.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o target/arm/helper.o /root/qemu-5.0/target/arm/helper.c
during RTL pass: reload
/root/qemu-5.0/linux-user/syscall.c: In function ‘do_syscall’:
/root/qemu-5.0/linux-user/syscall.c:12519:1: internal compiler error: Segmentation fault
12519 | }
      | ^
cc -iquote /root/qemu-5.0/b/user-static/target/arm -iquote target/arm -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/user-static/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -marm -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -DLEGACY_RDMA_REG_MR -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/arm -DNEED_CPU_H -iquote /root/qemu-5.0/include -I/root/qemu-5.0/linux-user/aarch64 -I/root/qemu-5.0/linux-user/host/arm -I/root/qemu-5.0/linux-user -Ilinux-user/aarch64 -MMD -MP -MT target/arm/translate-sve.o -MF target/arm/translate-sve.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_S...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI now Testing 10.2.0-14ubuntu0.2 from https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/test/+sourcepub/11647665/+listing-archive-extra

I've stopped setting -marm to trigger the issue "faster", please let me know if you want me to continue to use -marm for those tests.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The extra checks that are enabled trigger the same issues I was seeing with gcc-snapshot (maybe they have it enabled as well?).

/root/qemu-5.0/linux-user/syscall.c: In function ‘do_setsockopt’:
/root/qemu-5.0/linux-user/syscall.c:1935:17: note: non-delegitimized UNSPEC UNSPEC_PIC_SYM (1) found in variable location
 1935 | static abi_long do_setsockopt(int sockfd, int level, int optname,
      | ^~~~~~~~~~~~~
...
/root/qemu-5.0/linux-user/syscall.c: In function ‘do_syscall1.constprop’:
/root/qemu-5.0/linux-user/syscall.c:7674:17: note: non-delegitimized UNSPEC UNSPEC_PIC_SYM (1) found in variable location
 7674 | static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
      | ^~~~~~~~~~~
/root/qemu-5.0/linux-user/syscall.c:7674:17: note: non-delegitimized UNSPEC UNSPEC_PIC_SYM (1) found in variable location
/root/qemu-5.0/linux-user/syscall.c:7674:17: note: non-delegitimized UNSPEC UNSPEC_PIC_SYM (1) found in variable location
...

I see many of those, but all are only "note:" level and when searching for the actual issue I now find this more verbose output (next comment for readability):

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Does the following help at all? Do you want the source and the preprocessed source for it?

cc -iquote /root/qemu-5.0/b/qemu/linux-user/m68k -iquote linux-user/m68k -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/qemu/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -fPIE -DPIE -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/include/p11-kit-1 -DSTRUCT_IOVEC_DEFINED -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/m68k -DNEED_CPU_H -iquote /root/qemu-5.0/include -I/root/qemu-5.0/linux-user/m68k -I/root/qemu-5.0/linux-user/host/arm -I/root/qemu-5.0/linux-user -Ilinux-user/m68k -MMD -MP -MT linux-user/m68k/signal.o -MF linux-user/m68k/signal.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o linux-user/m68k/signal.o /root/qemu-5.0/linux-user/m68k/signal.c
/root/qemu-5.0/linux-user/m68k/signal.c:44:1: error: ‘TYPE_CANONICAL’ is not compatible
   44 | };
      | ^
 <array_type 0xf64ca660
    type <integer_type 0xf7af2420 unsigned int asm_written public unsigned SI
        size <integer_cst 0xf729fe58 constant 32>
        unit-size <integer_cst 0xf729fe70 constant 4>
        align:32 warn_if_not_align:0 symtab:-146335680 alias-set -1 canonical-type 0xf7af2420 precision:32 min <integer_cst 0xf72b00f0 0> max <integer_cst 0xf72b00d8 4294967295>
        pointer_to_this <pointer_type 0xf72b5ba0>>
    SI size <integer_cst 0xf729fe58 32> unit-size <integer_cst 0xf729fe70 4>
    align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf64ca660
    domain <integer_type 0xf70dbd20
        type <integer_type 0xf7af2060 sizetype public unsigned SI size <integer_cst 0xf729fe58 32> unit-size <integer_cst 0xf729fe70 4>
            align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf7af2060 precision:32 min <integer_cst 0xf729fe88 0> max <integer_cst 0xf729f000 4294967295>>
        SI size <integer_cst 0xf729fe58 32> unit-size <integer_cst 0xf729fe70 4>
        align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf70dbd20 precision:32 min <integer_cst 0xf729fe88 0> max <integer_cst 0xf729fe88 0>>>
/root/qemu-5.0/linux-user/m68k/signal.c:44:1: error: ‘TYPE_MODE’ of ‘TYPE_CANONICAL’ is not compatible
 <array_type 0xf64ca660
    type <integer_type 0xf7af2420 unsigned int ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The last one now is reproducible (not sure if that is what the segfault was), but still useful.

$ cd /root/qemu-5.0/b/qemu/m68k-linux-user
$ cc -iquote /root/qemu-5.0/b/qemu/linux-user/m68k -iquote linux-user/m68k -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/qemu/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -fPIE -DPIE -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/include/p11-kit-1 -DSTRUCT_IOVEC_DEFINED -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/m68k -DNEED_CPU_H -iquote /root/qemu-5.0/include -I/root/qemu-5.0/linux-user/m68k -I/root/qemu-5.0/linux-user/host/arm -I/root/qemu-5.0/linux-user -Ilinux-user/m68k -MMD -MP -MT linux-user/m68k/signal.o -MF linux-user/m68k/signal.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o linux-user/m68k/signal.o /root/qemu-5.0/linux-user/m68k/signal.c
/root/qemu-5.0/linux-user/m68k/signal.c:44:1: error: ‘TYPE_CANONICAL’ is not compatible
   44 | };
      | ^
 <array_type 0xf6ba1a80
    type <integer_type 0xf7edb420 unsigned int asm_written public unsigned SI
        size <integer_cst 0xf7688e58 constant 32>
        unit-size <integer_cst 0xf7688e70 constant 4>
        align:32 warn_if_not_align:0 symtab:-142400624 alias-set -1 canonical-type 0xf7edb420 precision:32 min <integer_cst 0xf76990f0 0> max <integer_cst 0xf76990d8 4294967295>
        pointer_to_this <pointer_type 0xf769eba0>>
    SI size <integer_cst 0xf7688e58 32> unit-size <integer_cst 0xf7688e70 4>
    align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf6ba1a80
    domain <integer_type 0xf73a0660
        type <integer_type 0xf7edb060 sizetype public unsigned SI size <integer_cst 0xf7688e58 32> unit-size <integer_cst 0xf7688e70 4>
            align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf7edb060 precision:32 min <integer_cst 0xf7688e88 0> max <integer_cst 0xf7688000 4294967295>>
        SI size <integer_cst 0xf7688e58 32> unit-size <integer_cst 0xf7688e70 4>
        align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf73a0660 precision:32 min <integer_cst 0xf7688e88 0> max <integer_cst 0xf7688e88 0>>>
/root/qemu-5.0/linux-user/m68k/signal.c:44:1: error: ‘TYPE_MODE’ of ‘TYPE_CANONICAL’ is not compatible
 <array_type 0...

Revision history for this message
In , Matthias Klose (doko) wrote :

seen on the gcc-10 branch and trunk 20201003 on arm-linux-gnueabihf. Omitting -g works around the issue.

$ cat signal.i
typedef int a __attribute__((aligned(2)));
a b[1];

$ gcc -c -g -O0 signal.i
signal.i:2:1: error: 'TYPE_CANONICAL' is not compatible
    2 | a b[1];
      | ^
 <array_type 0xf751d7e0
    type <integer_type 0xf7a4f3c0 int public SI
        size <integer_cst 0xf7426e58 constant 32>
        unit-size <integer_cst 0xf7426e70 constant 4>
        align:32 warn_if_not_align:0 symtab:-144899760 alias-set -1 canonical-type 0xf7a4f3c0 precision:32 min <integer_cst 0xf74370a8 -2147483648> max <integer_cst 0xf74370c0 2147483647>
        pointer_to_this <pointer_type 0xf7a4ff00>>
    SI size <integer_cst 0xf7426e58 32> unit-size <integer_cst 0xf7426e70 4>
    align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf751d7e0
    domain <integer_type 0xf751d6c0
        type <integer_type 0xf7a4f060 sizetype public unsigned SI size <integer_cst 0xf7426e58 32> unit-size <integer_cst 0xf7426e70 4>
            align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf7a4f060 precision:32 min <integer_cst 0xf7426e88 0> max <integer_cst 0xf7426000 4294967295>>
        SI size <integer_cst 0xf7426e58 32> unit-size <integer_cst 0xf7426e70 4>
        align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf751d6c0 precision:32 min <integer_cst 0xf7426e88 0> max <integer_cst 0xf7426e88 0>>>
signal.i:2:1: error: 'TYPE_MODE' of 'TYPE_CANONICAL' is not compatible
 <array_type 0xf751d7e0
    type <integer_type 0xf7a4f3c0 int public SI
        size <integer_cst 0xf7426e58 constant 32>
        unit-size <integer_cst 0xf7426e70 constant 4>
        align:32 warn_if_not_align:0 symtab:-144899760 alias-set -1 canonical-type 0xf7a4f3c0 precision:32 min <integer_cst 0xf74370a8 -2147483648> max <integer_cst 0xf74370c0 2147483647>
        pointer_to_this <pointer_type 0xf7a4ff00>>
    SI size <integer_cst 0xf7426e58 32> unit-size <integer_cst 0xf7426e70 4>
    align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf751d7e0
    domain <integer_type 0xf751d6c0
        type <integer_type 0xf7a4f060 sizetype public unsigned SI size <integer_cst 0xf7426e58 32> unit-size <integer_cst 0xf7426e70 4>
            align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf7a4f060 precision:32 min <integer_cst 0xf7426e88 0> max <integer_cst 0xf7426000 4294967295>>
        SI size <integer_cst 0xf7426e58 32> unit-size <integer_cst 0xf7426e70 4>
        align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf751d6c0 precision:32 min <integer_cst 0xf7426e88 0> max <integer_cst 0xf7426e88 0>>>
 <array_type 0xf751d600
    type <integer_type 0xf751d660 a SI
        size <integer_cst 0xf7426e58 constant 32>
        unit-size <integer_cst 0xf7426e70 constant 4>
        user align:16 warn_if_not_align:0 symtab:-144899808 alias-set -1 canonical-type 0xf7a4f3c0 precision:32 min <integer_cst 0xf74370a8 -2147483648> max <integer_cst 0xf74370c0 2147483647>>
    no-force-blk BLK size <integer_cst 0xf7426e58 32> unit-size <integer_cst 0xf7426e70 4>
    user align:16 warn_if_not_align:0 symtab:0 alias-set -1 canonical-typ...

Changed in groovy:
importance: Unknown → Medium
status: Unknown → New
Revision history for this message
In , Rguenth (rguenth) wrote :

works on x86_64-linux

Revision history for this message
In , Ktkachov (ktkachov) wrote :

confirmed on trunk with the extra checking enabled

Revision history for this message
In , Fabio Pedretti (pedretti-fabio) wrote :

*** Bug 97368 has been marked as a duplicate of this bug. ***

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in gcc-10 (Ubuntu):
status: New → Confirmed
Revision history for this message
In , Matthias Klose (doko) wrote :

This started as a regression hunt between GCC 9 and GCC 10. However, with a compiler configured with the extra and rtl checking, the test case already produces this ICE with 2018-01-01 and 2019-01-01 builds.

Revision history for this message
In , Fabio Pedretti (pedretti-fabio) wrote :

This issue is also reproducible (but more rarely) when using -g0; see the full build log here:
https://launchpadlibrarian.net/501972023/buildlog_ubuntu-groovy-armhf.mesa_20.3~git2010140730.775866~oibaf~g_BUILDING.txt.gz

Revision history for this message
dann frazier (dannf) wrote :

I was talking to Matthias and he mentioned that this seems to be correlated with the LP builder upgrade to bionic:
  https://lists.ubuntu.com/archives/ubuntu-devel/2020-September/041158.html

I'm running some tests to see if there might be a lower level issue:
  https://docs.google.com/spreadsheets/d/1zWsNIwMAPhyTSmd2aibXN_u9jDaGs8nOVXhTEraTclQ/edit

Revision history for this message
In , Matthias Klose (doko) wrote :

this is triggered by:

2015-05-19 Jan Hubicka <email address hidden>

       * tree.c (verify_type_variant): Fix #undef.
       (gimple_canonical_types_compatible_p): Move here from lto.c
       (verify_type): Verify TYPE_CANONICAL compatibility.
       * tree.h (gimple_canonical_types_compatible_p): Declare.

Revision history for this message
In , Matthias Klose (doko) wrote :

commit 872d5034baa1007606d405e37937908602fbbe51

Revision history for this message
dann frazier (dannf) wrote :

I've been able to reproduce reliably on X-Gene gear when running in a KVM instance. I have not been able to reproduce outside of KVM, nor on an alternate SoC (Hi1616). I *can* reproduce on a xenial kvm guest running on a xenial X-Gene host - which suggests that correlation with the LP builder upgrade is likely just coincidence. I also tried an older xenial guest kernel just in case there was a kernel patch that was backported to all releases that may have broken things - but I was also able to reproduce there.

If I were to draw a conclusion at this stage, it would be that there may very well be a low level issue causing this but, if so, it is unlikely to be a new one.

Changed in groovy:
status: New → Confirmed
Revision history for this message
In , Mkuvyrkov (mkuvyrkov) wrote :

Hi Richard,

Interested in checking out this bug? The original testcase is from QEMU source: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=972789 .

Revision history for this message
In , Rth-d (rth-d) wrote :

As a data point, this problem can be seen with any
strict-alignment target -- e.g. sparc.

Revision history for this message
In , Rth-d (rth-d) wrote :

Created attachment 49473
rfc patch

The following fixes the ICE.
It seems like a hack, done at the wrong level.

Should we have in fact set TYPE_STRUCTURAL_EQUALITY_P all the way
back on the unaligned 'a' type, before we even try to create an
array of 'a'? If so, that would have properly triggered the test
here in build_array_type_1 that would have bypassed the problem.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I got a ping by Doko (thanks) to try
https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/test/+build/20227507

ii cpp-10 10.2.0-16ubuntu1.1 armhf GNU C preprocessor
ii g++-10 10.2.0-16ubuntu1.1 armhf GNU C++ compiler
ii gcc-10 10.2.0-16ubuntu1.1 armhf GNU C compiler
ii gcc-10-base:armhf 10.2.0-16ubuntu1.1 armhf GCC, the GNU Compiler Collection (base package)
ii libasan6:armhf 10.2.0-16ubuntu1.1 armhf AddressSanitizer -- a fast memory error detector
ii libatomic1:armhf 10.2.0-16ubuntu1.1 armhf support library providing __atomic built-in functions
ii libcc1-0:armhf 10.2.0-16ubuntu1.1 armhf GCC cc1 plugin for GDB
ii libgcc-10-dev:armhf 10.2.0-16ubuntu1.1 armhf GCC support library (development files)
ii libgcc-s1:armhf 10.2.0-16ubuntu1.1 armhf GCC support library
ii libgomp1:armhf 10.2.0-16ubuntu1.1 armhf GCC OpenMP (GOMP) support library
ii libstdc++-10-dev:armhf 10.2.0-16ubuntu1.1 armhf GNU Standard C++ Library v3 (development files)
ii libstdc++-10-pic:armhf 10.2.0-16ubuntu1.1 armhf GNU Standard C++ Library v3 (shared library subset kit)
ii libstdc++6:armhf 10.2.0-16ubuntu1.1 armhf GNU Standard C++ Library v3
ii libubsan1:armhf 10.2.0-16ubuntu1.1 armhf UBSan -- undefined behaviour sanitizer (runtime)

I started my loop with that build and will report back later if that triggered the issue again (or another one).
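
The loop itself is not shown in this thread. A minimal sketch of such a rebuild loop, assuming a placeholder build command and assuming that an ICE can be recognised by the string "internal compiler error" in the build output (the function name and parameters are made up for illustration), might look like this:

```python
import subprocess
import sys

# Marker text as it appears in the GCC output quoted in this bug.
ICE_MARKER = "internal compiler error"


def run_until_ice(cmd, max_runs):
    """Run `cmd` up to `max_runs` times.

    Returns (runs_done, output): `output` is the log of the first run that
    contained the ICE marker, or None if no run hit it.
    """
    for run in range(1, max_runs + 1):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # GCC reports the ICE on stderr
            text=True,
        )
        if ICE_MARKER in proc.stdout:
            return run, proc.stdout
    return max_runs, None


# Hypothetical usage; the real build command is not given in the thread:
#   runs, log = run_until_ice(["make", "-C", "/root/qemu-5.0/b/qemu"], 30)
```

Note that this only looks for the ICE marker; a build that fails for an unrelated reason is simply counted as a run without an ICE.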

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI as one would expect this continues to affect Hirsute just as much, I just had a broken qemu-5.1 build on armhf.

/<<BUILDDIR>>/qemu-5.1+dfsg/linux-user/m68k/signal.c:44:1: error: ‘TYPE_MODE’ of ‘TYPE_CANONICAL’ is not compatible

P.S. I'm still unsure if that "TYPE_CANONICAL" issue IS the formerly seen crash or just a new issue on top with either the debug builds and/or newer versions. Did anybody track the crashes down enough to know if those are really "the same"?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

And FYI the test build with the new compiler by Doko still runs, but we know that not failing on the first rounds isn't a 100% win. I'll let it continue some hours and ping back later once it passed e.g. 5 rounds or so which we never achieved before.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Over night it made 5 complete runs and all worked.
@Doko - I think we can call your fix a good one, at least from the POV of my current tests.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I've learned that the gcc in hirsute has the checking enabled atm.
That explains why any qemu 5.1 build I do (merging) or Doko's rebuild on [1] fail atm.

A new GCC build is coming in [2] that has the fix applied but the checking no longer enabled.
Once built I'll re-test that one as well.

[1]: https://launchpad.net/ubuntu/+source/qemu/1:5.0-5ubuntu10
[2]: https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/volatile/+sourcepub/11742356/+listing-archive-extra

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI - Started a build run with 10.2.0-16ubuntu1.2

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Failed on the second run with:

during RTL pass: reload
/root/qemu-5.0/fpu/softfloat.c: In function ‘soft_f64_muladd’:
/root/qemu-5.0/fpu/softfloat.c:1535:1: internal compiler error: Segmentation fault
 1535 | }
      | ^
...
0x532aeb crash_signal
 ../../src/gcc/toplev.c:328
0x41ccd7 add_regs_to_insn_regno_info
 ../../src/gcc/lra.c:1515
0x41cdd9 add_regs_to_insn_regno_info
 ../../src/gcc/lra.c:1537
0x41cdd9 add_regs_to_insn_regno_info
 ../../src/gcc/lra.c:1537
0x41de9f lra_update_insn_regno_info(rtx_insn*)
 ../../src/gcc/lra.c:1630
0x42be95 lra_constraints(bool)
 ../../src/gcc/lra-constraints.c:4975
0x41f093 lra(_IO_FILE*)
 ../../src/gcc/lra.c:2443
0x3f0405 do_reload
 ../../src/gcc/ira.c:5527
0x3f0405 execute
 ../../src/gcc/ira.c:5713
Please submit a full bug report,
...
The bug is not reproducible, so it is likely a hardware or OS problem.

So we surely fixed the "TYPE_CANONICAL" issue in the checker builds.
But is this one that I hit now the same original issue we had before or a different one?
Can you derive that from the traceback?
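
To compare backtraces like the one above without being distracted by addresses and line numbers (which shift between builds), one option is to reduce each trace to its list of function names. The following is only a sketch under the assumption that frames look like `0x<addr> <function>` followed by a source-location line, as in the output quoted here; the helper names are made up for illustration, and two traces sharing frames does not by itself prove a common root cause.

```python
import re

# A frame line starts with a hex address followed by the function name.
FRAME_RE = re.compile(r"^0x[0-9a-f]+\s+(.+)$")


def backtrace_signature(text):
    """Return the function names of a GCC ICE backtrace, outermost last.

    Argument lists are dropped and directly repeated frames (recursion or
    inlined duplicates) are collapsed into one entry.
    """
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if not match:
            continue  # source-location lines and other output
        name = match.group(1).split("(")[0]
        if not frames or frames[-1] != name:
            frames.append(name)
    return frames


def crash_site(frames):
    """Return the first frame below crash_signal, or None if there is none."""
    for name in frames:
        if name != "crash_signal":
            return name
    return None


def shared_frames(a, b):
    """Return the frames of `a` that also occur in `b`, in the order of `a`."""
    seen = set(b)
    return [name for name in a if name in seen]
```

Running `crash_site(backtrace_signature(log))` on two logs gives a quick first check of whether they crashed in the same function, and `shared_frames` shows how much of the call path they have in common.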

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

For the sake of a potential upstream change I was trying qemu from git up to current master, but it still fails with the same error: ‘TYPE_CANONICAL’ is not compatible.
Due to that - as long as the checking is enabled - qemu is unbuildable in Hirsute.

At the same time Doko and I began with tests for a potential bisect of gcc.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Bisect step #1 - Expected to ICE and indeed does so.

gcc-20200507.tar.xz

/root/qemu-5.0/linux-user/syscall.c: In function ‘do_syscall1.constprop’:
/root/qemu-5.0/linux-user/syscall.c:12479:1: internal compiler error: Segmentation fault
12479 | }
      | ^

0x5518cb crash_signal
 ../../gcc/gcc/toplev.c:328
0x51299f extract_plus_operands
 ../../gcc/gcc/rtlanal.c:6314
0x51299f extract_plus_operands
 ../../gcc/gcc/rtlanal.c:6317
0x5185d1 decompose_normal_address
 ../../gcc/gcc/rtlanal.c:6366
0x5185d1 decompose_address(address_info*, rtx_def**, machine_mode, unsigned char, rtx_code)
 ../../gcc/gcc/rtlanal.c:6467
0x51892f decompose_mem_address(address_info*, rtx_def*)
 ../../gcc/gcc/rtlanal.c:6486
0x44822f process_address_1
 ../../gcc/gcc/lra-constraints.c:3367
0x449803 process_address
 ../../gcc/gcc/lra-constraints.c:3641
0x449803 curr_insn_transform
 ../../gcc/gcc/lra-constraints.c:3956
0x44cfd5 lra_constraints(bool)
 ../../gcc/gcc/lra-constraints.c:5029
0x440653 lra(_IO_FILE*)
 ../../gcc/gcc/lra.c:2440
0x411f05 do_reload
 ../../gcc/gcc/ira.c:5523
0x411f05 execute
 ../../gcc/gcc/ira.c:5709

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI currently on 20190425.
First build passed, but we need a few more to be sure.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

20190425 can be considered good; it completed 4.5 runs before I scheduled the next run.
Next is r10-4054 which has -v of:
gcc version 10.0.0 20191022 (experimental) (GCC)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-4054 failed with

during RTL pass: reload
/root/qemu-5.0/fpu/softfloat.c: In function ‘sf_canonicalize’:
/root/qemu-5.0/fpu/softfloat.c:670:1: internal compiler error: Segmentation fault
  670 | }
      | ^

0x4b7443 crash_signal
 ../../gcc/gcc/toplev.c:326
0x66b7c7 thumb2_legitimate_address_p
 ../../gcc/gcc/config/arm/arm.c:8242
0x66bc31 arm_legitimate_address_p(machine_mode, rtx_def*, bool)
 ../../gcc/gcc/config/arm/arm.c:8657
0x458667 memory_address_addr_space_p(machine_mode, rtx_def*, unsigned char)
 ../../gcc/gcc/recog.c:1340
0x85b013 nonimmediate_soft_df_operand_1
 ../../gcc/gcc/config/arm/predicates.md:514
0x85b013 nonimmediate_soft_df_operand(rtx_def*, machine_mode)
 ../../gcc/gcc/config/arm/predicates.md:530
0x85b013 nonimmediate_soft_df_operand(rtx_def*, machine_mode)
 ../../gcc/gcc/config/arm/predicates.md:518

I'm unsure how to deal with this. It is an ICE, and it happened during the RTL reload pass as before. But the signature seems to be a different one.

It could be "bad" with the actual issue we look for hidden behind this.
It could be "good" as well, with the actual issue we look for fixed but hidden behind this.
Or it could be "bad" as in, this is the same issue but with a different signature.

For the time being I'll handle it as bad; with some luck we get our old trace back further down the bisect. But @Doko please have a look at the trace above (maybe it is a known one) and advise.

Starting on r10-2027

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-2027 seems to be good passing 4 runs without a fail.

Continuing with r10-3040 next.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-3040 got 4.5 good passes before I aborted it - it seems to be good as well.

That means next is r10-3400

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-3400 is good

I need to switch to a different overview to make sure I can track this :-)

20190425 good
r10-1014
r10-2027 good
r10-2533
r10-3040 good
r10-3220
r10-3400 good
r10-3450
r10-3475
r10-3478
r10-3593
r10-3622
r10-3657 next
r10-3727
r10-4054 bad
r10-6080
20200507 bad
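
An overview like the one above can also be kept as data, so the next bisect step is computed rather than picked by eye. This is a sketch only: the function name is made up, it assumes the list is ordered oldest to newest, and it simply takes the middle untested entry between the newest "good" and the oldest "bad". The picks in this thread were made by hand and need not match what it returns.

```python
def next_candidate(revisions, status):
    """Pick the middle untested revision between newest good and oldest bad.

    `revisions` is ordered oldest to newest; `status` maps a revision to
    "good" or "bad" (untested revisions are absent). Returns None when no
    untested revision is left in that range.
    """
    first_bad = min(i for i, r in enumerate(revisions) if status.get(r) == "bad")
    last_good = max(
        i for i, r in enumerate(revisions[:first_bad]) if status.get(r) == "good"
    )
    untested = [r for r in revisions[last_good + 1:first_bad] if r not in status]
    return untested[len(untested) // 2] if untested else None


# State as listed in the overview above.
REVISIONS = [
    "20190425", "r10-1014", "r10-2027", "r10-2533", "r10-3040", "r10-3220",
    "r10-3400", "r10-3450", "r10-3475", "r10-3478", "r10-3593", "r10-3622",
    "r10-3657", "r10-3727", "r10-4054", "r10-6080", "20200507",
]
STATUS = {
    "20190425": "good", "r10-2027": "good", "r10-3040": "good",
    "r10-3400": "good", "r10-4054": "bad", "20200507": "bad",
}

print(next_candidate(REVISIONS, STATUS))
```

With a flaky trigger, a "good" entry only means "did not fail in the runs so far", so the status map may need to be revised later.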

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-3657 had 5 good runs

Status:
20190425 good
r10-1014
r10-2027 good
r10-2533
r10-3040 good
r10-3220
r10-3400 good
r10-3450
r10-3475
r10-3478
r10-3593
r10-3622
r10-3657 good
r10-3727 next
r10-4054 bad
r10-6080
20200507 bad

I might want to do a re-run of r10-4054 if r10-3727 is also good. Just to ensure we are not doing 6 more steps on something that won't fail.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

1.5 builds good on r10-3727, but that is not enough to make a decision. Right now there is some machine downtime due to a datacenter move. Back on Monday I guess.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI: Systems are back up, restarted tests on r10-3727

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-3727 had another 2.5 good runs, overall it LGTM now.
I'll re-run r10-4054 just to be sure not to hunt a ghost.

20190425 good
r10-1014
r10-2027 good
r10-2533
r10-3040 good
r10-3220
r10-3400 good
r10-3450
r10-3475
r10-3478
r10-3593
r10-3622
r10-3657 good
r10-3727 good
r10-4054 bad next
r10-6080
20200507 bad

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

On this re-check r10-4054 had 7 complete runs without a fail.
So, as I already feared in comment 71, it might have been another, much rarer ICE hidden in there as well. Or OTOH we are cursed by some very bad statistical chances :-/.

I'll check r10-6080 next to see if it
a) reproduces an ICE faster
b) will show the same signature we saw more often before

20190425 good
r10-1014
r10-2027 good
r10-2533
r10-3040 good
r10-3220
r10-3400 good
r10-3450
r10-3475
r10-3478
r10-3593
r10-3622
r10-3657 good
r10-3727 good
r10-4054 other kind of bad?
r10-6080 next
20200507 bad

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-6080 now had 10 good runs.

I'm going back to test 20200507 next - we had bad states with that version so often this MUST trigger IMHO.

Reminder: this runs in armhf LXD containers on arm64 VMs (like our builds do).
I'm slowly getting the feeling it could be an issue with the underlying virtualization or bare metal.
We had a datacenter move, so the cloud runs on the same bare metal overall, but my instance could run on something else today than last week. If 20200507 no longer triggers, we have to investigate where the code is running.

20190425 good
r10-1014
r10-2027 good
r10-2533
r10-3040 good
r10-3220
r10-3400 good
r10-3450
r10-3475
r10-3478
r10-3593
r10-3622
r10-3657 good
r10-3727 good
r10-4054 other kind of bad?
r10-6080 good
20200507 bad ?next?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI - inquiry for the underlying HW/SW is in RT 128805 - I set Doko and Rick to CC on that.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Ok, 20200507 almost immediately triggered the ICE

/root/qemu-5.0/linux-user/syscall.c: In function ‘do_syscall1’:
/root/qemu-5.0/linux-user/syscall.c:12479:1: internal compiler error: Segmentation fault
12479 | }
      | ^

0x5518cb crash_signal
 ../../gcc/gcc/toplev.c:328
0x542673 avoid_constant_pool_reference(rtx_def*)
 ../../gcc/gcc/simplify-rtx.c:237
0x515cad commutative_operand_precedence(rtx_def*)
 ../../gcc/gcc/rtlanal.c:3482
0x515d6b swap_commutative_operands_p(rtx_def*, rtx_def*)
 ../../gcc/gcc/rtlanal.c:3543
0x53cacb simplify_binary_operation(rtx_code, machine_mode, rtx_def*, rtx_def*)
 ../../gcc/gcc/simplify-rtx.c:2333
0x53cb19 simplify_gen_binary(rtx_code, machine_mode, rtx_def*, rtx_def*)
 ../../gcc/gcc/simplify-rtx.c:189
0x44d033 lra_constraints(bool)
 ../../gcc/gcc/lra-constraints.c:4964
0x440653 lra(_IO_FILE*)
 ../../gcc/gcc/lra.c:2440
0x411f05 do_reload
 ../../gcc/gcc/ira.c:5523
0x411f05 execute
 ../../gcc/gcc/ira.c:5709

This triggered on the first build. While waiting for some builds between r10-6080 and 20200507 I'll rerun this version to get some stats on how early to expect it.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Another crash with 20200507 at first try:

/root/qemu-5.0/fpu/softfloat.c: In function ‘float128_div’:
/root/qemu-5.0/fpu/softfloat.c:7504:1: internal compiler error: Segmentation fault
 7504 | }
      | ^

0x5518cb crash_signal
 ../../gcc/gcc/toplev.c:328
0x43e363 add_regs_to_insn_regno_info
 ../../gcc/gcc/lra.c:1512
0x43e465 add_regs_to_insn_regno_info
 ../../gcc/gcc/lra.c:1534
0x43e465 add_regs_to_insn_regno_info
 ../../gcc/gcc/lra.c:1534
0x43e51b add_regs_to_insn_regno_info
 ../../gcc/gcc/lra.c:1538
0x43f497 lra_update_insn_regno_info(rtx_insn*)
 ../../gcc/gcc/lra.c:1627
0x43f5dd lra_update_insn_regno_info(rtx_insn*)
 ../../gcc/gcc/lra.c:1620
0x43f5dd lra_push_insn_1
 ../../gcc/gcc/lra.c:1777
0x4579fb spill_pseudos
 ../../gcc/gcc/lra-spills.c:542
0x4579fb lra_spill()
 ../../gcc/gcc/lra-spills.c:655
0x4406bf lra(_IO_FILE*)
 ../../gcc/gcc/lra.c:2557
0x411f05 do_reload
 ../../gcc/gcc/ira.c:5523
0x411f05 execute
 ../../gcc/gcc/ira.c:5709

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Doko is so kind as to build r10-7093 for me.

20190425 good
r10-1014
r10-2027 good
r10-2533
r10-3040 good
r10-3220
r10-3400 good
r10-3450
r10-3475
r10-3478
r10-3593
r10-3622
r10-3657 good
r10-3727 good
r10-4054 other kind of bad?
r10-6080 good
r10-7093 next
20200507 bad bad bad

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

3 full runs good with r10-7093 but then I got:

/root/qemu-5.0/disas/nanomips.cpp: In member function ‘std::string NMD::JALRC_HB(uint64)’:
/root/qemu-5.0/disas/nanomips.cpp:7969:1: internal compiler error: Segmentation fault
 7969 | }
      | ^
0x602fa7 crash_signal
 ../../gcc/gcc/toplev.c:328
0x4f1f47 add_regs_to_insn_regno_info
 ../../gcc/gcc/lra.c:1509
0x4f203b add_regs_to_insn_regno_info
 ../../gcc/gcc/lra.c:1531
0x4f3061 lra_update_insn_regno_info(rtx_insn*)
 ../../gcc/gcc/lra.c:1624
0x505ca7 process_insn_for_elimination
 ../../gcc/gcc/lra-eliminations.c:1322
0x505ca7 lra_eliminate(bool, bool)
 ../../gcc/gcc/lra-eliminations.c:1372
0x500877 lra_constraints(bool)
 ../../gcc/gcc/lra-constraints.c:4856
0x4f4237 lra(_IO_FILE*)
 ../../gcc/gcc/lra.c:2437
0x4c5c59 do_reload
 ../../gcc/gcc/ira.c:5523
0x4c5c59 execute
 ../../gcc/gcc/ira.c:5709

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

We again need to ask: is this the one we are hunting for - or might it be another issue in between?
Doko ?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

To be sure, I ran r10-7093 again and so far got 8 good runs in a row :-/
If only we could have a better trigger :-/

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

14 runs and going ...
It was never "so rare" when we were at the gcc that is in hirsute or 20200507.
I'll let it continue to run for now

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Failed on #17
during RTL pass: reload

/root/qemu-5.0/fpu/softfloat.c: In function ‘soft_f64_muladd’:
/root/qemu-5.0/fpu/softfloat.c:1535:1: internal compiler error: Segmentation fault
 1535 | }
      | ^
cc -iquote /root/qemu-5.0/b/qemu/target/mips -iquote target/mips -iquote /root/qemu-5.0/tcg/arm -isystem /root/qemu-5.0/linux-headers -isystem /root/qemu-5.0/b/qemu/linux-headers -iquote . -iquote /root/qemu-5.0 -iquote /root/qemu-5.0/accel/tcg -iquote /root/qemu-5.0/include -iquote /root/qemu-5.0/disas/libvixl -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -pthread -I/usr/include/glib-2.0 -I/usr/lib/arm-linux-gnueabihf/glib-2.0/include -fPIE -DPIE -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -g -O2 -fdebug-prefix-map=/root/qemu-5.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/include/p11-kit-1 -DSTRUCT_IOVEC_DEFINED -I/usr/include/libpng16 -I/root/qemu-5.0/capstone/include -isystem ../linux-headers -iquote .. -iquote /root/qemu-5.0/target/mips -DNEED_CPU_H -iquote /root/qemu-5.0/include -MMD -MP -MT target/mips/helper.o -MF target/mips/helper.d -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g -c -o target/mips/helper.o /root/qemu-5.0/target/mips/helper.c
0x527c2f crash_signal
 ../../gcc/gcc/toplev.c:328
0x4147bf add_regs_to_insn_regno_info
 ../../gcc/gcc/lra.c:1509
0x4148b3 add_regs_to_insn_regno_info
 ../../gcc/gcc/lra.c:1531
0x4148b3 add_regs_to_insn_regno_info
 ../../gcc/gcc/lra.c:1531
0x4158d9 lra_update_insn_regno_info(rtx_insn*)
 ../../gcc/gcc/lra.c:1624
0x415a29 lra_update_insn_regno_info(rtx_insn*)
 ../../gcc/gcc/lra.c:1617
0x415a29 lra_push_insn_1
 ../../gcc/gcc/lra.c:1774
0x42dd53 spill_pseudos
 ../../gcc/gcc/lra-spills.c:523
0x42dd53 lra_spill()
 ../../gcc/gcc/lra-spills.c:636
0x416b1b lra(_IO_FILE*)
 ../../gcc/gcc/lra.c:2554
0x3e84d1 do_reload
 ../../gcc/gcc/ira.c:5523
0x3e84d1 execute
 ../../gcc/gcc/ira.c:5709

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I'm not yet sure what we should learn from that - do we need 30 runs of each step to be somewhat sure? That makes an already slow bisect even slower ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI - another 8 runs without a crash on r10-7093.
My current working theory is that the root cause of the crash might have been added as early as r10-4054 but one or many later changes have increased the chance (think increase the race window or such) for the issue to trigger.
If that assumption is true, then with the current testcase it is nearly impossible to properly bisect the "original root cause". At the same time it is still hard to find the change that increased the race window, since crashing early does not reliably tell us whether we are in the high or low chance area.

We've had many runs with the base versions so that one is really good.
But any other good result we've had so far could - in theory - be challenged and needs ~30 good runs to be somewhat sure (phew, that will take a lot of time).
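
To put a rough number on "~30 good runs", here is a small sketch. It assumes every run fails independently with a fixed probability, which may not hold here (load and hardware could matter), and it borrows the 2 of 19 seen on r10-7093 as that probability. The function names are made up for illustration.

```python
import math

def chance_all_good(p_fail: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass when each one
    fails with probability `p_fail`."""
    return (1.0 - p_fail) ** runs

def runs_needed(p_fail: float, max_miss: float) -> int:
    """Smallest number of clean runs so that a revision with failure rate
    `p_fail` slips through unnoticed with probability at most `max_miss`."""
    return math.ceil(math.log(max_miss) / math.log(1.0 - p_fail))

# Assumption: the per-run failure rate equals the 2 of 19 observed on r10-7093.
p = 2 / 19
for n in (4, 10, 30):
    print(f"{n:2d} clean runs: {chance_all_good(p, n):.1%} chance a bad revision looks good")
print("clean runs for <=5% miss chance:", runs_needed(p, 0.05))
```

Under these assumptions a handful of clean runs says very little, and a lower true failure rate pushes the required count up quickly.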

I'm marking the old runs that are debatable with good?<count-of-good-runs>.

Also we might want to look for just the "new" crash signature.

20190425 good
r10-1014
r10-2027 good?4
r10-2533
r10-3040 good?4
r10-3220
r10-3400 good?4
r10-3450
r10-3475
r10-3478
r10-3593
r10-3622
r10-3657 good?5
r10-3727 good?3
r10-4054 other kind of bad - signature different, and rare?
r10-6080 good?10
r10-7093 bad, but slow to trigger
20200507 bad bad bad

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 extract_plus_operands (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)

Of course it could be that the same root cause surfaces as two different signatures - but it could as well be a multitude of issues. Therefore - for now - "add_regs_to_insn_regno_info (lra)" is what I'll continue to hunt for.
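
For comparing signatures across many build logs, a minimal sketch follows. It assumes the backtrace format shown in the logs above (a `0x…` address followed by the function name, with the source location on the next line) and treats the first frame after `crash_signal` as the signature. The helper name `crash_signature` is hypothetical.

```python
import re
from typing import Optional

# A frame line looks like "0x4147bf add_regs_to_insn_regno_info";
# the source location follows on the next line and is ignored here.
FRAME_RE = re.compile(r"^\s*0x[0-9a-f]+\s+(\S.*)$")

def crash_signature(log_text: str) -> Optional[str]:
    """Return the first backtrace frame that is not the signal handler."""
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            continue
        func = match.group(1).strip()
        if func != "crash_signal":
            return func
    return None

sample = """\
0x527c2f crash_signal
 ../../gcc/gcc/toplev.c:328
0x4147bf add_regs_to_insn_regno_info
 ../../gcc/gcc/lra.c:1509
0x4158d9 lra_update_insn_regno_info(rtx_insn*)
 ../../gcc/gcc/lra.c:1624
"""
print(crash_signature(sample))
```

Frames that lack the `0x` prefix are skipped on purpose, since a looser pattern would also match unrelated log lines such as the `cc ...` command line.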

With some luck (do we have any in this?) the 10 runs on 6080 are sufficient.
Let us try r10-6586 next and plan for 15-30 runs to be sure it is good.
If hitting the issue I'll still re-run it so we can compare multiple signatures.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Since this seems to be turning into a reproducibility-fest I've spawned and prepared two more workers using the same setup as the one we used before. That should allow for more runs per day and increase the rate at which we can process it - given the new insight into its unreliability.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-6586 - passed 27 good runs, no fails

Updated Result Overview:
20190425 good
r10-1014
r10-2027 good?4
r10-2533
r10-3040 good?4
r10-3220
r10-3400 good?4
r10-3450
r10-3475
r10-3478
r10-3593
r10-3622
r10-3657 good?5
r10-3727 good?3
r10-4054 other kind of bad - signature different, and rare?
r10-6080 good?10
r10-6586 good?27
r10-7093 bad, but slow to trigger (2 of 19)
20200507 bad bad bad

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 extract_plus_operands (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)

Next I'll run r10-7093 in this new setup.
@Doko - It would be great to have ~6760 be built for the likely next step.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Add another 1/3 fails to r10-7093

Now I am on the next two
- r10-6760
- r10-6839

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

2/7 runs of r10-6839 failed with
r10-6839 add_regs_to_insn_regno_info (lra)

Next will be r10-6760

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Updated Result Overview:
20190425 good
r10-1014
r10-2027 good?4
r10-2533
r10-3040 good?4
r10-3220
r10-3400 good?4
r10-3450
r10-3475
r10-3478
r10-3593
r10-3622
r10-3657 good?5
r10-3727 good?3
r10-4054 other kind of bad - signature different, and rare?
r10-6080 good?10
r10-6586 good?27
r10-6760 next
r10-6839 bad (2 of 9)
r10-7093 bad, but slow to trigger (2 of 19)
20200507 bad bad bad

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 extract_plus_operands (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

We'll need more runs to be sure, but so far r10-6760 seems good.
In preparation - could I request builds between r10-6760 and r10-6839, please?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Ok, r10-6760 reached 20 good runs and is considered good.
Doko was kind enough to build 6779, 6799 and 6819 for me - of which 6799 will be next.

Note: I've aligned the comments to all have the same style and dropped the untested revisions.

Updated Result Overview:
20190425 good 0 of 13
r10-2027 good 0 of 4
r10-3040 good 0 of 4
r10-3400 good 0 of 4
r10-3657 good 0 of 5
r10-3727 good 0 of 3
r10-4054 other kind of bad 1 of 18 (signature different)
r10-6080 good 0 of 10
r10-6586 good 0 of 27
r10-6760 good 0 of 20
r10-6779 untested
r10-6799 next
r10-6819 untested
r10-6839 bad 2 of 9
r10-7093 bad 2 of 19
20200507 bad 3 of 7

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 extract_plus_operands (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI: r10-6799 had 14 good runs so far, I'll let it run for a bit longer to be sure.
Then - later today - if nothing changes r10-6819 will be next.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Completed 20 good runs on r10-6799, continuing with r10-6819 as planned.

Updated Result Overview:
20190425 good 0 of 13
r10-2027 good 0 of 4
r10-3040 good 0 of 4
r10-3400 good 0 of 4
r10-3657 good 0 of 5
r10-3727 good 0 of 3
r10-4054 other kind of bad 1 of 18 (signature different)
r10-6080 good 0 of 10
r10-6586 good 0 of 27
r10-6760 good 0 of 20
r10-6799 good 0 of 20
r10-6819 next
r10-6839 bad 2 of 9
r10-7093 bad 2 of 19
20200507 bad 3 of 7

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 extract_plus_operands (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-6819 had 22 good runs.
r10-6829 will be the next to try.

Updated Result Overview:
20190425 good 0 of 13
r10-2027 good 0 of 4
r10-3040 good 0 of 4
r10-3400 good 0 of 4
r10-3657 good 0 of 5
r10-3727 good 0 of 3
r10-4054 other kind of bad 1 of 18 (signature different)
r10-6080 good 0 of 10
r10-6586 good 0 of 27
r10-6760 good 0 of 20
r10-6799 good 0 of 20
r10-6819 good 0 of 22
r10-6829 next
r10-6839 bad 2 of 9
r10-7093 bad 2 of 19
20200507 bad 3 of 7

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 add_regs_to_insn_regno_info (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 extract_plus_operands (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-6829 has 2 fails in 35 runs
Signature matches, both are: add_regs_to_insn_regno_info (lra)
r10-6824 = next
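
The recent picks follow plain bisection between the newest "good" and the oldest "bad" revision number. A tiny sketch, with a hypothetical helper name and ignoring that only some revisions are actually available as builds:

```python
from typing import Optional

def next_revision(last_good: int, first_bad: int) -> Optional[int]:
    """Midpoint between the newest good and the oldest bad revision number,
    or None once the two are adjacent and nothing is left to test."""
    if first_bad - last_good <= 1:
        return None
    return (last_good + first_bad) // 2

print(next_revision(6819, 6839))  # 6829
print(next_revision(6819, 6829))  # 6824
```

Keep in mind that with a flaky crash every "good" is provisional, so `last_good` may have to move back later.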

Updated Result Overview:
20190425 good 0 of 13
r10-2027 good 0 of 4
r10-3040 good 0 of 4
r10-3400 good 0 of 4
r10-3657 good 0 of 5
r10-3727 good 0 of 3
r10-4054 other kind of bad 1 of 18 (signature different)
r10-6080 good 0 of 10
r10-6586 good 0 of 27
r10-6760 good 0 of 20
r10-6799 good 0 of 20
r10-6819 good 0 of 22
r10-6824 next
r10-6829 bad 2 of 35
r10-6839 bad 2 of 9
r10-7093 bad 2 of 19
20200507 bad 3 of 7

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6829 add_regs_to_insn_regno_info (lra)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 add_regs_to_insn_regno_info (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 extract_plus_operands (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-6824 bad 1 of 24, signature matches
We have only a few steps to go and need to increase the number of runs to be sure, so I'll let it run for a while longer.
Also - eventually - I'll re-run what we consider to be the last good, quite a few times to be sure.

Most likely I'll later today switch and test r10-6822 next.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Updated Result Overview:
20190425 good 0 of 13
r10-2027 good 0 of 4
r10-3040 good 0 of 4
r10-3400 good 0 of 4
r10-3657 good 0 of 5
r10-3727 good 0 of 3
r10-4054 other kind of bad 1 of 18 (signature different)
r10-6080 good 0 of 10
r10-6586 good 0 of 27
r10-6760 good 0 of 20
r10-6799 good 0 of 20
r10-6819 good 0 of 22
r10-6822 next
r10-6824 bad 1 of 33
r10-6829 bad 2 of 35
r10-6839 bad 2 of 9
r10-7093 bad 2 of 19
20200507 bad 3 of 7

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6824 add_regs_to_insn_regno_info (lra)
r10-6829 add_regs_to_insn_regno_info (lra)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 add_regs_to_insn_regno_info (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 extract_plus_operands (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-6822 so far has 0 of 20, but I'll let it run another ~24h

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-6822 seems good.

Updated Result Overview:
20190425 good 0 of 13
r10-2027 good 0 of 4
r10-3040 good 0 of 4
r10-3400 good 0 of 4
r10-3657 good 0 of 5
r10-3727 good 0 of 3
r10-4054 other kind of bad 1 of 18 (signature different)
r10-6080 good 0 of 10
r10-6586 good 0 of 27
r10-6760 good 0 of 20
r10-6799 good 0 of 20
r10-6819 good 0 of 22
r10-6822 good 0 of 37
r10-6823 next
r10-6824 bad 1 of 33
r10-6829 bad 2 of 35
r10-6839 bad 2 of 9
r10-7093 bad 2 of 19
20200507 bad 3 of 7

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6824 add_regs_to_insn_regno_info (lra)
r10-6829 add_regs_to_insn_regno_info (lra)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 add_regs_to_insn_regno_info (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 extract_plus_operands (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r10-6823 bad 1 of 28

during RTL pass: reload
/root/qemu-5.0/migration/ram.c: In function ‘ram_load_postcopy’:
/root/qemu-5.0/migration/ram.c:3298:1: internal compiler error: Segmentation fault
 3298 | }
      | ^
0x524cf3 crash_signal
        ../../gcc/gcc/toplev.c:328
0x411e07 add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1509
0x411efb add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1531
0x411efb add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1531
0x411efb add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1531
0x412f21 lra_update_insn_regno_info(rtx_insn*)
        ../../gcc/gcc/lra.c:1624
0x413071 lra_update_insn_regno_info(rtx_insn*)
        ../../gcc/gcc/lra.c:1617
0x413071 lra_push_insn_1
        ../../gcc/gcc/lra.c:1774
0x42b373 spill_pseudos
        ../../gcc/gcc/lra-spills.c:523
0x42b373 lra_spill()
        ../../gcc/gcc/lra-spills.c:636
0x414163 lra(_IO_FILE*)
        ../../gcc/gcc/lra.c:2554
0x3e5b9d do_reload
        ../../gcc/gcc/ira.c:5523
0x3e5b9d execute
        ../../gcc/gcc/ira.c:5709
Please submit a full bug report,
with preprocessed source if appropriate

I'll give the hopefully good r10-6822 another few chances to fail, because - as it is obvious by now - it seems we can't rely much on these bisect results.

Afterwards I'll give 10.2.1-1 in Hirsute a try (requested by Doko)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Updated Result Overview:
20190425 good 0 of 13
r10-2027 good 0 of 4
r10-3040 good 0 of 4
r10-3400 good 0 of 4
r10-3657 good 0 of 5
r10-3727 good 0 of 3
r10-4054 other kind of bad 1 of 18 (signature different)
r10-6080 good 0 of 10
r10-6586 good 0 of 27
r10-6760 good 0 of 20
r10-6799 good 0 of 20
r10-6819 good 0 of 22
r10-6822 good 0 of 37 <- giving this more runs now
r10-6823 bad 1 of 28
r10-6824 bad 1 of 33
r10-6829 bad 2 of 35
r10-6839 bad 2 of 9
r10-7093 bad 2 of 19
20200507 bad 3 of 7

Signatures:
r10-4054 arm_legitimate_address_p (nonimmediate)
r10-6823 add_regs_to_insn_regno_info (lra)
r10-6824 add_regs_to_insn_regno_info (lra)
r10-6829 add_regs_to_insn_regno_info (lra)
r10-6839 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
r10-7093 add_regs_to_insn_regno_info (lra)
20200507 add_regs_to_insn_regno_info (lra)
20200507 avoid_constant_pool_reference (lra)
20200507 extract_plus_operands (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 add_regs_to_insn_regno_info (lra)
ubu-10.2 avoid_constant_pool_reference (lra)
ubu-10.2 thumb2_legitimate_address_p (lra)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

As mentioned before - I didn't trust this result.
And with the likelihood of this triggering being so low we all know that results are unreliable.
Due to that r10-6822 is now

r10-6822 - bad 2 of 67

The signature was the same "add_regs_to_insn_regno_info (lra)" as before, in (again) different places: tcg/tcg.c:2180 and fpu/softfloat.c:7133.
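
In hindsight the earlier 0 of 37 never excluded a failure rate this low. A sketch of the bound, assuming independent runs with a fixed per-run failure probability (an assumption, not something measured here): after n clean runs the largest failure rate still compatible with that observation at confidence c is 1 - (1 - c)^(1/n).

```python
def max_fail_rate(clean_runs: int, confidence: float = 0.95) -> float:
    """Largest per-run failure probability that is still consistent, at the
    given confidence, with `clean_runs` passes and no failure."""
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_runs)

for n in (20, 37, 100):
    print(f"0 of {n}: failure rate could still be up to {max_fail_rate(n):.1%}")
print(f"observed on r10-6822: {2 / 67:.1%}")
```

For 37 clean runs the bound is a bit under 8%, well above the 2 of 67 that showed up later.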

What to do from here ...
We could bisect again between 20190425 and r10-6822 and use at least 100 runs each.
But that would be a last resort as I'm at ~1 run/h, which means ~4 days for each step.
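
A back-of-the-envelope sketch of what such a re-bisect would cost. The figure of roughly 6800 candidate revisions is an assumption read off the r10-N numbering, not something stated here, and the helper name is made up.

```python
import math

def bisect_cost(revisions: int, runs_per_step: int, runs_per_hour: float):
    """Return (bisect steps, total days) for a full bisection in which every
    step needs `runs_per_step` runs at `runs_per_hour` throughput."""
    steps = math.ceil(math.log2(revisions))
    hours = steps * runs_per_step / runs_per_hour
    return steps, hours / 24

# Assumption: ~6800 revisions between 20190425 and r10-6822.
steps, days = bisect_cost(revisions=6800, runs_per_step=100, runs_per_hour=1.0)
print(f"{steps} steps, about {days:.0f} days in total")
print(f"one step alone: {100 / 1.0 / 24:.1f} days")
```

More workers raise `runs_per_hour` and shorten this proportionally, but the step count stays the same.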

I have a few "maybe we are lucky" things to try first:
- 10.2.1-1 in hirsute
- trunk gcc-r11-5879.tar.xz
- Doing a run with -O1

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

"just retry the build" is our solution to this issue. It's a bit of a waste of time hunting this all down at this point, unfortunately.

Maybe we can try reproducing this on some publicly available hardware, e.g. graviton2 on aws. But I'm also not sure how much value there is in doing this.

tags: added: rls-gg-notfixing
removed: rls-gg-incoming
Changed in gcc-10 (Ubuntu):
status: Confirmed → Won't Fix
affects: groovy → gcc
Revision history for this message
Christian Ehrhardt  (paelzer) wrote : Re: [Bug 1890435] Re: gcc-10 breaks on armhf (flaky): internal compiler error: Segmentation fault

On Thu, Dec 10, 2020 at 5:31 PM Dimitri John Ledkov
<email address hidden> wrote:
>
> "just retry the build" is our solution to this issue.

It is not - in hirsute the builds of the actual package on LP hit 100%
fail-rate.
Unfortunately not in the repro, but due to the above the workaround
currently is to build with gcc-9 on armhf.
But that is not a long term solution.

Therefore also this IMHO can't be won't fix

Changed in gcc-10 (Ubuntu):
status: Won't Fix → New
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I'll give things a try in current Hirsute (gcc on 10.2.1, qemu on 5.2) building with gcc-10.
If we are back at a level where retries work I'm ok to lower severity.
I'll let you know about these results in a few days.

But since we have had the case of it reaching 100% breakage (and then would be e.g. un-serviceable) I'm unsure if we should - even then - fully close it.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

In the test env (not LP build infra, but canonistack) I've got 30 good runs on 10.2.1 which gives me some hope ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Indeed, gcc-10.2.1 with qemu 5.2 no longer breaks 100% of the time.
Here is a good build log:
https://launchpadlibrarian.net/510811599/buildlog_ubuntu-hirsute-armhf.qemu_1%3A5.2+dfsg-2ubuntu1~ppa2_BUILDING.txt.gz

I'll need a few more builds anyway and will let you know.
As mentioned before that does lower severity, but not close the bug.

Changed in gcc-10 (Ubuntu):
status: New → Confirmed
importance: Critical → Medium
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

r11-5879 - bad 8 of 10

So we know:
a) the bug has not been fixed yet
b) as we've seen with later GCC-10 runs, the chance to trigger it has increased further

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I left r11-5879 running over the weekend and it concluded with 37 of 75 runs failing.
That is ~50%.

I'll look at -O1 next

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Fails with -O1 as well, although I have to admit that different -O levels are deeply integrated in qemu's build system, so it is hard to override all of them. Therefore, while I set -O1 and that affected some builds, it does not imply that all compiler calls used -O1.

I know dannf has made some bare-metal tests and so far none of those have failed.
Unfortunately our builders are VM based, so that isn't very helpful anyway.
Nevertheless I've transported my test container over to a box to build there.

Trying to maas-deploy a few more chip types didn't work out, but maybe it will eventually with some help by the HWE team.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I was unable to trigger the issue on my rpi4 yet, but as you'd imagine it is rather slow.
But (thanks Dannf) I got access to an X-gene - and carrying my known bad setup there (LXD container export FTW) I was able to recreate this on bare-metal as well.

(Host) Kernel: 5.4.0-58-generic
Model: X-Gene - 8 cores
The guest is Hirsute building qemu 5.0 with r11-5879

I got two known bug signatures - once the common one we see most and once a different one (that we've seen before with 20200507).

This happened on the first two runs, once it has run some hours I'll post the rate of success-vs-fails as well.

--- ---

during RTL pass: reload
/root/qemu-5.0/fpu/softfloat.c: In function ‘soft_f64_muladd’:
/root/qemu-5.0/fpu/softfloat.c:1535:1: internal compiler error: Segmentation fault
 1535 | }
      | ^
0x56715f crash_signal
        ../../gcc/gcc/toplev.c:327
0x4599ad add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1445
0x459ab9 add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1537
0x459ab9 add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1537
0x45abc7 lra_update_insn_regno_info(rtx_insn*)
        ../../gcc/gcc/lra.c:1630
0x468985 lra_constraints(bool)
        ../../gcc/gcc/lra-constraints.c:5077
0x45bc15 lra(_IO_FILE*)
        ../../gcc/gcc/lra.c:2329
0x42d463 do_reload
        ../../gcc/gcc/ira.c:5802
0x42d463 execute
        ../../gcc/gcc/ira.c:5988
Please submit a full bug report,

--- ---

during RTL pass: reload
/root/qemu-5.0/linux-user/syscall.c: In function ‘do_syscall1.constprop’:
/root/qemu-5.0/linux-user/syscall.c:12479:1: internal compiler error: Segmentation fault
12479 | }
      | ^
0x56715f crash_signal
        ../../gcc/gcc/toplev.c:327
0x527e35 extract_plus_operands
        ../../gcc/gcc/rtlanal.c:6320
0x52d84b extract_plus_operands
        ../../gcc/gcc/rtlanal.c:6324
0x52d84b decompose_normal_address
        ../../gcc/gcc/rtlanal.c:6373
0x52d84b decompose_address(address_info*, rtx_def**, machine_mode, unsigned char, rtx_code)
        ../../gcc/gcc/rtlanal.c:6474
0x52dbc3 decompose_mem_address(address_info*, rtx_def*)
        ../../gcc/gcc/rtlanal.c:6493
0x463551 process_address_1
        ../../gcc/gcc/lra-constraints.c:3460
0x464c47 process_address
        ../../gcc/gcc/lra-constraints.c:3734
0x464c47 curr_insn_transform
        ../../gcc/gcc/lra-constraints.c:4049
0x468913 lra_constraints(bool)
        ../../gcc/gcc/lra-constraints.c:5138
0x45bc15 lra(_IO_FILE*)
        ../../gcc/gcc/lra.c:2329
0x42d463 do_reload
        ../../gcc/gcc/ira.c:5802
0x42d463 execute
        ../../gcc/gcc/ira.c:5988
Please submit a full bug report,

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The canonistack machines I used to crash it (and likely the LP builders) are X-Gene as well.
So we might have a chance to lock this in on specific HW if there are other chip types I could use.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

So far 2/4 failed of r11-5879 on X-Gene BareMetal.

Doko asked me to try if I could get these to fail with -j1 as well (in the past I was unable to do so, but it is worth a try).

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

On BareMetal now also triggered with -j1 (but there were multiple LXD containers each running -j1 to increase the chance to find it).

/root/qemu-5.0/memory.c: In function ‘memory_region_write_accessor’:
/root/qemu-5.0/memory.c:485:1: internal compiler error: Segmentation fault
  485 | }
      | ^
0x56715f crash_signal
        ../../gcc/gcc/toplev.c:327
0x4599ad add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1445
0x459ab9 add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1537
0x459ab9 add_regs_to_insn_regno_info
        ../../gcc/gcc/lra.c:1537
0x45abc7 lra_update_insn_regno_info(rtx_insn*)
        ../../gcc/gcc/lra.c:1630
0x468985 lra_constraints(bool)
        ../../gcc/gcc/lra-constraints.c:5077
0x45bc15 lra(_IO_FILE*)
        ../../gcc/gcc/lra.c:2329
0x42d463 do_reload
        ../../gcc/gcc/ira.c:5802
0x42d463 execute
        ../../gcc/gcc/ira.c:5988
Please submit a full bug report,
with preprocessed source if appropriate.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Just FYI - as we were afraid of - this now starts to break SRUs and other service actions to qemu in Groovy. https://launchpad.net/ubuntu/+source/qemu/1:5.0-5ubuntu9.7/+build/21361775 just failed.
And without a better solution I'll need to trigger retry with fingers crossed.

Revision history for this message
In , Rguenth (rguenth) wrote :

GCC 10.3 is being released, retargeting bugs to GCC 10.4.

Changed in gcc:
status: Confirmed → In Progress
Revision history for this message
Oibaf (oibaf) wrote :

Is this still an issue? I was able to only reproduce it on groovy, now EoL.

Revision history for this message
In , Jakub-gcc (jakub-gcc) wrote :

GCC 10.4 is being released, retargeting bugs to GCC 10.5.

Revision history for this message
In , Rguenth (rguenth) wrote :

GCC 10 branch is being closed.

Revision history for this message
In , Pinskia (pinskia) wrote :

*** Bug 112791 has been marked as a duplicate of this bug. ***
