Comment 14 for bug 1805256

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote : Re: qemu-img hangs on high core count ARM system

Alright, here is what is happening:

Whenever program is stuck, thread #2 backtrace is this:

(gdb) bt
#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1 0x0000aaaaaabd41b0 in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at ./util/qemu-thread-posix.c:438
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac87ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:442
#3 0x0000aaaaaabee03c in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:261
#4 0x0000aaaaaabd34c8 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:498
#5 0x0000ffffbf26a880 in start_thread (arg=0xfffffffff5bf) at pthread_create.c:486
#6 0x0000ffffbf1c4b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Meaning that code is waiting for a futex inside kernel.

(gdb) print rcu_call_ready_event
$4 = {value = 4294967295, initialized = true}

The QemuEvent "rcu_call_ready_event->value" is set to INT_MAX and I don't know why yet.

rcu_call_ready_event->value is only touched by:

qemu_event_init() -> bool init ? EV_SET : EV_FREE
qemu_event_reset() -> atomic_or(&ev->value, EV_FREE)
qemu_event_set() -> atomic_xchg(&ev->value, EV_SET)
qemu_event_wait() -> atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY)'

And there should be no 0x7fff value for "ev->value".

qemu_event_init() is the one initializing the global:

    static QemuEvent rcu_call_ready_event;

and it is called by "rcu_init_complete()" which is called by "rcu_init()":

    static void __attribute__((__constructor__)) rcu_init(void)

a constructor function.

So, "fixing" this issue by:

    (gdb) print rcu_call_ready_event
    $8 = {value = 4294967295, initialized = true}

    (gdb) watch rcu_call_ready_event
    Hardware watchpoint 1: rcu_call_ready_event

    (gdb) set rcu_call_ready_event.initialized = 1

    (gdb) set rcu_call_ready_event.value = 0

and note that I added a watchpoint to rcu_call_ready_event global:

<HANG>

Thread 1 "qemu-img" received signal SIGINT, Interrupt.
(gdb) thread 2
[Switching to thread 2 (Thread 0xffffbec61d90 (LWP 33625))]

(gdb) bt
#0 0x0000aaaaaabd4110 in qemu_event_reset (ev=ev@entry=0xaaaaaac87ce8 <rcu_call_ready_event>)
#1 0x0000aaaaaabedff8 in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:255
#2 0x0000aaaaaabd34c8 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:498
#3 0x0000ffffbf26a880 in start_thread (arg=0xfffffffff5bf) at pthread_create.c:486
#4 0x0000ffffbf1c4b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) print rcu_call_ready_event
$9 = {value = 0, initialized = true}

You can see I advanced in the qemu_event_{reset,set,wait} logic.

    (gdb) disassemble /m 0x0000aaaaaabd4110
    Dump of assembler code for function qemu_event_reset:
    408 in ./util/qemu-thread-posix.c

    409 in ./util/qemu-thread-posix.c

    410 in ./util/qemu-thread-posix.c
    411 in ./util/qemu-thread-posix.c
       0x0000aaaaaabd40f0 <+0>: ldrb w1, [x0, #4]
       0x0000aaaaaabd40f4 <+4>: cbz w1, 0xaaaaaabd411c <qemu_event_reset+44>
       0x0000aaaaaabd411c <+44>: stp x29, x30, [sp, #-16]!
       0x0000aaaaaabd4120 <+48>: adrp x3, 0xaaaaaac20000
       0x0000aaaaaabd4124 <+52>: add x3, x3, #0x908
       0x0000aaaaaabd4128 <+56>: mov x29, sp
       0x0000aaaaaabd412c <+60>: adrp x1, 0xaaaaaac20000
       0x0000aaaaaabd4130 <+64>: adrp x0, 0xaaaaaac20000
       0x0000aaaaaabd4134 <+68>: add x3, x3, #0x290
       0x0000aaaaaabd4138 <+72>: add x1, x1, #0xc00
       0x0000aaaaaabd413c <+76>: add x0, x0, #0xd40
       0x0000aaaaaabd4140 <+80>: mov w2, #0x19b // #411
       0x0000aaaaaabd4144 <+84>: bl 0xaaaaaaaff190 <__assert_fail@plt>

    412 in ./util/qemu-thread-posix.c
       0x0000aaaaaabd40f8 <+8>: ldr w1, [x0]

    413 in ./util/qemu-thread-posix.c
       0x0000aaaaaabd40fc <+12>: dmb ishld

    414 in ./util/qemu-thread-posix.c
       0x0000aaaaaabd4100 <+16>: cbz w1, 0xaaaaaabd4108 <qemu_event_reset+24>
       0x0000aaaaaabd4104 <+20>: ret
       0x0000aaaaaabd4108 <+24>: ldaxr w1, [x0]
       0x0000aaaaaabd410c <+28>: orr w1, w1, #0x1
    => 0x0000aaaaaabd4110 <+32>: stlxr w2, w1, [x0]
       0x0000aaaaaabd4114 <+36>: cbnz w2, 0xaaaaaabd4108 <qemu_event_reset+24>
       0x0000aaaaaabd4118 <+40>: ret

And I'm currently inside the STLXR and LDAXR logic. To make sure my program counter is advancing, I added a breakpoint at 0x0000aaaaaabd4108, so CBNZ instruction would branch indefinitely into LDXAR instruction again, until the
LDAXR<->STLXR logic is satisfied (inside qemu_event_wait()).

(gdb) break *(0x0000aaaaaabd4108)
Breakpoint 2 at 0xaaaaaabd4108: file ./util/qemu-thread-posix.c, line 414.

which is basically this:

    if (value == EV_SET) { EV_SET == 0
        atomic_or(&ev->value, EV_FREE); EV_FREE = 1
    }

and we can see that this logic being called one time after another:

(gdb) c
Thread 2 "qemu-img" hit Breakpoint 3, 0x0000aaaaaabd4108 in qemu_event_reset (
    ev=ev@entry=0xaaaaaac87ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:414

(gdb) c
Thread 2 "qemu-img" hit Breakpoint 3, 0x0000aaaaaabd4108 in qemu_event_reset (
    ev=ev@entry=0xaaaaaac87ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:414

(gdb) c
Thread 2 "qemu-img" hit Breakpoint 3, 0x0000aaaaaabd4108 in qemu_event_reset (
    ev=ev@entry=0xaaaaaac87ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:414

EVEN when rcu_call_ready_event->value is already EV_SET (0):

(gdb) print rcu_call_ready_event
$11 = {value = 0, initialized = true}

(gdb) info break
Num Type Disp Enb Address What
1 hw watchpoint keep y rcu_call_ready_event
3 breakpoint keep n 0x0000aaaaaabd4108 qemu-thread-posix.c:414
        breakpoint already hit 23 times
4 breakpoint keep y 0x0000aaaaaabd4148 qemu-thread-posix.c:424

IF I enable only rcu_call_ready_event HW watchpoint, nothing is triggered.

(gdb) watch *(rcu_call_ready_event->value)
Hardware watchpoint 6: *(rcu_call_ready_event->value)

not if I set it directly to QemuEvent->value...

    assert(ev->initialized);
    value = atomic_read(&ev->value);
    smp_mb_acquire();
    if (value == EV_SET) {
        atomic_or(&ev->value, EV_FREE);
    }

meaning that "value" and "ev->value" might have a diff value... is that so ?

(gdb) print value
$14 = <optimized out>

can't say.. checking registers AND stack:

       0x0000aaaaaabd4100 <+16>: cbz w1, 0xaaaaaabd4108 <qemu_event_reset+24>
       0x0000aaaaaabd4104 <+20>: ret
       0x0000aaaaaabd4108 <+24>: ldaxr w1, [x0]
       0x0000aaaaaabd410c <+28>: orr w1, w1, #0x1
    => 0x0000aaaaaabd4110 <+32>: stlxr w2, w1, [x0]
       0x0000aaaaaabd4114 <+36>: cbnz w2, 0xaaaaaabd4108

x0 0xaaaaaac87ce8 187649986428136
x1 0x1 1
x2 0x1 1
x3 0x0 0
x4 0xffffbec61e98 281473882398360
x5 0xffffbec61c90 281473882397840
x6 0xffffbec61c90 281473882397840
x7 0x1 1
x8 0x65 101
x9 0x0 0
x10 0x0 0
x11 0x0 0
x12 0xffffbec61d90 281473882398096
x13 0x0 0
x14 0x0 0
x15 0x2 2
x16 0xffffbf67ccf0 281473892994288
x17 0xffffbf274938 281473888766264
x18 0x23f 575
x19 0x0 0
x20 0xaaaaaac87ce8 187649986428136
x21 0x0 0
x22 0xfffffffff5bf 281474976708031
x23 0xaaaaaac87ce0 187649986428128
x24 0xaaaaaac29000 187649986039808
x25 0xfffffffff658 281474976708184
x26 0x1000 4096
x27 0xffffbf28c000 281473888862208
x28 0xffffbec61d90 281473882398096
x29 0xffffbec61420 281473882395680
x30 0xaaaaaabedff8 187649985798136
sp 0xffffbec61420 0xffffbec61420
pc 0xaaaaaabd4110 0xaaaaaabd4110 <qemu_event_reset+32>
cpsr 0x0 [ EL=0 ]
fpsr 0x0 0
fpcr 0x0 0

AND the ORR instruction is ALWAYS being executed against 0x1 (not 0x0, which is what I just changed by changing .value):

(gdb) print value
$14 = <optimized out>

       0x0000aaaaaabd410c <+28>: orr w1, w1, #0x1

#0x1 is being used instead of contents of "value" local variable (volatile).

I'll recompile QEMU flagging all those local "unsigned value" variables as being volatile and check if optimization changes. Or even try to disable optimizations.