Comment 6 for bug 604872

Peter Maydell (pmaydell) wrote :

I've analysed this segfault. The problem is that we're not correctly taking account of the IT state on entry to a Thumb translation block if we're retranslating it for cpu_restore_state().

The offending TB here is:
0x0003dc00: movle r2, #0
0x0003dc02: ldr r1, [pc, #644] (0x3de88)
0x0003dc04: cmp r3, #2
0x0003dc06: str r2, [r1, #0]
0x0003dc08: it eq
0x0003dc0a: ldreq r3, [r5, #8]
0x0003dc0c: beq.w 0x3ddce

where the 'le' is because the TB before that ended with an 'it le'. When we execute this, the str gets a data abort. QEMU handles this by calling cpu_restore_state(), which reruns the translation process, this time generating a mapping between target and host addresses so that we can turn the host PC of the fault into a target PC. Unfortunately we retranslate without taking account of what the IT state at the start of the TB should have been:

0x0003dc00: movs r2, #0
0x0003dc02: ldr r1, [pc, #644] (0x3de88)
0x0003dc04: cmp r3, #2
0x0003dc06: str r2, [r1, #0]
0x0003dc08: it eq
0x0003dc0a: ldreq r3, [r5, #8]
0x0003dc0c: beq.w 0x3ddce

...note that the mov has become unconditional. (It's not just the disassembly; the generated intermediate code changes too.)
Since cpu_restore_state() works by (a) actually rewriting the translated code into the buffer and (b) stopping when we get to the PC which faulted, this means we end up writing over the old generated code with half of a different version of the generated code. This is never going to go well, and we end up jumping off into the weeds the next time we execute the TB.
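
To make the mismatch concrete, here is a rough sketch in C. It is a toy model, not QEMU code: toy_translate() and its cost model (one host op per guest insn, plus one extra op for a conditional skip when the insn is predicated) are invented for illustration, and IT instructions inside the TB are not modelled. The two ITSTATE helpers follow my reading of the architectural encoding (condition in bits [7:4], in an IT block when bits [3:0] are non-zero), and the entry value 0xd8 is what I'd expect after an 'it le' (firstcond 1101, mask 1000).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COND_AL 0xe  /* "always" condition code */

/* Condition that applies to the next instruction. Outside an IT block
 * (low four bits of ITSTATE zero) the instruction is unconditional. */
static unsigned it_current_cond(uint8_t itstate)
{
    return (itstate & 0x0f) ? (unsigned)(itstate >> 4) : COND_AL;
}

/* Advance ITSTATE past one instruction: shift the 5-bit mask field left,
 * or leave the IT block entirely once the last slot has been used. */
static uint8_t it_advance(uint8_t itstate)
{
    if ((itstate & 0x07) == 0) {
        return 0;
    }
    return (uint8_t)((itstate & 0xe0) | ((itstate << 1) & 0x1f));
}

/* Hypothetical toy translator (NOT a QEMU function). Records the host
 * offset at which each guest insn starts and returns the total number of
 * host ops emitted. A predicated insn costs one extra op for its guard. */
static size_t toy_translate(uint8_t entry_itstate, size_t n_insns,
                            size_t *host_off)
{
    uint8_t it = entry_itstate;
    size_t pos = 0;

    for (size_t i = 0; i < n_insns; i++) {
        host_off[i] = pos;
        if (it_current_cond(it) != COND_AL) {
            pos++;              /* conditional skip over the insn body */
        }
        pos++;                  /* the insn body itself */
        it = it_advance(it);
    }
    return pos;
}
```

Translating the first four insns of the TB (mov, ldr, cmp, str) with entry state 0xd8 and then again with entry state 0 puts the str at a different host offset in each case. In this toy model that is the whole problem: the retranslation lays out the code differently, so rewriting it into the buffer and stopping at the faulting insn leaves a mixture of the two versions.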

I think this is related to but not the same as https://bugs.launchpad.net/qemu/+bug/581335.