Comment 22 for bug 488354

Revision history for this message
In , Jacob Bramley (jacob-bramley) wrote :

> Because it is the only thing that is allocated on stack in the function. But we
> can probably use __builtin_alloca instead if you're uncomfortable with that, it
> generates the same code.

Not necessarily: For one thing, the "register" qualfier is just a hint, but
there may also be compiler temporaries that go on the stack.

Admittedly, for a function of this size and complexity, it's a fairly safe
assumption, but I think perhaps it needs a comment warning others from adding
code without considering the consequences, as it would be dreadfully easy to
break.

Is there any way we can ASSERT this? I don't think there's an easy way, but I
suppose in the assembly snippet we could assert that sp == stack_space before
assigning it:

#ifdef DEBUG
__asm__ __volatile__ (
    "cmp sp, %1"
    "moveq %0, #1"
    "movne %0, #0"
    "add sp, #(4*4)"
    : "=r" (stack_ok)
    : "r" (stack_space)
    : "sp"
);
#else
__asm__ __volatile__ (
    "add sp, #(4*4)" /* sp is at stack_space[0], so advance it to [4].
                             * (This saves &stack_space[4] from being allocated
                             * a register. */
    :
    :
    : "sp"
);
#endif

> I don't know if -fomit-frame-pointer is supposed to work on gcc arm, [...]

It sounds like touching the SP does force frame_needed to 1, so it's probably
safe.

> With a armv5t target, the end changes to the following:
> ldmib r4, {r1, r2} @ phole ldm
> mov lr, pc
> ldr pc, [ip, r6, asl #2]
> sub sp, fp, #28
> ldmfd sp!, {r4, r5, r6, r7, r8, r9, fp, pc}

Ouch. I don't know why the compiler is emiting "ldr pc, [...]". It'd be
preferable to load into a temporary (such as "ip") and then blx to that. Hmm.
What do you get if you specify "-mcpu=cortex-a8"? It's possible that the
compiler is assuming that it doesn't matter for ARMv5T devices (and it's
probably right).

(In any case, this is a compiler-related issue that will occur in all the
compiled code.)

> Without the assembly, this is how the first three 32-bits words are read from
> the stack:
> ldr r3, [sp, #12]
> ldr ip, [r5, #0]
> ldmib sp, {r1, r2} @ phole ldm
>
> In the end, sp would point on the third 32-bits word, instead of the fourth.

Nope, sp isn't updated there. It's a strange way to load the values as a single
ldm would suffice, but I suppose it would work and wouldn't affect performance
much.

----

I just have one final concern, and then I'll leave you alone :-) We need to
guarantee that the stack space is 64-bit aligned (for EABI compliance), but I'm
not sure how you do that. Indeed, you removed some alignment code from
invoke_count_words. Have you confirmed that this is actually correct? Any issues
would only show up with 64-bit arguments (and only then if they don't fall
across an alignment boundary).