Thursday, July 15, 2010

Stack slot collision

I had a frustrating bug reported to me last week -- frustrating in the sense that it was hard to spot the actual problem in the first place and frustrating to fix. The code that we were compiling was quite similar to this, although it wasn't C++:
    Foo::Foo(void *, int, int, float, float, float, FooEnum, const std::string &, bool, bool);

Bar(vector &fooVector) {
for (int i = 0; i < limit; ++i) {
Foo *F = new Foo(NULL, GetNextId(), idx, startTime, timeToLive, velocity, Foo::SOME_ENUM_TYPE, "foo", false, false);
fooVector.push_back(F);
}
}
I was noticing that when fooVector was consumed all the entries in it were zero which was causing a crash as the consuming method attempted to dereference Foo's members. It apparantly contained the number of items, but they were all zero. Since the world wasn't ending and my allocator wasn't returning non-zero values I figured it had to do with our compiler's code generation.

Digging into the assembly generated around the push_back() method, I found this code was being generated:

li r27, 0
...
bl allocate
stw r3, 100(r1)
std r27, 96(r1)
std r27, 88(r1)
...
# The instructions here loaded the registers for the constructor call
...
bl Foo_ctor
lwz r3, 104(r1)
lwz r4, 100(r1)
bl push_back

(This architecture is a variant of PPC64 with its own ABI, for the record, paired with a modified variant of LLVM for code generation.)

Saving the return value of the allocation function to the stack is fairly normal behavior but immediately following this we have a doubleword store that stomps our new pointer! This doesn't affect our call to the constructor because its "this" pointer is always r3 anyways and at that point it is still valid. After returning from the constructor and branching into our push_back method we re-load r3 from the stack which at this time has been clobbered.

Where are these stores coming from? What are they doing? PowerPC ABI's generally pass parameters in registers, unless it runs out of registers in which to pass parameters which it did in this case. Integer types, or those convertible to integers such as our booleans, are converted to register width and then are placed on the stack. Since we were passing two false values to our constructor we stored two zero values to the stack in the parameter passing area.

But why the stomp?

This platform uses a different stack layout than the traditional PPC64 architectures. Specifically, the link register is saved 8 bytes below the backchain pointer rather than 16 bytes above:

This in effect requires that our parameter/linkage area needs to be 8-bytes bigger in order to accomodate the LR's save. We didn't see it before because our compiler's unit tests didn't consider having local variables in slots so close to the save area.

Increasing the size of the parameter/linkage area within the stack frame to account for this intrusion of the link register save was what I did to fix the bug and in the end it was a one line code change for an issue that took days to track down!

No comments:

Post a Comment