Friday, July 23, 2010

Know Your Compiler: Visual C++'s /showIncludes

If you're trying to track down exactly what a particular compilation unit is including (ie: to publicly shame those who are including Windows.h in a public header, or whatever your reasons may be), Visual C++ provides a handy means of doing so with the /showIncludes command line option.

Patch submission

Working in a central tech development role I have patches sent to me on a regular basis, patches for which I am grateful to receive because being able to adopt the fixes my users make only improves my products.

That said, here is my wishlist for patch submission nirvana:
  1. Describe the problem the patch is meant to solve. Maybe a patch isn't entirely necessary? No matter what, dropping a handful of files on my lap with no explanation as to what they do is going to waste both my time and yours as I will need to figure out what's going on, and you're going to be helping!
  2. Please send diffs, not whole files.
  3. Use unified diffs (diff -u or, with Perforce, p4 diff -du). Non-unified diffs are very hard to follow context-wise. The old-style Perforce visual diff that stylizes the old text with bold, red, and struck out and the new text with bold blue is even worse. Providing a patch set or a Git branch would be the Bestest(tm).
  4. If it isn't possible to provide a Git branch then at least indicate from which version of the library the patch was written against.
  5. If I reject the patch, it's not because I don't like you, it may be that the patch conflicts with design goals of the library or may be tied too closely to the submitter's own parent project. This code serves many people and I need to ensure that no one is hindered by the application of this patch.
Thanks, in advance :)

Friday, July 16, 2010

Of false dilemma's and C++

Periodically some e-famous programmer will rant about the evils of C++. This time it was Zed Shaw, though in the past Linus Torvalds has had a tilt at this windmill. These posts are always an entertaining read but what I find interesting in these discussions are the follow up discussions on sites like Reddit.

C++ is a complex language, no doubt about it, but it is entirely possible to manage the complexity. This has been a stated design goal of the language, any new feature introduced should be at no cost to the user should they not want to use it.

Several of these posters offer up exceptions as a case where C++ needlessly burdens the user, compared to its progenitor C. How do you manage memory in an environment that uses exceptions? I'm not going to lie, handling exceptions properly is a hard problem, so much so that it seems that half of Herb Sutter's Guru of the Week articles are on the subject of writing exception safe code. I'm going to go out on a limb here and say that it really doesn't matter how your application deals with exceptions because when one is thrown your system is in all likelihood hosed.

Imagine that you are in the land of C, free from classes and templates, how would you handle a scenario which could be a candidate for a thrown exception? Say malloc returns a null pointer (if that is a valid option, and your process isn't just killed by the operating system), what do you do? Realistically your program would bomb out.

So, now that C++ allows you to handle such a scenario, do you need to do anything about it?

No.

Let it fail, maybe print out a stack trace if you're feeling generous. Sleep well in the comfort that you will likely not have to roll your own data types and that your program is no less robust than if it were written in C.

Thursday, July 15, 2010

Stack slot collision

I had a frustrating bug reported to me last week -- frustrating in the sense that it was hard to spot the actual problem in the first place and frustrating to fix. The code that we were compiling was quite similar to this, although it wasn't C++:
    Foo::Foo(void *, int, int, float, float, float, FooEnum, const std::string &, bool, bool);

Bar(vector &fooVector) {
for (int i = 0; i < limit; ++i) {
Foo *F = new Foo(NULL, GetNextId(), idx, startTime, timeToLive, velocity, Foo::SOME_ENUM_TYPE, "foo", false, false);
fooVector.push_back(F);
}
}
I was noticing that when fooVector was consumed all the entries in it were zero which was causing a crash as the consuming method attempted to dereference Foo's members. It apparantly contained the number of items, but they were all zero. Since the world wasn't ending and my allocator wasn't returning non-zero values I figured it had to do with our compiler's code generation.

Digging into the assembly generated around the push_back() method, I found this code was being generated:

li r27, 0
...
bl allocate
stw r3, 100(r1)
std r27, 96(r1)
std r27, 88(r1)
...
# The instructions here loaded the registers for the constructor call
...
bl Foo_ctor
lwz r3, 104(r1)
lwz r4, 100(r1)
bl push_back

(This architecture is a variant of PPC64 with its own ABI, for the record, paired with a modified variant of LLVM for code generation.)

Saving the return value of the allocation function to the stack is fairly normal behavior but immediately following this we have a doubleword store that stomps our new pointer! This doesn't affect our call to the constructor because its "this" pointer is always r3 anyways and at that point it is still valid. After returning from the constructor and branching into our push_back method we re-load r3 from the stack which at this time has been clobbered.

Where are these stores coming from? What are they doing? PowerPC ABI's generally pass parameters in registers, unless it runs out of registers in which to pass parameters which it did in this case. Integer types, or those convertible to integers such as our booleans, are converted to register width and then are placed on the stack. Since we were passing two false values to our constructor we stored two zero values to the stack in the parameter passing area.

But why the stomp?

This platform uses a different stack layout than the traditional PPC64 architectures. Specifically, the link register is saved 8 bytes below the backchain pointer rather than 16 bytes above:

This in effect requires that our parameter/linkage area needs to be 8-bytes bigger in order to accomodate the LR's save. We didn't see it before because our compiler's unit tests didn't consider having local variables in slots so close to the save area.

Increasing the size of the parameter/linkage area within the stack frame to account for this intrusion of the link register save was what I did to fix the bug and in the end it was a one line code change for an issue that took days to track down!