Monday, January 28, 2013

Scheme in 5000 lines of C

Every time I see a blog post spring up talking about a new (basic) implementation of Scheme that is written in 30 lines of Ruby, Javascript, or Python I get the sense that there's something missing.

In fact, there's a lot that's missing.

These environments already come with a garbage collector, so the implementation never describes implementation. The intricacies and tradeoffs of object storage? Also skipped over. Parsing? Nope. What about compilation? Never done.

I appreciate code golf and it's fun to see these new, super small, implementations show up, and don't mean to belittle the effort their developers invested, however I haven't been able to learn much from them. So, I thought I'd give it a go on my own.

The hard way.

In C.

Not that cutesy C with C++ comments, variable-declaration-wherever-you-want-it, or inline functions, but the super crusty C89 supported by Microsoft Visual C++, the one where Dave Cutler personally wrote the front end himself while fighting off the red menace and arm wrestling bears. (And GCC/clang too... I guess. Twist my rubber arm.)

I've got a bunch of stuff cobbled together now where it sorta kinda works. It parses, it generates an AST, it compiles the AST into a bytecode, and the bytecode runs. Hey, that almost makes it more functional than some video game console toolchains I've used.

One decision I made recently which simplified the VM conceptually was creating value types (fixnum/flonum/bool/char/reference/inner reference) from reference types (cons cell/vector/string/function). This allowed the VM's execution stack to contain either a raw value, like a fixnum, or a reference to a heap-allocated object, making stack operations simple.

Originally I had it so that the storage of conses was handled by holding a pair of pointers to other objects but I recently switched this so that a cons is now a vector of length two. This means that value types do not need to be boxed in order to be stored in a cons and also simplifies the operations on it.

I've got a few compiler-related tasks I'm going to work on now that it's executing, such as handling specific essential syntax forms. I've got a garbage collector to do as well, but I'm not worried about that one as I've already written several, including one that shipped in a console video game.

If you want to follow along, I've got my code stashed on GitHub. It's still pretty rough around the edges though, so beware at this point!

Maybe when (or if!) I finish, it'll actually be under 5000 lines of C :-)

Follow on to part 2: Scheme in 5000 lines of C part 2: (let ((them (eat 'cake))))