Sunday, March 31, 2013

Scheme in 5000 lines of C part 5: Collecting garbage

I was hoping to get this post up sooner but the last week and a bit has been complicated by my appendix exploding. I'm on the upswing though and now am starting to think well enough to be able to dig through my Scheme todo list.

When this series last left off, I had begun working on the garbage collector by adding a test to exhaust the heap, and then I re-wrote the evaluation engine instead. So I didn't actually do any work on the garbage collector.

A few years ago I worked on a small team that built a .NET environment for video game consoles, which helped spark my interest in compilers and runtime environments. We tried a number of types of garbage collectors while trying to hit our performance goals, from simple non-precise mark/sweep, to Boehm, to several different types of precise collectors. We had the most success with two designs: one was a half-space (Cheney) collector, and the other, final design was based on an algorithm somewhat similar to Clinger's non-predictive generational collector.

At points I have experimented with generational collectors, but I found they work best when you can expand the tenured generation as objects are copied into it. As that isn't really feasible on a video game console, where your memory budget is fixed and there's no pagefile to fall back on, they never made the cut.

What I found while profiling the games being built on this .NET runtime is that objects either died really quickly or they persisted forever (ie: more than a few frames). By breaking the heap into many small (typically page-sized) buckets, often when a collection was triggered there would be enough buckets with no remaining live objects that the collection could finish without having to move/compact the heap at all. This was a huge advantage over the half-space collector we had used previously, where every live object is copied on every collection.

What does this have to do with Scheme in 5000 Lines of C? The collector used in this Scheme is going to use some similar ideas. For example, it's going to use small, fixed-size buckets. It's also going to compact the heap only if it has to. Luckily this garbage collector doesn't have to worry about the complications that a collector for .NET needs, like finalizers and threading.

I've got the initial root-set scan and the collection of empty buckets implemented. The object handles I mentioned last time are now quite important because they act as read barriers for code outside of the system, so that it sees an object's correct location if the object moves underneath it. They're a bit cumbersome to use from C for now, but that's a trade-off I'm making to keep the collector fast and to enforce on the user (me, right now) that it's a bad idea to peek and poke stuff in the scripting system willy-nilly.

One of the most hair-tearing problems I've found when writing a precise garbage collector is making sure that you do not miss any roots. If you do, you'll eventually collect an object's storage while the object is still live, and then you're in a world of hurt. The read barrier is an easy way to make sure you do not miss any roots when you don't have the luxury of controlling code generation, and so can't generate stack frame maps.

The reader also got a lot of attention today. Sorry, not you :), I mean the core of the code invoked when the user calls (read ...). I've weaned it off of the CRT allocation functions and it now allocates all of its intermediate structures in the GC heap. This has been on my todo list for a while.

The source tree was also reworked so as to be a little more friendly for use as a library -- a public include folder was created separate from the library source code.

I've got a few things up next to take care of. I want to start working on closures soon, as well as heap compaction. Maybe before the end of the month I can get a few of the Computer Language Benchmarks Game benchmarks written and added to the test suite as well.

View the source on GitHub: https://github.com/maxburke/evilscheme

Sunday, March 17, 2013

Scheme in 5000 lines of C part 4: (Un)necessary refactorings

Last week I added the first test of the garbage collector, which simply allocated memory until the heap was exhausted. Since the garbage collector doesn't do anything yet but allocate, it was expected that this test would run until the gc_collect function was called, where it would hit a breakpoint. The test was simple:


(begin
    (define gc-test (lambda (count param)
        (if (< count 1048576)
            (gc-test (+ 1 count) #(1 2 3 4 5))
            "done")))
    (gc-test 0 0))

It calls itself a million times, allocating a 5-element vector each time. Since the test harness only uses a 1 MB heap, it should exhaust it pretty quickly. (The compiler is pretty stupid currently and doesn't do any sort of optimization that would eliminate this allocation.)

I ran into one issue right away. Up until now, all my tests were testing compilation, so one test case would define a test function and the next would call it. Now I wanted to do both in a single test case, because I'm lazy and don't want to split all my test cases into one where definition happens and another where execution happens.

The system didn't have any ability to handle the (begin ...) syntax. This isn't the end of the world, because (begin ...) is pretty easy to implement -- you loop over all the contained forms, evaluating each one, and the result is the value of the last form evaluated -- but it needed to be implemented in two places. Most primitive syntax features like define/set!/car/cdr/etc. have one implementation for when they are encountered by the compiler, and another in C for when they are evaluated at the top level. So,

(define foo 3)

and

((lambda () (define foo 3)))

would follow two separate implementations. This stemmed from the early roots of "let's see if this will actually work", before there was a compiler.

But I was feeling lazy. I could have implemented begin for both the C code and the VM, but I decided instead to refactor the evaluation engine so that in the future I only need to implement functionality once. This was done by wrapping every top-level form in an anonymous lambda and ensuring it is run by the VM. The (define foo 3) example above would be translated into ((lambda () (define foo 3))) and then evaluated.

There were a few other changes that came out of this refactoring. First, procedure object storage is now broken into two parts: a record that stores meta-information for the function along with the environment in which the function was created, and a set of function-local variable slots. These slots are not local variables in the sense of temporaries during execution, but rather storage for objects that the function itself might need to refer to, such as other procedure objects (for closures), raw objects from the parser (ie: from (quote ...)), etc. I think this will make closures much easier to tackle.

Second, I had to recognize the eventuality that I'd have to call C functions from the VM and that this new execution model would require it for things like (lambda ...).

Thirdly, I've introduced an object handle. This is meant to be a container that is tracked by the garbage collector, so that references remain valid if a collection is triggered between two successive allocations. Say, for example, you have this code:

struct object_t *foo = gc_alloc(...);
struct object_t *bar = gc_alloc(...);

And the second call to gc_alloc() triggers a collection because the heap was full. If the collector doesn't know about foo and the storage behind that object moves, we're going to be spending weeks finding out what has happened. The handle type is meant to make this more robust by basically placing a read barrier on that reference.

Next up is garbage collection.

(The project is now officially over 6000 lines of content, though only 4200 lines of that is code. That still counts, right?)

Follow on to part 5: Scheme in 5000 lines of C part 5: Collecting garbage
View the source on GitHub: https://github.com/maxburke/evilscheme

Monday, March 11, 2013

Scheme in 5000 lines of C part 3: Testing, random thoughts on closures.

I haven't made as much progress on the Scheme recently as I would have liked. Work's been busy, the house has needed some fixing, and I'm still chewing over how I want to implement closures.

Since I want to make *some* progress, I rewrote the testing framework. Before, I just had some sample strings embedded in main that I read-evaled-printed, and I looked at the output to see if the changes I made had the desired effect.

I felt some (a lot of) shame because (semi-)robust testing is supposed to be my day job. So I moved most of that away into a test harness that's pretty stupid/simple but actually performs validation. Funny that.

There's a directory, tests/, that contains the test cases. One per file. Each file has the input that is fed to read-eval-print, the output of which is captured, and then compared to the expected value.

I'm still not quite sure how I'm going to handle closures, though. Variable capture, I think, is going to be fairly easy. What I'm not sure about is how to store references to inner functions. For example, take this function:


(define closure-test
  (lambda (foo)
    (lambda ()
      (set! foo (+ 1 foo))
      foo)))

The outer function (closure-test) will need to hold some sort of managed reference to the inner function so that when closure-test is called it can create a new function object and a new environment, but I'm not quite sure if I want to bake the reference into the bytecode, or do something else. Maybe re-write function objects to include a table of contained functions? That might work.

Follow on to part 4: Scheme in 5000 lines of C part 4: (Un)necessary refactorings
View the source on GitHub: https://github.com/maxburke/evilscheme