Sunday, March 17, 2013

Scheme in 5000 lines of C part 4: (Un)necessary refactorings

Last week I added the first test of the garbage collector which simply allocated memory until the heap was exhausted. Since the garbage collector doesn't do anything now but allocate it was expected that this test would run until the gc_collect function was called where it would hit a breakpoint. The test was simple:


(begin
    (define gc-test (lambda (count param)
        (if (< count 1048576)
            (gc-test (+ 1 count) #(1 2 3 4 5))
            "done")))
    (gc-test 0 0))

It calls itself a million times and allocates a 5-element vector each time. Since the test harness only uses a 1mb heap it should exhaust pretty quickly. (The compiler is pretty stupid currently and doesn't do any sort of optimization that would eliminate this allocation currently).

I ran into one issue first. Up until now, all my tests were testing compilation, so one test case would define a test function and the next would call it. Now, I wanted to do both in a single test case because I'm lazy and don't want to split all my test cases into one where definition happens and another where execution happens.

The system didn't have any ability to handle the (begin ...) syntax. This isn't the end of the world because (begin ...) is pretty easy to implement -- you loop over all the contained forms, evaluating each one, and the result is the value of the last form evaluated -- but it needed to be implemented in two places. Most primitive syntax features like define/set/car/cdr/etc. have an implementation for when they are encountered by the compiler, and an implementation in C for when they are evaluated at the top level. So,

(define foo 3)

and

((lambda () (define foo 3))

would follow two separate implementations. This stemmed from the early roots of "let's see if this will actually work", before there was a compiler.

But I was feeling lazy. I could have implemented begin for both both C code and the VM, but I decided to refactor the evaluation engine so that in the future I only need to implement functionality once. This was done by wrapping every top level function evaluation in an anonymous lambda and ensuring that is run by the VM. The example of (define foo 3) above would be translated into the form below and evaluated.

There were a few other changes that came out of this refactoring. First, procedure object storage is now broken into two parts, a record that stores meta information for the function, the environment in which the function was created, and some function local variable slots. These slots are not local variables in the sense of temporaries during execution but rather storage for objects that the function itself might need to refer to, such as other procedure objects (for closures), raw objects from the parser (ie: from (quote ...)), etc. I think this will make it much easier to tackle.

Second, I had to recognize the eventuality that I'd have to call C functions from the VM and that this new execution model would require it for things like (lambda ...).

Thirdly, I've introduced an object handle. This is meant to be a container that is tracked by the garbage collector so that, if a collection is triggered between two successive calls. Say, for example, you have this code:

struct object_t *foo = gc_alloc(...);
struct object_t *bar = gc_alloc(...);

And the second call to gc_alloc() triggers a collection because the heap was full. If the collector doesn't know about foo and the storage behind that object moves, we're going to be spending weeks finding out what has happened. The handle type is meant to make this more robust by basically placing a read barrier on that reference.

Next up is garbage collection.

(The project is now officially over 6000 lines of content, though only 4200 lines of that is code. That still counts, right?)

Follow on to part 5: Scheme in 5000 lines of C part 5: Collecting garbage
View the source on GitHub: https://github.com/maxburke/evilscheme

No comments:

Post a Comment