Sunday, July 7, 2013

Stackless Clojure with core.async

Let's start off by writing a length function that computes the length of a sequence.

(defn length [l]
  (if (empty? l) 
    0
    (inc (length (rest l)))))

In many dialects of Scheme, this recursive strategy would be the idiomatic way to write the function. However, in Clojure, this gives us a stack overflow:

=> (time (length (range 20000)))
StackOverflowError   clojure.lang.ChunkedCons.more (ChunkedCons.java:48)

Clojure's stack limitation is something I often wrestle with. One idiomatic way to translate this code to Clojure is to write the function accumulator-style, using loop and recur to avoid stack consumption.

(defn accumulator-style-length [l]
  (loop [size 0
         l l]
    (if (empty? l) size
      (recur (inc size) (rest l)))))

But sometimes the translation to accumulator-style is not so straightforward, and I find myself wishing for a way to eliminate stack consumption while preserving the flow of the original code.

When I first read the announcement of core.async, I was struck by the comment that behind the scenes, it is converting go blocks into a combination of state machines and continuations. It occurred to me that this had a lot of similarity to the kind of continuation-passing transform one can do to turn stack-consuming code into tail-recursive code.

Just for fun, I wanted to see if core.async could be used to preserve the shape of a recursive function, while removing the stack limitation. Here's what I came up with:

(use 'clojure.core.async)

(defn go-length [l]
  (go (if (empty? l)
        0
        (inc (<! (go-length (rest l)))))))

(defn length [l]
  (<!! (go-length l)))

Notice that go-length follows the shape of the original recursive length function. We've wrapped the recursive call in <! and wrapped the entire body of the function in a go block. This function alone, however, will not produce the result we need. So we use this as a helper function to the length function which simply extracts the value from the call to go-length.

=> (time (length (range 20000)))
"Elapsed time: 54.570256 msecs"
20000

Hallelujah! It works. No stack overflow. Let's compare against the accumulator version:

=> (time (accumulator-style-length (range 20000)))
"Elapsed time: 1.500303 msecs"
20000

OK, so we're not going to win any performance awards with our core.async version of length, which means that this little stunt will probably remain more in the realm of "nifty little trick" than "practical way to code recursive functions", but I still think it's very cool that it works.