Tuesday, September 24, 2013

Defeating stack overflows

I've mentioned in the past that Clojure's stack limitation is a frequent pain point for me. A recent question on the Clojure group got me thinking again about this issue, and this seems like a good opportunity to summarize some of my thoughts on how to work around stack limitations.

To illustrate the problem, consider the following mathematical function:

f(n) = {  1                      if n < 5
       {  f(n-5)+f(floor(n/2))   if n >= 5

I picked this function for no particular reason other than that it is a fairly simple doubly-recursive formula (a la fibonacci), but sufficiently unfamiliar that it's not obvious what the closed formula for this function is. These sorts of functions come up all the time when trying to analyze the performance of complex algorithms; you might need to compute something like f(200000) to figure out how long it will take a certain algorithm to run on a list of 200000 items. There is an entire branch of mathematics devoted to converting functions like this into closed formulas, but for the sake of argument, let's assume that you actually need to compute f(200000).

So let's try the obvious thing:

(defn f [n]
  (if (< n 5) 1 (+' (f (- n 5)) (f (quot n 2)))))

But even (f 2000) starts the CPU churning forever. Well, this is an easy fix, we just memoize.

(def f (memoize f))

(f 2000) now works like a charm, but if you go to (f 20000), you get a stack overflow.

Because of the double recursion, it's quite difficult to express this function with tail recursion and an accumulator. One technique to convert to tail recursion is continuation-passing style, but that doesn't really solve the stack overflow problem. Even though it is tail recursive to build up the continuation, the continuation you build is so deep that you blow the stack as soon as you try to execute it. One possibility is to combine continuation-passing style with trampolining, but that gets hairy real fast. Is there an easier way?

I've touched on two techniques in previous blog posts, but in both posts I was dealing with a simpler recursive function. Lets revisit those techniques in the context of this more challenging example.

Technique 1: Priming the pump

(described in my blog post about the coin kata)

"Priming the pump" is a technique that makes a memoized function behave like a bottom-up dynamic programming algorithm. Put simply, you call the function on all the inputs from 0 up to n, and then return f(n).

However, in our above example, clearly there are a lot of "holes" in terms of what inputs are needed to compute f(200000). If we compute f for every input from 0 up to n, we're doing a lot of extra computation we don't need, and using up memory for our cache that we don't need. In theory, if we weren't limited by the stack, the recursive formulation of f would compute precisely the values that were needed, no more, no less. So we haven't really "defeated the stack problem" unless we can emulate that behavior. The naive "priming the pump" strategy does too much extra work.

So what we need to do is compute the dependencies explicitly. To do this, you need to perform a reverse topological sort. For our purposes, I'm calling the function all-dependencies, but it is a general purpose search that you can use for many purposes, so it's worth keeping an algorithm like this around in your toolbox.

Here's the function:

(defn all-dependencies
  "Takes a dependencies function that maps inputs to a sequence of other inputs that it is dependent upon,
   as well as the initial input n"
  [dependencies n]
  (loop [stack []
         unprocessed [n]
         processed {}]
    (if (empty? unprocessed) stack
      (let [i (peek unprocessed)
            status (processed i)]
        (case status         
          :done (recur stack (pop unprocessed) processed),
          :seen-once (recur (conj stack i) (pop unprocessed) (assoc processed i :done)),
          (recur stack (into unprocessed (dependencies i))
                 (assoc processed i :seen-once)))))))

To apply this function, first we need to come up with the dependencies function that reflects the dependencies of our function f.

(defn f-dependencies [n] (if (< n 5) [] [(- n 5) (quot n 2)]))

Let's see it in action:

=> (all-dependencies f-dependencies 30)
[3 2 7 0 5 10 15 1 6 12 20 25 30]

Notice how computing f for each value in the sequence is dependent only on computing f on the values found to the left in the sequence. So now we can prime the pump with this list of values.

(defn priming-the-pump-f [n]
  (last (map f (all-dependencies f-dependencies n))))

Note that the f in that function needs to be the memoized version of f. That's what makes the priming technique work. At this point, computing f via our new, improved version is no longer bound by the stack.

=> (time (priming-the-pump-f 200000))
"Elapsed time: 369.838633 msecs"

Exercise 1

Consider the pair of functions f and g, where

f(n) = { 1             if n<5
       { f(n-5)+g(n-3) if n>=5

g(n) = { 2                    if n<7
       { f(n-7)+g(floor(n/2)) if n>=7

Compute f(200000).

Hint: Write a function fg-dependencies that you can plug into the all-dependencies function. For example,

=> (all-dependencies fg-dependencies [f 30])
[[g 6] [f 6] [g 13] [g 5] [f 3] [g 10] [f 13] [f 20] [g 27] [f 5] [g 12] [g 4] [f 2] [g 9] [f 4] [f 11] [f 18] [f 25] [f 30]]

(f and g here are meant to be the actual functions f and g, so would actually print in a more verbose manner; I've manually edited the output so the intent is clear. If you prefer to work with symbols or keywords, rather than the functions themselves, that can work also.)

Then adjust the priming technique to interpret these function/input ordered pairs to prime the pump for the memoized f and g functions.

Technique 2: core.async

(described in my blog post about core.async)

In the previous article about using core.async to defeat stack overflow, the example was a simple length function with no double-recursion and no need for memoization. Part of the appeal of the core.async approach was that it allowed us to get rid of the stack limitation with a very minimal transformation of the recursive algorithm. However, we'll find that dealing with these new wrinkles will, unfortunately, require a somewhat deeper transformation. Nevertheless, the transformation is fairly straightforward to implement.

First, because this algorithm requires memoization, we'll need to memoize. Unfortunately, the built-in memoize doesn't seem to work effectively with the go-block strategy, I'm not entirely sure why. We'll have to encode our own memoization strategy directly into the go-blocks.

(def mt (atom {})) ; Memoization table

(defn cache-and-return [in out]
  (swap! mt assoc in out)

With these memoization helper functions in place, we can code a go-block version of our function f

(defn go-f [n]
  (go (if (< n 5) 1
        (let [in1 (- n 5)   ; dependency 1
              out1 (if (contains? @mt in1) (@mt in1)
                     (cache-and-return in1 (<! (go-f in1))))                
              in2 (quot n 2) ; dependency 2
              out2 (if (contains? @mt in2) (@mt in2)
                     (cache-and-return in2 (<! (go-f (quot n 2)))))]
          (+' out1 out2)))))

Finally, we can present our core.async version of f, which takes an input n, kicks off the underlying go block.

(defn async-f [n] (<!! (go-f n)))

=> (time (async-f 200000))
"Elapsed time: 385.745275 msecs"

Note that the two running times are roughly comparable, suggesting that choosing between the two approaches is largely a matter of taste. One advantage of the core.async approach is that you don't have to explicitly code up a dependencies function, it "just works". However, the structure of go-f doesn't cleanly reflect the structure of the original f function, so arguably it is harder to write and be confident that it is correct. I personally prefer technique 1, except for certain classes of recursive functions that really require technique 2 (see exercise 3 below).

Exercise 2

Implement the mutually-recursive f and g functions from Exercise 1, using the core.async technique. Which of the two techniques is less ugly for mutal recursion? Discuss.

Exercise 3

Consider the following function f:

f(x) = { 1                      if n<5
       { f(n-5) + f(floor(n/2)) if n>=5 and f(n-3) is a multiple of 4
       { f(floor(n/3))          if n>=5 and f(n-3) is not a multiple of 4

There is no way to use technique 1 without computing a lot of unnecessary computations. Why? Try implementing the above function using technique 2.

Exercise 4 (advanced)

Implement a library that automatically converts a recursive function into one of the stackless forms given above (either technique 1 or 2).

Other Tricks

Right now, these are my two favorite tricks for defeating stack overflows. I've also resorted at times to a combination of continuations with trampolining, but that's too complicated to discuss here, and I haven't yet decided how well that approach generalizes to complicated recursions.

Are there any other tricks I've missed? Discuss in the comments below.

Written with StackEdit.