Wednesday, March 6, 2013

Logic programming is overrated

Logic programming is experiencing something of a resurgence in the Clojure community, sparked by core.logic, a port of miniKanren, which is itself an embedding of Prolog-style logic programming in Scheme. When Clojure programmers first hear about core.logic, they rush off looking for a problem to solve with it, like someone holding a hammer in search of a nail.

This means that every few months, we see a new blog post demonstrating how to use core.logic to solve a logic puzzle. The latest one to catch my attention was this blog post by Kris Jenkins detailing a logic puzzle tackled by the London Clojure group. The blog post begins by describing the puzzle and says:

It cries out, "Logic Programming", doesn't it?

NO, NO, NO!!!! It doesn't cry out logic programming, for reasons I'll get into in a moment. The blog post explains that the London group was unable to solve the puzzle in a single session, and goes on to explain the complications they ran into -- for example, negation rules and rules involving inequalities prove harder than expected to encode. They also needed to introduce macros to make it more readable. You'd think that after detailing all these problems, the post would end with the observation that logic programming turned out to be a convoluted way to express the logic puzzle, and failed to live up to expectations; yet amazingly, it does not.

The reason why logic programming is so rarely useful is that, essentially, core.logic is just a complex DSL for doing exhaustive search. Clojure already has an elegant, compact DSL for doing exhaustive search -- it is called the for comprehension.

So let's take a look at just how much more cleanly and easily this logic puzzle can be solved just using Clojure's built-in for comprehension. Take a look at this code. To run it, you'll need to:
(use 'clojure.math.combinatorics)
for the permutations function.

As you read the code, I want you to look at how clear and concise the translation is from the English constraints to Clojure. You've got a few lines setting up the variables, and then every single line of Clojure code is a straightforward expression of the original problem statement. Notice how trivial it is to handle things like negation, ordering, and inequalities.
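
To give a flavor of that style without reproducing the linked solution, here is a minimal sketch on a made-up toy puzzle (it is not the puzzle from the post under discussion, and the name race-solutions is purely illustrative). Alice, Bob and Carol finish a race in places 1 to 3 with no ties; Alice did not win, Bob finished ahead of Carol, and Carol did not finish last.

```clojure
;; Hypothetical toy puzzle, for illustration only.
;; A few lines set up the variables, then each :when clause
;; is a direct transcription of one English constraint.
(defn race-solutions []
  (for [alice [1 2 3]
        bob   [1 2 3]
        carol [1 2 3]
        :when (distinct? alice bob carol)  ; no ties
        :when (not= alice 1)               ; Alice did not win (negation)
        :when (< bob carol)                ; Bob finished ahead of Carol (ordering)
        :when (not= carol 3)]              ; Carol did not finish last
    {:alice alice, :bob bob, :carol carol}))

(race-solutions)
;; => ({:alice 3, :bob 1, :carol 2})
```

Negation and ordering need no special machinery here; they are ordinary Clojure predicates.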

Now go back and read the core.logic version. Admit it, it's ugly in comparison.

I think a lot of people mistakenly believe that core.logic is working some magic behind the scenes to solve puzzles in some amazingly efficient manner. On the contrary, it's just brute force search, using complex machinery which just slows it down versus a for comprehension. Not only is the for version elegant, but it's also faster.

But the disadvantages of core.logic for solving logic puzzles don't end there. core.logic is very sensitive to the way that goals are ordered, in a bad way. Certain orderings, for example, will cause the program to go into an infinite loop, and the DSL is complex enough that this is not always readily apparent.

The for comprehension version is also sensitive to ordering, but in a good way. In the for version, the code is evident and there is no mystery about how it will execute, so it is a trivial matter to rearrange the rules to improve the program's running time. The rule of thumb is simple: move each constraint up so that it occurs just after the definition of any variables it depends on. Here is a version where the rules have been shifted around in this straightforward way, making the search run ten times faster.
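
The rule of thumb can be sketched on a small stand-in search (Pythagorean triples, chosen only because it fits in a few lines; the function names are hypothetical and this is not the reordered puzzle solution linked above). The two versions return the same answers; the second simply checks (< a b) as soon as both a and b are bound, so the inner loop over c is skipped for pairs that can never succeed.

```clojure
;; Straight translation: every constraint sits after all the variables.
(defn triples-late [n]
  (for [a (range 1 (inc n))
        b (range 1 (inc n))
        c (range 1 (inc n))
        :when (< a b)
        :when (= (+ (* a a) (* b b)) (* c c))]
    [a b c]))

;; Reordered: (< a b) depends only on a and b, so it moves up
;; to just after b is defined, pruning the search before c is tried.
(defn triples-early [n]
  (for [a (range 1 (inc n))
        b (range 1 (inc n))
        :when (< a b)
        c (range 1 (inc n))
        :when (= (+ (* a a) (* b b)) (* c c))]
    [a b c]))

(= (triples-late 20) (triples-early 20))
;; => true
```

How much time the reordering saves depends on the puzzle; no timing is claimed for this toy example.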

So all this is to say that I think logic programming is overrated, at least for solving logic puzzles. I write logic puzzles for a living, and I haven't yet found a practical use for core.logic. So far, every logic puzzle I've seen solved with core.logic in a blog post could have been solved just as easily, or better, with a for comprehension.

Does that mean core.logic is useless? Of course not. Logic programming is good for running programs backwards (I've seen some wonderful toy examples of this, like the program that generates Quines, but have yet to find a real-world example where this is useful), unification is important for writing type checkers, and the new constraint programming piece of core.logic based on cKanren has wonderful potential to be directly useful to me for the kinds of things I program (although probably not until it comes with tools for guiding and visualizing the search strategy).

So the purpose of this article is not to demean core.logic, but rather, to elevate the level of discourse surrounding it. Can we move past solving logic puzzles which would be better solved with a for comprehension? Please, show me things I can do with core.logic that would be hard to express any other way. I look forward to it!

Monday, November 19, 2012

Coin Change Kata in Racket and Clojure

There's a Code Kata that is being discussed in the Clojure community right now -- the Coin Change Kata.  As someone who programs regularly in both Racket and Clojure, I thought it would be fun to demonstrate how to tackle this problem in both languages, highlighting the differences.  The Racket version is a little more straightforward, so let's begin with that.

The basic idea of the Coin Change Kata is that you are given a list of coin denominations and an amount of money, and you need to find the way to make that amount using the smallest number of coins.  For example, if you want to make 18 cents out of standard U.S. coin denominations, the best way would be one dime, one nickel, and three pennies.  The official description of the kata gives some latitude about how to express the output of your function.  I think a dictionary (aka hash table) is a perfectly sensible way to do it.

So, in Racket terminology, we're looking for a function that behaves like this:
(make-change '(1 5 10 25) 18) produces #hash((1 . 3) (5 . 1) (10 . 1))

Most beginners, when faced with this problem, immediately reason about the problem using the most obvious example they are familiar with: U.S. coin denominations.  In our experience counting out money, we know that the best strategy is to first use as many quarters as possible, then as many dimes as possible, then as many nickels as possible, and finally, finish off with pennies.

So the strategy seems straightforward.  Sort the denominations in descending order, then work your way down the list, using as many as possible of the big coin denominations first.  Many of the solutions that others have provided for this kata employ this strategy, known as the "greedy strategy".  Those solutions are wrong. 

To see this, consider (make-change '(1 20 25) 80).  If you start by counting out quarters, you'll get to 75 cents, and then you make up the difference with pennies, for a total of 8 coins.  Clearly, you're better off just using four 20 cent coins.  So a more thorough search algorithm is required.
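
For concreteness, here is a minimal sketch of the greedy strategy, written in Clojure (the other language used later in this post) and using Clojure's vector and map notation. The name greedy-change is hypothetical and is not taken from any of the solutions mentioned above; the sketch only serves to reproduce the 8-coin answer.

```clojure
;; Greedy strategy (illustrative only, and wrong in general):
;; walk the denominations from largest to smallest, taking as
;; many of each coin as will fit into what remains.
(defn greedy-change [denominations amount]
  (loop [coins (sort > denominations), remaining amount, change {}]
    (cond
      (zero? remaining) change
      (empty? coins)    nil   ; ran out of coins with money left over
      :else
      (let [coin (first coins)
            n    (quot remaining coin)]
        (recur (rest coins)
               (- remaining (* n coin))
               (if (pos? n) (assoc change coin n) change))))))

(greedy-change [1 20 25] 80)
;; => {25 3, 1 5}, i.e. 8 coins, whereas {20 4} needs only 4
```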

There's one other confounding factor that is often overlooked by beginners -- what if the problem is impossible?  For example, what should (make-change '(2 10) 13) return?  With that in mind, let's refine our contract for make-change and say that it either returns a dictionary of how to make change, or it returns false if the problem is impossible.

Before we get into the meat of the problem, there are a couple of helper functions that we're going to want to have.  First, it seems clear that a big part of what we'll be doing is comparing solutions to see which one uses the fewest coins.  So we need a way to count the coins in a "change dictionary".

;; count-coins: dict -> nat
;; counts how many coins are in the change dictionary
(define (count-coins change)
  (apply + (dict-values change)))


It would also be helpful to have a way to increment the coin count of a single denomination in a change dictionary.  For example,
(change-increment #hash((10 . 1) (25 . 2)) 25) gives #hash((10 . 1) (25 . 3))

This is also straightforward:
;; change-increment: dict nat -> dict
;; Takes a change dictionary and a coin, returns the 
;; change dictionary with that coin's count incremented
(define (change-increment change coin)
  (dict-update change coin add1 0))


Now we're ready to dive in.  I am aware of two basic recursive strategies for tackling this problem.

Strategy 1:  Include the first coin denomination, or not

To illustrate this idea, let's go back to U.S. denominations.  If I want to find the best way to make 15 cents with the denominations '(1 5 10 25), I have two scenarios to consider.

The first scenario is to include the first denomination, i.e., find the best way to make 14 cents with '(1 5 10 25) and then increment the penny count of that solution by 1.  The best way to make 14 cents with '(1 5 10 25) is #hash((1 . 4) (10 . 1)).  Increment the penny count, and you get #hash((1 . 5) (10 . 1)).  So this is one of our candidate solutions.

The second scenario is that I ignore the penny completely and find the best way to make 15 cents out of '(5 10 25).  The best way to do this is #hash((5 . 1) (10 . 1)).

In this example, the second scenario yields the superior solution, so that's the answer.

An outline of this algorithm would look like this:
(define (make-change denominations amount)
  (cond
    ;; Insert base cases here 
    [else
     ;; Case 1: Use the first denomination
     (define option1 (change-increment
                      (make-change denominations 
                                   (- amount (first denominations)))
                      (first denominations)))
     ;; Case 2: Or not
     (define option2 (make-change (rest denominations) amount))
     ;; Which is best?
     (argmin count-coins (list option1 option2))]))


It's not entirely obvious what the base cases are, but if you have experience with recursive algorithms, you can see that this algorithm either reduces the length of the denomination list, or reduces the target amount of money with each step.  So there are really two separate base cases to consider: either the denomination list gets to empty or the amount reaches zero (or maybe even becomes negative!).

Filling in the base cases looks like this:
(define (make-change denominations amount)
  (cond
    [(negative? amount) false]
    [(zero? amount) #hash()]
    [(empty? denominations) false]
    [else
     (define option1 (change-increment
                      (make-change denominations (- amount (first denominations)))
                      (first denominations)))
     (define option2 (make-change (rest denominations) amount))
     (argmin count-coins (list option1 option2))]))


Uh oh.  There's a problem here.  make-change can potentially return false, and the helper functions we created (i.e., change-increment and count-coins) don't handle false values.  There are several possible ways to tackle this, but I think the simplest way is to just modify the helper functions to handle the false value gracefully.

change-increment is easy to modify -- false in should mean false out.
;; change-increment: dict-or-false nat -> dict-or-false
(define (change-increment change coin)
  (if change
      (dict-update change coin add1 0)
      false))


It's a little less obvious how to modify count-coins.  However, we're using count-coins within an argmin in order to find the solution with the fewest coins.  To make this work, we just need to ensure that when count-coins is passed a false value, the output is something that can never be a "minimum".  A standard trick to accomplish this is to set the output to "infinity".

;; count-coins: dict-or-false -> nat-or-infinity
(define (count-coins change)
  (if change
      (apply + (dict-values change))
      +inf.0))


Now, we have a working make-change function that passes all test cases.  Unfortunately, as you test the make-change function with larger and larger amounts, it gets really, really sloooooow.  Functional programmers know that there's a simple trick to get around this -- memoization.  A quick and easy way to add memoization is to add the line:
(require (planet dherman/memoize:3:1))
to the top of your file and change define to define/memo in the definition of make-change.

[The first time you include the memoization library, Racket will churn for several minutes and print out hundreds of error messages as it downloads and compiles the library locally on your machine.  After that, the library will be available on your system, and future inclusions will be speedy.]

The final Racket version of Strategy 1:
#lang racket
(require rackunit)
(require (planet dherman/memoize:3:1))

;; count-coins: dict-or-false -> nat-or-infinity
;; counts how many coins are in the change dictionary
;; or returns infinity if input is false
(define (count-coins change)
  (if change
      (apply + (dict-values change))
      +inf.0))
  
;; change-increment: dict-or-false nat -> dict-or-false
;; Takes a change dictionary and a coin, returns the 
;; change dictionary with that coin's count incremented
;; or false if change input is false
(define (change-increment change coin)
  (if change
      (dict-update change coin add1 0)
      false))

;; make-change: list-of-nats nat -> dict-or-false
;; Takes a list of coin denominations and an amount that
;; needs to be broken up into change.  Returns a
;; "change dictionary", i.e., a hash table that maps
;; coin denominations to counts, corresponding to the 
;; most efficient way that sums to the desired amount.
;; Returns false if impossible.
(define/memo (make-change denominations amount)
  (cond
    [(negative? amount) false]
    [(zero? amount) #hash()]
    [(empty? denominations) false]
    [else
     (define option1 (change-increment
                      (make-change denominations (- amount (first denominations)))
                      (first denominations)))
     (define option2 (make-change (rest denominations) amount))
     (argmin count-coins (list option1 option2))]))
       
(check-equal? (make-change '(1 5 10 25) 18)
              #hash((1 . 3) (5 . 1) (10 . 1)))
(check-equal? (make-change '(1 20 25) 80)
              #hash((20 . 4)))
(check-equal? (make-change '(1 24 25) 98)
              #hash((24 . 2) (25 . 2)))
(check-equal? (make-change '(2 10) 13) false)



Strategy 2: Which coin to use next?

Going back to our example of finding the best way to make 15 cents out of '(1 5 10 25), there's a completely different strategy we could take.  The first coin I use could either be a penny, nickel, dime, or quarter.  So if I knew the best way to make 14 cents, 10 cents, 5 cents, and -10 cents out of '(1 5 10 25) [note that these values are 15-1, 15-5, 15-10, and 15-25, respectively], then I take the best of those ways, and increment the count of the corresponding coin to get the best way to make 15 cents.

Using the same helper functions as Strategy 1, here is the final Racket implementation of make-change, Strategy 2:
(define/memo (make-change denominations amount)
  (cond
    [(negative? amount) false]
    [(zero? amount) #hash()]
    [(empty? denominations) false]
    [else
     (define options (for/list ([coin (in-list denominations)])
                       (change-increment
                        (make-change denominations (- amount coin))
                        coin)))
     (argmin count-coins options)]))


In my benchmarking, the Racket version of Strategy 1 performed 10x better than the Racket version of Strategy 2 for large target amounts.


Clojure versions

Now, let's take a look at the Clojure versions of the same code.  We'll begin with Strategy 1.  In Clojure, hash maps are written using curly braces and we can insert commas anywhere we like to increase readability, so the change dictionaries look like {1 3, 5 1, 10 1}.  Keeping with Clojure's conventions, the input list of denominations is now a vector rather than a list.  Also, in Clojure we use nil as the output when there is no solution, rather than false. Overall, the code is almost the same (although this is not quite the final version):

(use 'clojure.test)

(defn count-coins [change]
  (if change
    (apply + (vals change))
    Double/POSITIVE_INFINITY))

(defn update [m k f default]
  (assoc m k (f (get m k default))))

(defn change-increment [change coin]
  (when change
    (update change coin inc 0)))

(defn make-change [denominations amount]
  (cond
   (neg? amount) nil
   (zero? amount) {}
   (empty? denominations) nil
   :else
   (let [option1 (change-increment
                  (make-change denominations (- amount (first denominations)))
                  (first denominations)),
         option2 (make-change (rest denominations) amount)]
     (min-key count-coins option1 option2))))

(def make-change (memoize make-change)) 

(deftest make-change-tests
  (are [x y] (= x y)
       (make-change [1 5 10 25] 18)  {1 3, 5 1, 10 1}
       (make-change [1 20 25] 80)    {20 4}
       (make-change [1 24 25] 98)    {24 2, 25 2}
       (make-change [2 10] 13)       nil))

Let's enumerate some of the differences:
  1. count-coins is essentially the same -- infinity has a longer name.
  2. Clojure does not have a built-in update function for hash-maps, so we have to roll our own.  Frankly, I find this to be one of the most glaring omissions in Clojure's core library.  Given the richness of Clojure's core, I find it baffling that Clojure is up to version 1.5, and this function still has not been added.
  3. In change-increment, we can leverage the fact that Clojure's when returns nil when the condition is false/nil, thus saving a line relative to Racket.  This is a common idiom in Clojure for writing "nil in means nil out" functions.
  4. No internal define in Clojure, so instead in the else clause, we use let (which is similar to Racket's let*).
  5. In Racket, argmin takes a function and a list.  In Clojure, the corresponding function is min-key which takes a function and a variable number of arguments. 
  6. memoize is built into Clojure.  The above technique of defining the function normally and then rebinding the function name to the memoized version is a common idiom, somewhat different from Racket's define/memo.
Not a whole lot of differences.  Even the unit test code looks similar.  However, there's a very big difference lurking beneath the surface: if you try make-change on a large target amount, you get a StackOverflowError.  This is a definite problem.

Racket uses a very clever technique to simulate an unlimited stack (or more to the point, limited only to the overall memory you've allocated to Racket, rather than some arbitrary stack limit).  My understanding is that Racket achieves this trick by catching the error thrown when the stack overflows, moving the stack to the heap, and continuing.  Even if those details aren't quite right, the main point here is that in Racket, you just don't spend any time worrying about stack overflows.  In Clojure, it's a very salient issue that you absolutely must contend with.

So how to deal with it?  One option is to switch the code over to memoization's bottom-up cousin, dynamic programming.  The idea is that you allocate an array large enough to store the computations of make-change for all possible values up to amount, and fill up this array in sequence using the values that have been computed before.  This will solve our stack overflow problem, but requires writing the code in a very different style.  It would be nice if we could solve the problem without changing much of our existing code.
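
To show what that different style might look like, here is a minimal sketch of the bottom-up approach. It is not the route taken below; the name make-change-dp is hypothetical, a persistent vector stands in for the array, and the coin counting is inlined so that the block is self-contained.

```clojure
;; Bottom-up dynamic programming sketch (assumed names, illustration only).
;; table[i] holds the best change dictionary for amount i, or nil if
;; amount i cannot be made.  Each entry is built from earlier entries.
(defn make-change-dp [denominations amount]
  (let [table
        (reduce
         (fn [table i]
           (let [candidates (for [coin denominations
                                  ;; get returns nil for negative indices
                                  :let [prev (get table (- i coin))]
                                  :when prev]
                              (assoc prev coin (inc (get prev coin 0))))]
             (conj table
                   (when (seq candidates)
                     (apply min-key #(apply + (vals %)) candidates)))))
         [{}]                       ; table[0]: zero coins make zero cents
         (range 1 (inc amount)))]
    (get table amount)))

(make-change-dp [1 20 25] 80)
;; => {20 4}
```

Because the table is filled by a reduce rather than by recursion, the depth of the call stack does not grow with the target amount.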

Any other options?  How about we do a CPS transform on the entire program so that it doesn't consume any stack space, only heap?  Blech.

Fortunately, there is a quick trick that I like to call "priming the pump".  We just call the memoized function on all numbers up to the target amount, achieving a similar bottom-up effect as the dynamic programming approach in a way that allows us to build off our existing code.  For clarity, I've left the original function alone, and created a new prime-the-pump version called make-change-fast.  This new function, make-change-fast, is the one we would expose publicly; the original make-change function would become private.

Final Clojure version of Strategy 1:
;;Basic helper functions remain unchanged

(defn count-coins [change]
  (if change
    (apply + (vals change))
    Double/POSITIVE_INFINITY))

(defn update [m k f default]
  (assoc m k (f (get m k default))))

(defn change-increment [change coin]
  (when change
    (update change coin inc 0)))

;; This is now the private helper function, don't use this directly
(defn ^:dynamic make-change [denominations amount]
  (cond
   (neg? amount) nil
   (zero? amount) {}
   (empty? denominations) nil
   :else
   (let [option1 (change-increment
                  (make-change denominations (- amount (first denominations)))
                  (first denominations)),
         option2 (make-change (rest denominations) amount)]
     (min-key count-coins option1 option2))))

;; This is the new public function we use to make change. 
(defn make-change-fast [denominations amount]
  (binding [make-change (memoize make-change)]
    (last
     (for [i (range (inc amount))]
       (make-change denominations i)))))

When writing make-change-fast, I decided to demonstrate an alternative memoization idiom in Clojure.  Rather than rebinding make-change with (def make-change (memoize make-change)), we can declare make-change to be a dynamic var, and then in make-change-fast, we achieve memoization within a binding construct. Why do it this way?  Well, I actually prefer this idiom for this particular use case because it means that every call to make-change-fast creates its own memoized version of make-change with its own cache, and this cache can be garbage collected when the computation is complete.  This becomes important if you use make-change-fast in a long running process with very different denomination lists.  It also ensures that if you run make-change-fast in multiple threads, the caches will stay isolated from one another.  Of course, if you are always using the same denomination list, the other way would be better because you'd want to keep the cache around and share it across threads.

Strategy 2:

No real surprises here, and only a few lines differ from Strategy 1, but for completeness, here is the final Clojure version of Strategy 2:
(defn count-coins [change]
  (if change
    (apply + (vals change))
    Double/POSITIVE_INFINITY))

(defn update [m k f default]
  (assoc m k (f (get m k default))))

(defn change-increment [change coin]
  (when change
    (update change coin inc 0)))

(defn ^:dynamic make-change [denominations amount]
  (cond
   (neg? amount) nil
   (zero? amount) {}
   (empty? denominations) nil
   :else
   (let [options (for [coin denominations]
                   (change-increment
                    (make-change denominations (- amount coin))
                    coin))]
     (apply min-key count-coins options))))

(defn make-change-fast [denominations amount]
  (binding [make-change (memoize make-change)]
    (last
     (for [i (range (inc amount))]
       (make-change denominations i)))))


Interestingly, in Clojure, Strategy 2 is twice as fast as Strategy 1 for large target amounts (whereas in Racket, Strategy 1 was substantially faster).


Final Thoughts

On my computer, both Clojure versions were faster than all the Racket versions I came up with (both the ones I displayed above, and other ones I tried, for example, employing the priming-the-pump strategy on Racket even though it is not needed).  Specifically, my slowest Clojure version was twice as fast as the fastest Racket version.

I find Racket's #hash notation to be awkward to work with relative to Clojure's simple {} syntax for hash tables.

I wrote the Racket versions first, and they were error-free the first time.  I wrote the Clojure versions second, and they took me longer to write, even though I spend more time programming Clojure than Racket, in part because there were two errors I needed to debug.  First, I made a mistake when writing the update helper function (I initially tried to use Clojure's related update-in function, not realizing that it doesn't allow for a default value when the key is not found).  Second, I always forget that min-key is a variable argument function, rather than a list function like in Racket -- to me it seems far more intuitive to have that kind of function operate on a list, since that is the most common use case.  In both cases, Clojure's error messages were spectacularly unhelpful, which sadly, is par for the course in Clojure.

The other reason the Clojure code took longer to write is that I needed to figure out how I wanted to deal with the stack overflow, and that ate up some time.  I can't stress enough how freeing it is to not have to worry about those sorts of issues on Racket.

Clojure gives you better control over memoization strategies and even more options are available in clojure.core.memoize.  Racket doesn't even have a built-in memoization library, and the one on planet is weak relative to Clojure.

Usually, when comparing Clojure to Racket, I find that Clojure has the richer set of built-ins: more versatile data structures and functions that operate over them.  For this particular example, however, most of the built-in functions I relied on had counterparts in both languages, so the code looks nearly identical between the two languages.  Oddly enough, this time it was Clojure that was missing something I wanted, namely the hash-map update function.

Looking at this as a Clojure vs Racket battle, there's not a clear winner on this example since both allowed me to compactly express the approaches I had in mind and there were some advantages and disadvantages on both sides.  I'd probably give Clojure the nod because Clojure gave me better speed and more control over the memoization caching behavior.  (Yes, I realize that if I drop down to C, I'd get even better speed and control over caching strategies, but that's not my point.  My point is that if you can get better speed and control for a similar level of effort, why not?)  Nevertheless, with no stack overflows and better error messages, writing the code in Racket was a noticeably more pleasant experience.

For those who have not tried Code Katas, I encourage you to do so.  Code Katas usually are simple enough that they exercise fundamental skills that every programmer should possess, but are just tricky enough that there are several viable solutions and it is therefore interesting to compare those solutions with one another and across languages.

For those who enjoyed this particular Kata, I would recommend reading Doctor Ecco's Cyberpuzzles by Dennis E. Shasha.  The first chapter deals with a fascinating, related problem of trying to find the optimal denomination list of a given list that minimizes the average number of coins needed across a range of possible target amounts.

Thursday, November 15, 2012

Clojure makes quines way too easy

A popular programming challenge is to, in your favorite programming language, write a Quine -- a program/function that prints its own source code.  Usually, writing a Quine is an incredibly mindbending effort.  However, yesterday, it was pointed out to me that in Clojure, this task is ridiculously simple:

(defn self-source
  "prints the source of itself"
  []
  (source self-source))

Where's the challenge in that?  :-)

Sunday, June 17, 2012

I Feel Sorry For Computer Science Departments


There's a popular meme that it takes ten years of effortful study to become an expert at something. I'm sure that in reality, the number of years varies a bit from subject to subject and from one individual to the next, but one thing is clear – expertise takes time.

Therein lies the problem. Most students who enter college and decide to take computer science have minimal, if any, prior exposure to computer science. Colleges have 4 years to try to instill some meaningful level of expertise in students, but that's simply not enough time. Compounding the problem, many students are hoping to go out and get internships after their first year. This leads to a series of unfortunate, yet inevitable compromises.

CS departments are forced to choose: do we focus on foundational skills and the big picture of what computer science is all about, or do we focus on technical training to try to produce graduates who have skills with immediate appeal to companies? Talk to any CS professor and you'll hear plenty of stories of bitter, divisive debates about this very issue within departments and across the entire community of computer science educators.

The very best schools are constantly reassessing this question and retooling their program. Matt Might published a great wishlist of topics that every computer science student should know. I also have a special admiration for Carnegie Mellon in this regard. Despite being consistently ranked as one of the very top universities for computer science in the country, they refused to rest on their laurels and recently did a complete overhaul of their introductory classes.

My own two cents on the topic is that if a choice has to be made, it is better to err on the side of teaching foundational subjects. I admire curricula such as Program by Design, which takes a bold stand, teaching introductory classes in a non-mainstream programming language (Racket, a dialect of Scheme). They do this because the language is a particularly good choice for teaching design and program construction at a deep level. Knowledge of this language isn't likely to be immediately useful for a summer internship, but colleges that use this curriculum report that down the road, their students come out much stronger and much more sought after by companies.

But the sad truth is that no matter how many times CS departments debate this issue and retool, there are no good answers. Four years is simply not enough time to become an expert in computer science. Colleges are stuck between a rock and a hard place and it's a bad situation all around. Companies are distinctly unimpressed and disappointed with the vast majority of graduates that colleges are producing. Ideally, companies want to hire someone with the exact skills for a given job (one can argue that this is a bad hiring strategy for long-term growth, but often, it's what makes the most short-term economic sense). However, there aren't enough of those to go around, so companies try to make the best of the situation by just trying to find the smartest students they can, figuring the smart ones can hopefully compensate for their lack of experience by picking things up quickly on the job. More often than not, the knowledge gained from a CS education is viewed by companies as being so insufficient as to be almost irrelevant – nevertheless, graduating from a well-known school can be seen as a kind of proxy for the kind of drive and innate smarts they really are looking for.

Flipping this around and looking at it from the perspective of students, many graduating students are finding out the hard way that they lack the real-world skills companies are seeking. If they can get that first job, sometimes they discover that they lack the background necessary to keep up with the tectonic shifts in the industry; when that first job goes away, it can be very tricky to make the transition to something new. If you don't have the exact skills companies are looking for, and you're no longer in that bucket of entry-level, fresh-out-of-school applicants that companies might be willing to take a chance on, hunting for that second job can be especially tough.

How do other departments solve this problem? Well, many domains are able to leverage the significant number of years that students have already invested in grade school in English, math, and science. For example, most students who go into mechanical engineering have already had the opportunity to learn math up through calculus and have learned physics as well. Imagine how many years it would take to become a mechanical engineer with absolutely no prior math or science instruction, and you'll begin to appreciate the problem that CS departments face. Also, many other disciplines require significant post-graduate study and apprenticeships in a way that computer science does not. Arguably, computer science has one of the greatest disparities between the demand for expertise, and the level of expertise that is actually attained before one goes into the business.

But wait a second... computer science is an engineering discipline. Shouldn't computer science benefit from kids' math and science education as much as any other science/tech subject? Unfortunately, no. Calculus, the pinnacle of grade school math education as it is currently structured, is the least relevant type of math for computer scientists. Computer scientists need a strong background in Discrete Math and these topics are poorly covered in grade school, if at all.

To further illustrate the point, there is not a single programming class offered in the elementary and middle schools near my home. At the closest high school, most of the tech ed classes are about how to use Microsoft Office and PowerPoint to write reports; programming offerings are fairly lightweight. Keep in mind that I live in a part of the country that is fairly rich with tech companies, less than 30 miles from Microsoft, Amazon, Facebook, Google, Nintendo, and Boeing. Despite my complaints about the meager offerings in my school district, there's no doubt in my mind that most places have it much worse in terms of providing kids with early exposure to programming.

So for the most part, computer science curricula start from scratch. However, Program by Design, the intro CS curriculum I mentioned earlier, stands out from the pack. Unlike most other approaches to teaching CS, they intentionally try to leverage students' existing math knowledge by portraying programming as a kind of executable algebra. This is a clever strategy for trying to maximize how far students can get towards expertise in just four years of college.

Once the problem has been laid bare like this – four years provides insufficient preparation for a career in computer science – it is obvious that there are only a couple of long-term solutions. One possibility is to extend the duration of CS education; another is to incorporate more CS topics and exposure to programming into the grade school curriculum.

I think a strong case can be made that our society would benefit from more CS in grade school, so that's the direction I would be inclined to go. Programming is rapidly becoming a foundational skill that has value across a wide range of disciplines. An understanding of data, functions, and algorithms can play an important role, right alongside mathematics and the scientific method, for developing problem-solving skills that are essential in our modern world.

Another helpful change would be to incorporate more discrete math into the grade school curriculum. A stronger background in discrete math would make it much more feasible for computer science students to make rapid progress in just four years of college. Why should we make a change in the math curricula that just benefits future computer science students? Well, the short answer is that it wouldn't just benefit future computer science students. Arthur Benjamin makes the case in his TED talk that discrete math (logic, statistics, etc.) is far more relevant to most walks of life than, say, calculus.

So in the abstract sense, it's relatively clear what needs to change in order to solve the problem. But we all know that implementing such a solution may well be intractable. Even if grade schools were motivated to incorporate more discrete math and programming into their classes, how do you go about finding and recruiting qualified teachers?

Therefore, this is likely to remain a problem for a long time to come. In the meantime, I feel sorry for college CS departments, I feel sorry for CS students, I feel sorry for the people who would love programming but never get exposed to it, and I feel a sense of loss that there's so much more our society could achieve if we could narrow the gap between supply and demand in computer science expertise. My hat's off to everyone who works hard at making the best out of this bad situation.

Sunday, November 20, 2011

Review of 2011 free Stanford online classes

Over the summer, Stanford announced that they would be offering their AI class online for free. It made headlines, and a few weeks later they announced that they would be offering their intro to databases class and their machine learning class as well.

I've been working through the material for all three classes so that I would know whether they were worth recommending to the high school students I work with, and also to satisfy my own personal curiosity about how the online classes would be conducted. Summary: The database and machine learning classes are excellent. Ironically, the AI class is pretty bad, even though it was the poster child for this wave of online offerings.

The database class is the most accessible. I believe it is a freshman class at Stanford, and I think most CS-oriented high schoolers would do just fine with it. As with many CS classes, it certainly helps to have a strong background in discrete math. Specifically, prior exposure to mathematical logic, set theory, and relations makes it significantly easier to follow the discussions of relational algebra and relational design theory. The videos are fast-paced and interesting. The randomized quizzes that you can take over and over until you get 100% are a brilliant way to empower students to keep working until they have achieved mastery. The online homework system for practicing queries against a live database works quite well, and the exercises cover a nice range of difficulty from easy to hard. The teacher's weekly "screenside chats" and vibrant forum community really make it feel like you're "taking a class" rather than just working through a sterile set of videos and exercises. The material is well organized and is generally posted two to three weeks ahead of time for those who want to get ahead. All in all, it's the best example I've ever seen of what online education can potentially be.

The machine learning class is of similarly high quality. It shares the same video technology and the same quiz engine. The machine learning class also features weekly programming assignments, using the free language Octave. The programming write-ups are very clear, and you can keep submitting your program until you get it perfect. The submission process is very easy. Unfortunately, I won't be able to recommend this class to many high school students. This class is a very math-centric approach to machine learning, and I think to fully appreciate the material you need to have a certain comfort level with the basics of linear algebra, and it helps to have seen multivariate calculus. I doubt many high school students have that mathematical background.

Interestingly, just a few months ago, I watched some videos on "iTunes University" of the Stanford machine learning class, taught by the same professor (Andrew Ng). It is instructive to contrast my experience watching those classroom videos with my experience in the online class. The classroom videos tended to be quite long and slow, with the professor scrawling long mathematical derivations across multiple blackboards. Without being able to see and do the related homework and programming assignments, I found it difficult to follow the material. In contrast, the online course videos are much more briskly paced (because they know you can pause or rewatch the video if you don't get something), and the assignments do a great job of solidifying the knowledge before moving on to the next topic. It's amazing how much better the overall experience is with the online class than just watching the videos of the classroom lectures.

As I said up top, the AI class is astonishingly bad compared to the other two. This is all the more surprising given that it is the one that gained the most widespread attention when it was announced. The website is much more poorly organized than the sites for the other two classes. The videos are poor quality - I mean this in both the literal sense (the video image is of a dimly lit piece of paper and the audio is muffled) and the content sense (the pace is much slower, failing to take advantage of the medium's ability to be paused or rewound). The questions interspersed in the video don't seem to be particularly well chosen to solidify knowledge; instead the questions are often just prompts to motivate the next topic -- you're not really expected to know the answer to the question when it is asked. This means that the only means to really solidify the knowledge is the homework quiz. These quizzes are poorly presented (rather than a clearly expressed, written statement, you have to listen to the instructor verbally explain the question) and there is no immediate feedback. Unlike the other two classes, the quiz is not randomized, so there is one set of questions and then you must wait a week to compare your answers against the correct answers (and the mechanism for checking your answers is somewhat clunky). The whole thing seems like the profs weren't ready to go prime time with this class. Three weeks into the class, the classroom forum section was still "coming soon", for example. In fact, when I last looked, they had completely punted on the forum section, and the page just said to "use Reddit" instead. Also, the videos tend to be posted quite late. Honestly, if I had nothing to compare it to, I might think it was okay, but relative to the other two classes, it is mediocre at best.

If I were to judge the AI class solely in terms of its content, rather than on its presentation, my review wouldn't be any better. The class is really a breadth survey of various topics in AI, with no programming to back it up. Unless you're going to dig in and actually program some of these things, I really don't see the point. I believe the actual Stanford version of the class offered a programming component, but that it was dropped from the online class for logistical reasons. This is understandable, but it really takes away from the value of the class. One reason the machine learning class is so much better is that they did find a way to incorporate programming assignments.

Tuesday, August 10, 2010

Racket vs. Clojure

I've been asked by several people to explain why I use Clojure for my professional work rather than Racket.

ABOUT RACKET


I have been using Racket (a dialect of Scheme) for several years to teach kids how to program. Although Racket is a great first language, it's definitely not a "toy language". In fact, Racket offers a number of interesting features not found in other languages, making it an attractive option for real-world work. Racket puts into practice state-of-the-art research on macros, continuations, contracts, and interoperation between static and dynamically typed code. The integrated Scribble system makes it easy to provide high-quality documentation and/or write literate programs. It comes with a pleasant, lightweight IDE complete with an integrated debugger and profiler (as well as innovative features such as a specialized macro debugger).

I'm a fan of functional programming and dynamic typing. I know how to write and think in Racket from my many years teaching it, so with all these features, it should be a slam dunk for me to use it professionally, right?

Well, no....

IT'S ALL ABOUT THE DATA STRUCTURES


I have discovered that for me, the #1 factor that determines my programming productivity is the set of data structures that are built into the language and are easy to work with. For many years, Python set the standard for me, offering easy syntax to manipulate extensible arrays (called lists in Python), hash tables (called dictionaries in Python), tuples (an immutable collection that can serve as keys in a hash table), and in recent versions of Python, sets (mutable and immutable), heaps, and queues.
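
For readers who haven't used Python, here is a minimal sketch of the structures listed above. The variable names and sample values are made up for illustration; the point is only how little ceremony each structure needs.

```python
import heapq
from collections import deque

# Extensible array ("list"): grows in place
scores = [3, 1, 2]
scores.append(5)

# Hash table ("dictionary") with immutable tuples as keys
grid = {(0, 0): "start"}
grid[(2, 3)] = "goal"

# Sets: a mutable set and an immutable frozenset built from it
seen = {1, 2}
seen.add(3)
frozen = frozenset(seen)

# Heap: the heapq functions operate on an ordinary list
heap = [5, 1, 4]
heapq.heapify(heap)
smallest = heapq.heappop(heap)

# Queue: deque supports fast appends and pops at both ends
queue = deque(["a", "b"])
queue.append("c")
first = queue.popleft()
```

Lists, dictionaries, tuples, and sets have literal syntax; heaps and queues come from the standard library rather than from dedicated syntax.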

Racket, as a dialect of Scheme, places the greatest importance on singly-linked lists. OK, that's a reasonable starting point -- you can do a lot with linked lists. It also offers a vector, which is an old-fashioned non-extensible array that is fixed in length. (Who wants fixed-length arrays as a primary data structure any more? Even C++ STL offers an extensible vector...)

Vectors are mutable, which is both a plus and a minus. On the plus side, it allows you to efficiently write certain classes of algorithms that are hard to write with linked lists. It serves a purpose that is different from linked lists, so there is value to having both in the language. The huge minus is that Racket simply isn't oriented towards working conveniently with mutable vectors. Working with mutable data structures conveniently demands certain kinds of control structures, and certain kinds of syntaxes. You can write vector-based algorithms in Racket, but they look verbose and ugly. Which would you rather read:
a[i]+=3 or (vector-set! a i (+ (vector-ref a i) 3)) ?
But if you can get past the more verbose syntax, there's still the fundamental issue that all the patterns change when you move from using a list to a vector. The way of working with them is so fundamentally different that there is no easy way to change code from using one to another.

Racket goes further than most Scheme implementations in providing built-in data structures. It also offers, for example, hash tables (and recently sets were added). But the interface for interacting with hash tables is a total mess. The literals for expressing hash tables use dotted pairs. If you want to construct hash tables using the for/hash syntax, you need to use "values". If you want to iterate through all the key/value pairs of a hash table, it would be nice if there were an easy way to recursively process the sequence of key/value pairs the way you would process a list. Unfortunately, Racket provides no built-in lazy list/stream, so you'd need to realize the entire list. But even if that's what you'd want to do, Racket doesn't provide a built-in function to give you back the list of keys, values or pairs in a hash table. Instead, you're encouraged to iterate through the pairs using an idiosyncratic version of its for construct, using a specific deconstructing pattern match style to capture the sequence of key/value pairs that is used nowhere else in Racket. (Speaking of for loops, why on earth did they decide to make the parallel for loop the common behavior, and require a longer name (for*) for the more useful nested loop version?) Put simply, using hash tables in Racket is frequently awkward and filled with idiosyncrasies that are hard to remember.

There are downloadable libraries that offer an assortment of other data structures, but since these libraries are made by a variety of individuals, and ported from a variety of other Scheme implementations, the interfaces for interacting with those data structures are even more inconsistent than the built-ins, which are already far from ideal.

I'm sure many programmers can live with the awkwardness of the built-in data structures to get the other cool features that Racket offers, but for me, it's a deal breaker.

ENTER CLOJURE


Clojure gets data structures right. There's a good assortment of collection types built in: lists, lazy lists, vectors, hash tables, sets, sorted hash tables, sorted sets, and queues. ALL the built-in data structures are persistent/immutable. That's right, even the *vectors* are persistent. For my work, persistent vectors are a huge asset, and now that I've experienced them in Clojure, I'm frustrated with any language that doesn't offer a similar data structure (and very few do). The consistency of working only with persistent structures is a big deal -- it means you use the exact same patterns and idioms to work with all the structures. Vectors are just as easy to work with as lists. Equality is simplified. Everything can be used as a key in a hash table.

Data structures in Clojure get a little bit of syntactic support. Not a tremendous amount, but every little bit helps. Code is a little easier to read when [1 2 3] stands out as a vector, or {:a 1, :b 2, :c 3} stands out as a hash table. Lookups are a bit more terse than in Racket -- (v 0) instead of (vector-ref v 0). Hash tables are sufficiently lightweight in Clojure that you can use them where you'd use Racket's structs defined with define-struct, and then use one consistent lookup syntax rather than type-specific accessors (e.g., (:age person) rather than (person-age person)). This gets to be more important as you deal with structures within structures, which can quickly get unwieldy in Racket, but is easy enough in Clojure using -> or get-in. Also, by representing structured data in Clojure as a hash table, you can easily create non-destructive updates of your "objects" with certain fields changed. Again, this works just as well with nested data. (Racket structs may offer immutable updates in future versions, but none of the proposals I've seen address the issue of updating nested structured data.) Furthermore, Clojure's associative update function (assoc) can handle multiple updates in one function call -- contrast (assoc h :a 1 :b 2) with (hash-set (hash-set h 'a 1) 'b 2).

Even better, the process for iterating through any of these collections is consistent. All of Clojure's collections can be treated as if they were a list, and you can write algorithms to traverse them using the same pattern of empty?/first/rest that you'd use on a list. This means that all the powerful higher-order functions like map/filter/reduce work just as well on a vector as a list. You can also create a new collection type, and hook into the built-in sequence interface, and all the built-in sequencing functions will automatically work just as well for your collection.

Although the sequencing functions work on any collection, they generally produce lazy lists, which means you can use good old recursion to solve many of the same problems you'd tackle with for/break or while/break in other languages. For example, (first (filter even? coll)) will give you the first even number in your collection (whether a list, vector, set, etc.) and it will do so in a space-efficient manner -- it doesn't need to generate an intermediate list of *all* the even numbers in your collection. Some garbage is generated along the way, but it can be garbage collected immediately and with relatively little overhead. Clojure also makes it easy to "pour" these lazy sequences into the collection of your choice via into. Racket's lack of a built-in lazy list makes it difficult to use map/filter/etc. for general processing of collections. If you use map/filter/etc., you potentially generate a lot of intermediate lists. You can use a stream library, but it was probably designed for other Scheme dialects with a naming scheme for the API that doesn't match Racket's built-in list functions or integrate well with Racket's other sequencing constructs. So often you end up writing the function you need from scratch (e.g., find-first-even-number) rather than composing existing building blocks. In some special cases, you can use one of the new for constructs, like in this case, for/first.

A polymorphic approach is applied through most of Clojure's design. assoc works on vectors, hash tables, sorted hash tables, and any other "associative" collection. And again, you can hook into this with custom collections. This is far easier to remember (and more concise to write) than the proliferation of vector-set!, hash-set, etc. you'd find in Racket. It also makes the various collections more interchangeable in Clojure, making it easier to test different alternatives for performance implications with fewer, more localized changes to one's code.

Summary:

  • Clojure provides a full complement of (immutable!) data structures you need for everyday programming and a bit of syntactic support for making those manipulations more concise and pleasant.
  • All of the collections are manipulated by a small number of polymorphic functions that are easy to remember and use.
  • Traversals over all collections are uniformly accomplished by a sequence abstraction that works like a lazy list, which means that Clojure's higher order sequence functions also apply to all collections.


CLOJURE'S NOT PERFECT


The IDEs available for Clojure all have significant drawbacks. You can get work done in them, but any of the IDEs will probably be a disappointment relative to what you're used to from other languages (including Racket).

Debugging is difficult -- every error generates a ridiculously long stack trace that lists 500 Java functions along with (maybe, if you're lucky) the actual Clojure function where things went awry. Many of Clojure's core functions are written with a philosophy that they make no guarantees what they do with bad input. They might error, or they might just return some spurious answer that causes something to blow up far far away from the true origin of the problem.

Clojure inherits numerous limitations and idiosyncrasies from Java. No tail-call optimization, no continuations. Methods are not true closures, and can't be passed directly to higher-order functions. Proliferation of nil and null pointer exceptions. Slow numeric performance. Compromises with the way hashing and equality work for certain things to achieve Java compatibility. Slow startup time.

Some people love Clojure specifically because it sits on top of Java and gives them access to their favorite Java libraries. Frankly, I have yet to find a Java library I'd actually want to use. Something about Java seems to turn every library into an insanely complex explosion of classes, and Java programmers mistakenly seem to think that JavaDoc-produced lists of every single class and method constitute "good documentation". So for me, the Java interop is more of a nuisance than a help.

Clojure has a number of cool new ideas, but many of them are unproven, and only time will tell whether they are truly valuable. Some people get excited about these features, but I feel fairly neutral about them until they are more road-tested. For example:

  • Clojure's STM implementation - seems promising, but some reports suggest that under certain contention scenarios, longer transactions never complete because they keep getting preempted by shorter transactions.
  • agents - if the agent can't keep up with the requests demanded of it, the agent's "mailbox" will eventually exhaust all resources. Perhaps this approach is too brittle for real-world development?
  • vars - provides thread isolation, but interacts poorly with the whole lazy sequence paradigm that Clojure is built around.
  • multimethods - Clojure provides a multimethod system that is far simpler than, say CLOS, but it requires you to explicitly choose preferences when there are inheritance conflicts, and early reports suggest that this limits extensibility.
  • protocols - This is an interesting variation on "interfaces", but it's not clear how easy it will be to compose implementations out of partial, default implementations.
  • transients - Nice idea for speeding up single-threaded use of persistent data structures. Transients don't respond to all the same interfaces as their persistent counterparts, though, limiting their usefulness. Transients are already being rethought and are likely to be reworked into something new.


So it's hard for me to get excited about these aspects of Clojure when it remains to be seen how well these features will hold up under real-world use.

I'm sure that for many programmers, Clojure's drawbacks or unproven ideas would be a deal breaker. We all care about different things. But for me, Clojure's clean coherent design of the API for working with the built-in data structures is so good, that overall, I prefer working in Clojure to working in Racket.

Monday, July 26, 2010

Translating Code from Python and Scheme to Clojure

When coming to Clojure from another language, it takes a while before you start "thinking in Clojure". While ramping up, it helps to understand how to solve a problem in a language you're already familiar with, and then translate the code in some methodical fashion into Clojure.

This article will look at a simple function, remove-first, and look at how you would implement that function in Python and Scheme, and then how to methodically transform those implementations into Clojure. In all of these implementations, I'm going to ignore ways to write the function using shortcuts provided by the standard library, and focus on the implementations using standard iteration and/or recursive techniques. This will provide the clearest example of how the translation process works and can generalize to other types of functions.

Problem Statement: remove-first takes an item and a collection, and returns a new collection which is identical to the original, except the first instance of item (if any) has been removed from the collection. If item is not in the collection, the new collection should be identical to the original (since nothing needs to be removed).

First, let's look at the Python implementation. Python's primary collection data structure is called a "list", but this is a bit of a misnomer, because in most languages, the term "list" is used to describe some sort of linked or doubly-linked list. Python's list is nothing of the sort. Python's list allows fast (destructive) insertion and removal at the back end of the list, and fast lookup by index. In most languages, this would be called an extensible array or extensible vector (in Java, it's called an ArrayList).
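
A small sketch of those performance characteristics, using a made-up list of strings. The complexity notes in the comments describe the standard CPython list, not anything specific to this article.

```python
items = ["a", "b", "c"]

items.append("d")     # fast (amortized constant time) insertion at the back
last = items.pop()    # fast removal from the back
second = items[1]     # fast lookup by index

# Insertion at the front works, but every later element has to shift over,
# which is the cost a linked list would not pay
items.insert(0, "z")
```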

The problem statement for remove-first calls for returning a new copy of the collection with the first instance of item removed. Is this really idiomatic for Python? It is certainly possible to write remove-first as a destructive function that actually modifies the original collection by removing the first instance of item. In fact, such a destructive method is built into the list class (list.remove(item)). But removing from the middle of a Python list is not an especially efficient operation, and Python has a culture of slices, comprehensions, and many other list operations that return fresh copies. So yes, I think it is reasonable to talk about how to write a non-destructive removal in Python.
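
To make the destructive/non-destructive distinction concrete, here is a minimal sketch. Both helper functions are hypothetical names invented for this illustration; they are not part of Python or of the implementations discussed in this article.

```python
def remove_in_place(item, coll):
    """Hypothetical destructive helper: mutates the caller's list."""
    coll.remove(item)


def without_index(coll, i):
    """Hypothetical non-destructive helper: builds a fresh list from slices."""
    return coll[:i] + coll[i + 1:]


# Destructive: every name bound to the list observes the change
shared = [1, 2, 3, 2]
alias = shared
remove_in_place(2, shared)

# Non-destructive: the original list is left untouched
original = [1, 2, 3, 2]
fresh = without_index(original, 1)
```

After this runs, alias has lost its first 2 even though only shared was passed in, while original still has all four elements.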

Now remember, for the purposes of this article, we're trying to look at how to translate iterative algorithms, so it's a cheat to use built-in constructs that work around this.

So this doesn't count, because it uses the built-in destructive removal:
def removeFirst(itemToRemove, coll):
    newColl = coll[:]  # copies the collection
    try:
        newColl.remove(itemToRemove)  # raises ValueError if item is not present
        return newColl
    except ValueError:
        return newColl


Nor does this, although this is arguably the most idiomatic Python version of removeFirst, because it uses the built-in index function which handles the iteration behind the scenes:
def removeFirst(itemToRemove, coll):
    try:
        i = coll.index(itemToRemove)  # raises ValueError if item is not present
        return coll[:i] + coll[i+1:]
    except ValueError:
        return coll[:]


In Python, the standard practice for writing such a function in an iterative fashion is to create a new list, and then iterate over the items of the initial list, adding the appropriate ones to the new list, and then returning the new list at the end. This pattern of setting up an accumulator, and then using a for loop to add things to the accumulator, is a common one in Python. For this particular problem, in English you might say, "I'm going to go through the items, adding them one at a time to the new collection. If I hit one that matches the item to remove, I skip it and add the rest of the items directly to the new collection."

But is it better to iterate directly over the items, or is it better to iterate over the indices and access the items through the indices? When possible, it's preferred to iterate directly over the items, but unfortunately, Python has no good way to express "the rest of the items" once you hit an item that matches the one to remove. You can only get at "the rest of the items" if you know the index in order to take a slice from there to the end. So iterating through items versus iterating through indices yields slightly different strategies for this particular problem.

If you really wanted to iterate directly over the items, you'd probably need to use a flag to track whether you've already removed the first occurrence of the item:

def removeFirst(itemToRemove, coll):
    newCollection = []
    alreadyRemovedItem = False
    for item in coll:
        if (alreadyRemovedItem):
            # We've already removed the first instance of item
            # so we're just in "copy" mode
            newCollection.append(item)
        else:
            # We need to test whether the item matches itemToRemove
            if (item == itemToRemove):
                # don't copy this item over, but set alreadyRemovedItem flag
                alreadyRemovedItem = True
            else:
                newCollection.append(item)
    return newCollection


which could be refactored by reorganizing the if/else branches into the shorter:

def removeFirst(itemToRemove, coll):
    newCollection = []
    alreadyRemovedItem = False
    for item in coll:
        if (alreadyRemovedItem or item != itemToRemove):
            newCollection.append(item)
        else:
            alreadyRemovedItem = True
    return newCollection


Alternatively, if you iterate through the indices, then you can use a list slice to capture the notion of "the rest of the list".

def removeFirst(itemToRemove, coll):
    newCollection = []
    for index in range(len(coll)):
        item = coll[index]
        if (item == itemToRemove):
            # skip this item and rather than add the rest of the items
            # one by one, we can just add the rest to newCollection
            # all in one step
            newCollection.extend(coll[index+1:])
            # We're done now
            return newCollection
        else:
            newCollection.append(item)
    return newCollection


Admittedly, this last version is a bit of a cheat by my own definition of avoiding built-ins that bypass iteration, since that's effectively what extend is doing. But I don't mind it as much since we at least had to iterate until we found the item to remove, so it sufficiently demonstrates iteration techniques.

I think both of these versions (iterating through items and iterating through indices) are good examples of common patterns that occur in Python, and are worth knowing how to translate to Clojure.

Let's begin by translating the version that iterates through items.

Clojure has a built-in data structure that corresponds very closely to Python's lists. In Clojure, it is called a vector. Like Python lists, it allows fast access to an element by index, and allows fast insertion and removal at the back end of the collection. The main difference is that a Clojure vector, like all built-in Clojure data structures, is persistent and immutable. That means that any operation on a Clojure vector returns some sort of fresh copy -- the original remains unchanged. At first, this might sound wildly inefficient, but altered Clojure vectors can share a lot of internal structure with their source, precisely because of this guarantee of immutability, so it's actually pretty fast.
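
If you want a feel for this behavior before leaving Python, tuples are a rough analogue. This is only a sketch of the observable semantics: a Python tuple copies every element when you "add" to it, and it has none of the structural sharing that makes Clojure's vectors efficient.

```python
v1 = (1, 2, 3)

# "Adding" an element builds a new tuple; v1 itself is never altered
v2 = v1 + (4,)

# Attempting to change an element in place is an error
try:
    v1[0] = 99
    mutation_allowed = True
except TypeError:
    mutation_allowed = False
```

After this runs, v1 is still (1, 2, 3) and v2 is a separate four-element tuple.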

But this requires a different way of thinking about algorithms. We can't just create a new empty vector and destructively add things to it and return it, as we do with Python. (Well, of course, since Clojure sits on top of Java, you can easily use Clojure to create a Java ArrayList and use exactly the same pattern as Python, but you're here to learn "the Clojure way", right?)

Perhaps the most naive way to translate the code is to simulate a mutable vector in Clojure by wrapping a vector in some sort of mutable reference type (e.g., an atom). You'd need to do the same thing to the already-removed-item flag. The @ sign is then used to look at the current contents of the mutable reference. This is very bad form for this kind of algorithm, and not particularly efficient, but it works, and has an almost exact parallel to the Python code:

; This is bad style, don't do this!!!
(defn remove-first [item-to-remove coll]
  (let [new-collection (atom []),
        already-removed-item (atom false)]
    ; doseq is just like Python's for loop
    (doseq [item coll]
      (if (or @already-removed-item (not= item item-to-remove))
        (swap! new-collection conj item) ; just like append
        (reset! already-removed-item true))) ; sets flag to true
    @new-collection))



The better way to tackle this translation is to closely analyze the original, and identify any accumulator or variable that changes as you loop, and then thread those through the loop. I'll explain shortly what I mean by threading values through the loop, but there's one other issue that needs to be considered. You've already seen Clojure's doseq construct in action as a counterpart to Python's for loop, but it's only relevant for triggering destructive or side-effect-filled actions -- it's not the right tool for the job when trying to build up a persistent Clojure vector. Clojure has a for construct, but it's not a general looping construct; rather, it corresponds to Python's list and generator comprehensions.
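A small sketch of the difference between the two constructs, using made-up data (the names even-squares and doseq-result are illustrative only):

```clojure
;; for builds a (lazy) sequence of values, like the Python comprehension
;;   [x * x for x in range(10) if x % 2 == 0]
(def even-squares
  (for [x (range 10) :when (even? x)]
    (* x x)))

;; doseq runs its body purely for side effects and always returns nil.
(def doseq-result
  (doseq [x [1 2 3]]
    (println "visiting" x)))

(println even-squares)   ; prints (0 4 16 36 64)
(println doseq-result)   ; prints nil
```

Neither form lets you carry an accumulator and a flag from one iteration to the next, which is what the remove-first translation needs.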

Clojure really only has one general-purpose looping construct, known as loop/recur. In time, you'll be able to see how to go directly from a Python-style for loop to a Clojure loop/recur, but initially, it's far easier to see how to go from a Python-style while loop to a Clojure loop/recur. So as an intermediary step in translating our Python code to Clojure, let's begin by rewriting the Python for loop into a while loop. Here is one way to do that rewrite:

def removeFirst(itemToRemove, coll):
    newCollection = []
    alreadyRemovedItem = False
    collIterator = iter(coll)
    while True:
        try:
            item = next(collIterator)
        except StopIteration:
            # next() raises StopIteration when the end of the list is reached,
            # so we're done
            return newCollection
        if alreadyRemovedItem or item != itemToRemove:
            newCollection.append(item)
        else:
            alreadyRemovedItem = True


The advantage of rewriting as a while loop is that it helps us identify a couple of very important things. First, it lets us see the exit condition. The exit condition occurs when you've reached the end of the list, at which point newCollection holds the answer and can be returned. Second, it makes it a tad easier to analyze which things from outside the loop are being updated while inside the loop. The assignment to item is just creating a local variable within a given iteration of the loop, so that's not really the kind of thing we're looking for. But we should take note that newCollection, initialized outside the loop, is destructively extended within the loop, and the flag alreadyRemovedItem can also change from within the loop. Furthermore, it's now quite obvious that something needs to track the iteration through coll; this is done by collIterator, which is updated each time through the while loop by the call to next(). So collIterator, newCollection and alreadyRemovedItem are the things we're going to need to thread through our Clojure loop/recur structure.

Once these things have been identified, we're ready to tackle the Clojure translation. Iterators in Clojure work quite differently than in Python. Almost any collection in Clojure can be converted to a "seq" (short for sequence) by calling a function called, you guessed it, seq. For the moment, go ahead and think of it as an iterator. The iterator will be nil when the collection is exhausted. You call first to get the item the iterator is pointing at, and next to (non-destructively) advance the iterator.

(defn remove-first [item-to-remove coll]
  ; Start a loop, identifying and initializing the things that will change in the loop
  (loop [coll-iterator (seq coll),
         new-collection [],
         already-removed-item false]
    ; coll-iterator is nil when you're at the end of the collection.
    ; Clojure treats all non-nil, non-false values as true.
    (if coll-iterator
      ; We haven't reached the end of the collection
      (let [item (first coll-iterator)]
        (if (or already-removed-item (not= item item-to-remove))
          ; update new-collection and advance coll-iterator. We do this using recur
          ; to jump back to loop, rebinding coll-iterator to (next coll-iterator),
          ; new-collection to (conj new-collection item), and
          ; leaving already-removed-item unchanged
          (recur (next coll-iterator) (conj new-collection item) already-removed-item)

          ; else, advance iterator, leave new-collection unchanged,
          ; and set already-removed-item to true
          (recur (next coll-iterator) new-collection true)))
      ; We have reached the end of collection, so new-collection is the answer
      new-collection)))


Note that nothing destructive is happening here; it's all mutation-free. (next coll-iterator) actually returns a new iterator object, and (conj new-collection item) actually creates a new vector with item appended to the end. Clojure makes these operations cheap, and recur lets us pass the new objects back to the top of the loop and reuse the names given in the loop construct.
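If you want to convince yourself that the "iterator" never moves underneath you, here is a short REPL-style sketch. The names coll and it are placeholders.

```clojure
(def coll [10 20 30])
(def it (seq coll))                  ; "iterator" positioned at the first item

(println (first it))                 ; prints 10
(println (first (next it)))          ; prints 20
(println (first it))                 ; prints 10 again: next did not move it
(println (next (next (next it))))    ; prints nil: nothing left
(println (seq []))                   ; prints nil: an empty collection has no seq
```

Because (next it) hands back a new value rather than advancing it in place, the loop has to rebind the name on every pass, which is exactly what recur does.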

Now I'll let you in on a little secret. Rather than thinking of (seq coll) as returning an iterator, you can think of it as returning a linked-list-style view of the collection. Better yet, virtually all sequential-style functions in Clojure call seq implicitly, so that means, for all practical purposes, you can pretend that any Clojure collection is a linked list. Lists are able to answer three important questions: are you empty, what is your first element, and what are the rest of your elements? These correspond to empty?, first, and rest in Clojure. So just by thinking of our collection as a list, we have an extremely powerful way to iterate through it using recursion. We can rewrite the above Clojure code with this in mind:

(defn remove-first [item-to-remove coll]
  (loop [coll coll, ; no explicit call to seq is needed,
                    ; we can reuse the name coll for clarity
         new-collection [],
         already-removed-item false]
    (if (empty? coll)
      new-collection
      (let [item (first coll)]
        (if (or already-removed-item (not= item item-to-remove))
          (recur (rest coll) (conj new-collection item) already-removed-item)
          (recur (rest coll) new-collection true))))))


This concludes our mechanical translation of the iterate-through-items Python version. It may look odd to you if you've never seen this kind of looping before. Some programmers actively prefer this style of looping because it makes it abundantly clear which things are changing each time through the loop, and how, and precisely what the exit condition is and what value you exit with. In other words, it's arguably more explicit and easier to analyze a loop/recur structure than a for loop that's mucking around with mutable objects located outside the loop. There's truth to this, but I also sympathize with those who find loop/recur to be less readable. For loops nest well and allow for some pretty intricate control flow with judicious use of continue and break; complex nested for loops like that can be hard to analyze, but the equivalent loop/recur can be even worse. Nevertheless, loop/recur is the way general looping is done in Clojure, so for now, we'll just accept it along with its advantages and disadvantages and move on.

Now it's time to look at the iterate-through-indices version. Again, we begin by converting the Python for loop to a Python while loop.

def removeFirst(itemToRemove, coll):
    newCollection = []
    index = 0
    while (True):
        if (index == len(coll)):
            # We're done now
            return newCollection
        else:
            item = coll[index]
            if (item == itemToRemove):
                # Add the rest of the elements all at once and we're done.
                newCollection.extend(coll[index+1:])
                return newCollection
            else:
                newCollection.append(item)
                index += 1


This time, we see the things that change while looping are newCollection and index. The above code can now be mechanically translated to:

(defn remove-first [item-to-remove coll]
  (loop [new-collection [],
         index 0]
    (if (= index (count coll))
      new-collection
      (let [item (coll index)]
        (if (= item item-to-remove)
          ; into is like Python's extend, subvec is like Python's slice
          (into new-collection (subvec coll (inc index)))
          (recur (conj new-collection item) (inc index)))))))


Notice that in Clojure code we don't need to call return explicitly; the value of the last expression evaluated is returned automatically.
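For reference, here is a tiny sketch of the two helpers that the index-based version leans on, with the rough Python equivalents in comments. The sample data and names (letters, tail, middle, base, combined) are made up for illustration.

```clojure
(def letters [:a :b :c :d :e])

;; Python: letters[2:]  and  letters[1:3]
(def tail (subvec letters 2))
(def middle (subvec letters 1 3))

;; Python: result = [1, 2]; result.extend([3, 4])
;; except that into returns a new vector and leaves base alone.
(def base [1 2])
(def combined (into base [3 4]))

(println tail)       ; prints [:c :d :e]
(println middle)     ; prints [:b :c]
(println combined)   ; prints [1 2 3 4]
(println base)       ; prints [1 2]
```

Note that subvec only accepts vectors, so the index-based remove-first is tied to vector inputs in a way the item-based versions are not.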

In Python we had two basic strategies, iterate by items and iterate by indices, and both had an analog in Clojure. But remember, the reason why we needed two strategies in Python was that there was no way to capture the "add the rest of the items to the collection" concept in the version that iterated through items, so we were forced to choose between iterating through items and using a flag to go into "copy items until reaching the end of the list" mode, or using indices so we could take a slice.

But Clojure offers us a way to fuse these two strategies together, because its basic iteration mechanism (the linked-list-style view) DOES make it extremely easy to work with the notion of "the rest of the items" without any index manipulation or slicing.

Fusing the two Python strategies in Clojure, we get this:

(defn remove-first [item-to-remove coll]
  (loop [coll coll,
         new-collection []]
    (if (empty? coll)
      new-collection
      (let [item (first coll)]
        (if (= item item-to-remove)
          ; Extend new-collection with rest of coll and return in one step
          (into new-collection (rest coll))
          (recur (rest coll) (conj new-collection item)))))))


There's one downside to this implementation, namely, we're paying a performance penalty for gradually building up this vector by extending an immutable object one item at a time, when we don't care about and will never use the intervening steps between the empty vector and the final new collection. It's not a huge penalty, but it's real. Fortunately, Clojure offers a "recipe" to convert such functions. Upon entering the loop, you initialize the thing you're building to a transient (somewhat more mutable) vector. Then you append to it using conj! rather than conj, and finally you convert it back to immutable at the end using persistent!. The result looks like this:

(defn remove-first [item-to-remove coll]
  (loop [coll coll,
         new-collection (transient [])]
    (if (empty? coll)
      (persistent! new-collection)
      (let [item (first coll)]
        (if (= item item-to-remove)
          (into (persistent! new-collection) (rest coll))
          (recur (rest coll) (conj! new-collection item)))))))


Note that the overall shape of the code has not changed; we've just added a few annotations that improve performance. But remember that this step is optional: the previous version is perfectly fine for most purposes.
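The same recipe applies to any loop that builds a vector. Here is a minimal side-by-side sketch on a simpler task; squares-persistent and squares-transient are hypothetical names invented for this illustration, not anything from a library.

```clojure
;; Plain persistent version: every conj produces a new immutable vector.
(defn squares-persistent [n]
  (loop [i 0, acc []]
    (if (= i n)
      acc
      (recur (inc i) (conj acc (* i i))))))

;; Transient version: same shape, with transient / conj! / persistent!.
(defn squares-transient [n]
  (loop [i 0, acc (transient [])]
    (if (= i n)
      (persistent! acc)
      ;; Always pass along the value returned by conj!; don't rely on the
      ;; old binding having been updated in place.
      (recur (inc i) (conj! acc (* i i))))))

(println (squares-persistent 5))   ; prints [0 1 4 9 16]
(println (squares-transient 5))    ; prints [0 1 4 9 16]
```

Callers cannot tell the two apart: both return an ordinary immutable vector, and the transient never escapes the function.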

Although we were mainly trying to mimic the Python code, it's worth noting that Clojure code is highly polymorphic. Whereas the Python code only takes Python lists (and maybe some of the above versions would take strings), this Clojure algorithm will work on any collection (arrays, lists, lazy lists, vectors, sets, maps, strings, etc.) because all have that wonderful property of being viewable as lists. However, the returned collection is specifically a vector, no matter the input, so depending on the context, the polymorphism of the input may have limited utility.
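As a quick check of that claim, the sketch below restates the fused, non-transient remove-first and feeds it several kinds of collection. The sample inputs are purely illustrative.

```clojure
(defn remove-first [item-to-remove coll]
  (loop [coll coll, new-collection []]
    (if (empty? coll)
      new-collection
      (let [item (first coll)]
        (if (= item item-to-remove)
          (into new-collection (rest coll))
          (recur (rest coll) (conj new-collection item)))))))

;; prn prints values in reader syntax, so characters keep their backslash.
(prn (remove-first 2 [1 2 3 2]))          ; vector   -> [1 3 2]
(prn (remove-first 2 '(1 2 3 2)))         ; list     -> [1 3 2]
(prn (remove-first 3 (range 6)))          ; lazy seq -> [0 1 2 4 5]
(prn (remove-first \l "hello"))           ; string   -> [\h \e \l \o]
(prn (remove-first [:b 2] {:a 1 :b 2}))   ; map, viewed as entries -> [[:a 1]]
```

Every call returns a vector regardless of what went in, which is the limitation mentioned above.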

Now it's time to move on to Scheme. We'll look at a standard Scheme implementation of remove-first, and see how to translate that into Clojure. These examples have been tested in the Racket dialect of Scheme.

In Scheme, the most basic, native collection type is the linked list. The three fundamental operations on a Scheme list are empty?, first, and rest (sound familiar?) and you can also non-destructively add an item to the front of the list with (cons item list). Adding to the back of a list is a slow operation, so the strategy of building up a new collection by adding to the back is not a particularly desirable one in Scheme. Instead, the strategy is to use recursion, essentially allowing the call stack to build the sequence of items that need to be added, eventually, to the front of an empty list. It sounds confusing, but once you have your head wrapped around recursion, it all makes perfect sense (and if you don't understand recursion, head directly to htdp.org).

In any case, a typical Scheme implementation looks like this:
(define (remove-first item-to-remove coll)
  (if (empty? coll)
      empty
      (let ([item (first coll)])
        (if (equal? item item-to-remove)
            (rest coll)
            (cons (first coll) (remove-first item-to-remove (rest coll)))))))


Clojure also has a list collection, and the translation is about as straightforward as it could possibly be, requiring only a couple of syntactic changes:

(defn remove-first [item-to-remove coll]
  (if (empty? coll)
    () ; literal name for empty list
    (let [item (first coll)]
      (if (= item item-to-remove)
        (rest coll)
        (cons (first coll) (remove-first item-to-remove (rest coll)))))))



But there's a catch. Because Scheme relies on this kind of programming style, Scheme implementations are designed so that the stack has rather huge limits, basically being limited by your overall memory rather than some specific call-stack memory limitation. In other words, if the resulting list is small enough to fit in your computer's memory, then in all likelihood the call stack necessary to process it with recursion will fit as well. So call stack limitations are mostly a non-issue in Scheme.

However, Clojure is bound by the JVM's stack limits, so this style of writing will definitely place a limit on the size of collection that can be processed by this function. Fortunately, there is a rather simple solution. Clojure offers seamless interoperation between lazy lists and regular lists. Lazy lists solve the stack problem by avoiding the recursive step, returning immediately with a list-like object that can be probed on-demand for first and rest information by the consumer. Further elements will be computed as needed, and will be driven by the consumer's looping process.

This is accomplished by wrapping a call to lazy-seq around some part of the computation. There are at least three reasonable places to place the call to lazy-seq. You can put lazy-seq around the full body of the function.
(defn remove-first [item-to-remove coll]
  (lazy-seq (if (empty? coll)
              ()
              (let [item (first coll)]
                (if (= item item-to-remove)
                  (rest coll)
                  (cons (first coll) (remove-first item-to-remove (rest coll))))))))



You can place it around the cons.
(defn remove-first [item-to-remove coll]
  (if (empty? coll)
    ()
    (let [item (first coll)]
      (if (= item item-to-remove)
        (rest coll)
        (lazy-seq (cons (first coll) (remove-first item-to-remove (rest coll))))))))


You can place it around the recursive call to remove-first.
(defn remove-first [item-to-remove coll]
  (if (empty? coll)
    () ; literal name for empty list
    (let [item (first coll)]
      (if (= item item-to-remove)
        (rest coll)
        (cons (first coll) (lazy-seq (remove-first item-to-remove (rest coll))))))))


Each choice results in slightly different laziness behavior, i.e., when various elements are computed, but the overall semantics of the sequence remains the same and stack overflows will be avoided. Placing the lazy-seq around the recursive function call will cause remove-first to compute the first element right away, and delay the rest. Placing the lazy-seq around the full body will prevent any computation until it is asked for by a consumer. Placing the lazy-seq around the cons results in immediate behavior for the empty-collection and removable-item-at-front-of-list cases, and delayed behavior otherwise.
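One way to observe the difference is to feed two of the variants an input that counts how many of its elements have been computed. This is only a sketch: the function names are renamed copies of the variants above (using item in place of the repeated (first coll)), and counted-naturals is a hypothetical helper built on iterate so that the input is produced one element at a time rather than in chunks.

```clojure
;; lazy-seq around the full body.
(defn remove-first-lazy-body [item-to-remove coll]
  (lazy-seq
    (if (empty? coll)
      ()
      (let [item (first coll)]
        (if (= item item-to-remove)
          (rest coll)
          (cons item (remove-first-lazy-body item-to-remove (rest coll))))))))

;; lazy-seq around the recursive call only.
(defn remove-first-lazy-tail [item-to-remove coll]
  (if (empty? coll)
    ()
    (let [item (first coll)]
      (if (= item item-to-remove)
        (rest coll)
        (cons item (lazy-seq (remove-first-lazy-tail item-to-remove (rest coll))))))))

;; Hypothetical helper: the naturals 0, 1, 2, ... as an unchunked lazy input
;; that bumps counter each time one of its elements is computed.
(defn counted-naturals [counter]
  (map (fn [x] (swap! counter inc) x) (iterate inc 0)))

(def body-count (atom 0))
(def body-result (remove-first-lazy-body 2 (counted-naturals body-count)))
(def body-count-before @body-count)   ; input not inspected yet

(def tail-count (atom 0))
(def tail-result (remove-first-lazy-tail 2 (counted-naturals tail-count)))
(def tail-count-before @tail-count)   ; first input element already inspected

(println body-count-before tail-count-before)          ; prints 0 1
(println (take 4 body-result) (take 4 tail-result))    ; prints (0 1 3 4) (0 1 3 4)

;; Neither variant recurses deeply, so a large input is fine.
(println (count (remove-first-lazy-body -1 (range 1000000))))   ; prints 1000000
```

Both variants produce the same sequence; they differ only in how much work has been done by the time the call returns.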

All are acceptable choices, but preferences vary. Probably placing lazy-seq around the full body is the most common style you'll see in Clojure, although I tend to place it where the laziness is actually required (like around the recursive call, or around the cons).

Converting remove-first so that it returns lazy lists definitely adds some overhead compared with building a strict list. However, this overhead pays for itself if you ever end up using just part of the list, because no time is spent generating the parts you don't need. There's something very refreshing, freeing, and efficient about writing functions that find the first object with some particular property by using the strategy of taking the first item from the list of ALL objects with that particular property, knowing full well that the complete list will never be generated. Lazy lists can be used as an alternative to many traditional control structures (the above example of taking the first item from a lazy list of objects satisfying a given description is an elegant substitute for something that would require a for-loop-break iteration in a traditional language). Generally speaking, lazy lists are more useful than strict lists, and for that reason, lazy lists are the norm in Clojure rather than the exception.
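Here is that "first item of the list of all matches" strategy as a small sketch. The problem (finding the first perfect square above 1000) and the names squares and first-big-square are made up for illustration.

```clojure
;; All perfect squares: conceptually an infinite list, computed on demand.
(def squares (map #(* % %) (iterate inc 1)))

;; "The first square above 1000" is just the first item of the lazy list of
;; all squares above 1000. No explicit loop or break is needed.
(def first-big-square
  (first (filter #(> % 1000) squares)))

(println first-big-square)   ; prints 1024
```

Only as many squares as are needed to reach the answer ever get computed, even though the code reads as if it filters the whole infinite list.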

Since Clojure offers both vectors (similar to Python's lists) and lists (similar to Scheme's lists), we have seen that it is possible to convert both algorithmic styles into Clojure. With Python, the main thing that needed to be dealt with was adapting the algorithm to build an immutable, rather than a mutable, vector. This was further complicated by the fact that Clojure's general loop/recur looping construct doesn't exactly match up with Python's for loop construct and a conversion process is needed. But the final result captured the spirit of the Python code well. Coming from Scheme was easier, and the only real modification that was necessary was to output a lazy list rather than a strict list. Both versions can take any collection as an input, but one produces vectors and the other produces lazy lists. Both implementations are legitimate choices, depending on the desired usage.