Tuesday, December 24, 2013

Clojure vs Scala

Last week, someone posted a question on the Clojure group asking for a comparison between Clojure and Scala. Since my most popular blog post, by far, is my Racket vs Clojure post from three years ago, I thought it would be good to post my response here.

Ten years ago, I would have said that my ideal dream language is one that provides the flexibility to program in any style. I want to be able to choose object-oriented or functional, immutable or mutable, high-level abstractions or low-level speed. With respect to this ideal, Scala clearly wins, supporting more programming styles than Clojure.

I've definitely changed my tune, though, and now I actually prefer the way that Clojure constrains and shapes my thinking. I argue that even though Scala may provide me with more options on how to tackle a given programming problem, Clojure guides me towards a simpler solution. If this intrigues you, read on...

The following text is only slightly paraphrased from what I posted to the group:

All or nearly all of the functional aspects of Clojure have counterparts in Scala. On top of that, Scala provides mutable flavors of everything, so you can pick and choose your approach. So that makes Scala better, right?

But the difference between Clojure and Scala is bigger than a feature-to-feature comparison -- they have very different philosophies, and programs developed in Clojure consequently have a very different feel to them than those developed in Scala. I find Clojure programs to be dramatically simpler.

Just as one example, consider modeling a deck of cards. In Clojure, you'd be more likely to come up with a simple representation for a card, perhaps: [10 :spades]. Depending on the card game, you might choose to represent a face card as [:king :clubs] or [13 :clubs]. A deck would likely be modeled as just a sequence of cards, and all the built-in sequence functions would apply, for example, shuffle, take, drop, etc. Serializing the data (for example, if you want to keep a database tracking all the shuffled decks you've ever used in a given game) comes for free.

On the other hand, in Scala, you'd be more likely to create a Card class with a rank field and a suit field. The Suit type would consist of four case classes, because the philosophy is to enumerate all the possible suits as separate entities -- there's nothing in Scala like Clojure's convenient keywords. For the rank, you'd be steered towards representing all the ranks as integers. The possibility of representing face cards with a name would likely never occur to you, because it would be too complicated to go through the effort of defining the type of a rank to be "an integer or a class consisting of four case classes -- jack, queen, king, ace". For modeling the deck, you probably wouldn't say a Deck is-a sequence, because composition is favored over inheritance. So you'd probably have a Deck class which would contain a sequence of cards. This means that you'd have to reimplement methods like shuffle, take, and drop on your Deck class, each one turning around and dispatching to the underlying sequence of cards. If you're not careful, years of object-oriented training might kick in and before you know it, you're representing the deck as a class where methods like shuffle, take, and drop destructively update the underlying sequence -- it feels so natural to do that once you've encapsulated the underlying sequence of cards in a class. If you want to serialize a deck, that's more code to write (although general "pickling" of a Scala object is an active area of research).
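To make the shape of that design concrete, here is a rough sketch of the kind of Scala model described above. It is only one plausible rendering: the names (Suit, Card, Deck, full, toVector) are hypothetical, and it uses case objects for the suits and keeps the deck immutable, which is the more careful version of the pattern.

```scala
import scala.util.Random

// Hypothetical names throughout; one possible shape, not a canonical design.
sealed trait Suit
case object Spades extends Suit
case object Hearts extends Suit
case object Diamonds extends Suit
case object Clubs extends Suit

// Ranks are plain integers, 1 to 13.
final case class Card(rank: Int, suit: Suit)

// Deck has-a sequence of cards, so every sequence operation we want
// has to be re-exposed by hand and forwarded to the wrapped Vector.
final class Deck(private val cards: Vector[Card]) {
  def shuffle(rng: Random): Deck = new Deck(rng.shuffle(cards))
  def take(n: Int): Deck = new Deck(cards.take(n))
  def drop(n: Int): Deck = new Deck(cards.drop(n))
  def size: Int = cards.size
  def toVector: Vector[Card] = cards
}

object Deck {
  val suits: Vector[Suit] = Vector(Spades, Hearts, Diamonds, Clubs)

  // Builds the standard 52-card deck.
  def full: Deck = new Deck(for (s <- suits; r <- 1 to 13) yield Card(r, s))
}
```

Every method on Deck exists only to forward to the Vector inside it, which is the plumbing the paragraph above is complaining about.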

This example pretty much sums up what I prefer about Clojure. I like to tell people that a big part of what makes Clojure special is its philosophy of lightweight data modeling. It leads to delightfully simple systems. Scala remains deeply rooted in the OO philosophy, which all too often leads to an over-engineered muddle.

Further thoughts

After posting the above message, a couple people pointed out that Scala doesn't force you to build the more complicated model. That's absolutely true. But due to its highly detailed static type system, Scala attracts the kind of programmers that like to carefully categorize and enumerate all the possible data structures that will occur in their programs. Sure, you could eschew objects in Scala and mimic Clojure by using generic maps/vectors/lists/sets for all your structured data needs, but that's clearly not how Scala is meant to be used, and numerous little details of the language psychologically steer you towards developing a more rigorous type taxonomy.
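For comparison, here is a minimal sketch of what "mimicking Clojure" in Scala might look like, using nothing but tuples and a Vector. The object name LightweightDeck and the choice of plain strings as a stand-in for Clojure keywords are assumptions made for this illustration.

```scala
import scala.util.Random

object LightweightDeck {
  // A card is just a (rank, suit) pair. Plain strings stand in for
  // Clojure keywords here; that choice is an assumption of this sketch.
  val suits: Vector[String] = Vector("spades", "hearts", "diamonds", "clubs")
  val ranks: Vector[String] =
    (2 to 10).map(_.toString).toVector ++ Vector("jack", "queen", "king", "ace")

  // The deck is an ordinary Vector, so the standard collection
  // operations (take, drop, filter, ...) apply to it directly.
  val deck: Vector[(String, String)] = for (s <- suits; r <- ranks) yield (r, s)

  def main(args: Array[String]): Unit = {
    val shuffled = Random.shuffle(deck)
    val hand = shuffled.take(5)
    val rest = shuffled.drop(5)
    println(hand)
    println(rest.size) // 47 cards remain after dealing 5
  }
}
```

It works, but the compiler can no longer tell a misspelled rank from a valid one, and that loss is exactly what tends to pull Scala programmers back towards a dedicated type for each concept.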

In my post, I mentioned lightweight data modeling. I can't stress this term enough. I'd like to see it become the new catchphrase for Clojure. When I used to give my "elevator pitch" for Clojure, I'd talk about how it was a functional programming language, a dialect of Lisp on the JVM, with some interesting concurrency constructs. I'd get a lot of blank stares. Most people don't know what it means to be functional, and many don't know about Lisp. But once I started talking about lightweight data modeling, people's interest perked up. People get it, or think they get it, or at least get it enough to be curious to ask for more details.

At that point, I often say something like, "Do you know JSON?" After getting acknowledgment, I continue with, "Well, imagine if you could represent all your data as JSON, rather than a complex hierarchy of objects and methods, and the language was designed around making that kind of data super-easy to work with." I find I get a much more positive response from this kind of explanation. It gives a hint of what it feels like to work in Clojure and think in Clojure.

I find it interesting that in Scala's early days, Scala had a similar orientation. They proudly boasted that XML manipulation was going to be Scala's killer feature. You could drop XML in your code as a data literal; the language was oriented around making XML easy to work with. Now, this has fallen by the wayside. Martin Odersky (the designer of Scala) has been quoted as saying, "Seemed a great idea at the time, now it sticks out like a sore thumb."

I admit, there are times where I'm envious of Scala's versatility versus Clojure: the ease of using a mutable variable, the ease of working with Java primitives and arrays, the speed that comes from static typing, the richness of classes versus Clojure's namespaces. (Actually, this last point is probably worthy of its own blog post -- Clojure's namespaces are quite limited in ways that frequently cause me pain). Whenever I run up against one of these rough spots in Clojure, I feel like an ascetic monk, suffering because I've chosen to deny myself the additional tools that Scala brings to the table. But overall, I feel happier programming in Clojure because the additional constraints imposed by Clojure guide me more quickly towards a simple design.

23 comments:

  1. Years ago it was called "symbolic programming" and it's still as useful.

  2. In Scala I would model a card as an Int plus a card type: either Face or King, Jack, Queen, Ace. I would use case classes, which give you pattern matching. Scala Pickling makes serialisation free and easy, and type checks the serialisation at compile time.

    sealed trait CardType
    // case objects, so that QueenCard can be compared and matched as a value
    case object FaceCard extends CardType
    case object QueenCard extends CardType
    // ... the other card types continue in the same way, then:
    case class Card(cardValue: Int, cardType: CardType)

    A deck of cards would be a single class containing a collection of cards, a List[Card]. Operations such as shuffle would be defined on the Deck class. Operations such as shuffle would produce a new immutable Card class instance, so there would be no mutable state; everything relies on the immutable List data structure.

    In particular since the Deck class is immutable you could expose the internal list publicly so people could do any of the take drop operations on the internal list and still be assured that the code is safe, pure functional and immutable. There is no need to hide the internal list implementation, so all list operations are available.

    I would say the main difference is that Scala doesn't have some features baked into the language that Clojure has, however, those features are available if you bring in the right libraries. So the language grows by convention, by the libraries, rather than having those things in at the beginning.
    It would be nice if those things were in at the beginning, but it would be a big job to put all of them in!

    A good example is Pickling, which is new but gives easy and fast serialisation of Scala objects. It is not part of the language but a library that uses Scala macros to do type-safe serialisation, i.e., it gets type checked at compile time, with some runtime checks.

    Replies
    1. Just to show how you could use a deck, if you wanted to find all queens in a deck.

      val queenCardsInDeck = deck.cards.filter(_.cardType == QueenCard)

      To find cards with values < 5

      val valuesLessThanFive = deck.cards.filter(_.cardValue < 5)

      To pattern match.

      card match {
      case Card(_,QueenCard) => { println("This is a queen") }
      case Card(_,KingCard) => { println("This is a king") }
      }

    2. Small correction: "Operations such as shuffle would produce a new immutable Card class instance" should say "new immutable Deck class".

    3. Phil, do I need to have Pickling on the receiving side to deserialise the data? What if I use another language: are there any Pickling-compatible deserialisers around? Are there any APIs that accept Pickling-serialised data? Are there any databases that understand the Pickling serialisation format and can create an efficient index to query the data?

      I hope you see my point. Type safety is great, but your application is usually a tiny piece of software in the much greater world that doesn't give a damn about your cool type system :)

    4. Scala's [currently experimental] Pickling does allow you to change the output format easily, so there is an arbitrary number of "language-independent pickling-deserializers" around, depending on what output format the developer chooses. (Clojure would of course run into the same kind of challenge when talking with other platforms. The author's point, as far as I gather, is that there is no such obfuscation or unfamiliarity *inside* a Clojure application when using a map to represent a deck of cards rather than a custom data type.)

      That being said, I do think phil's post goes some length to prove the points of this blog post. And THAT being said, I do love programming in Scala :)

    5. (https://github.com/scala/pickling)

    6. The JSON test fixtures look very promising, but there are already several limitations:

      1. Everything is wrapped with JSONPickle(...)
      2. Some JSON properties (e.g. "tpe", "$ref", "elems", "value") have special meanings, which means they can't be used for anything else.

      Although it is technically JSON data, it can't be used to talk to any Pickling-unaware API. It seems to be a fundamental limitation: you can't preserve all the semantics of Scala without losing format generality. JSON and similar serialisation formats work as the lowest common denominator which everybody understands. Those who have richer semantics in their languages will be forced to somehow encode them in order to restore them later.

      At the moment, the Scala application I'm writing looks like a series of type transformations: JSON <-> JSON parser types <-> case classes <-> [Some application logic] <-> case classes <-> ORM types <-> DB native type system. The app logic is slim compared to the dances around types. Maybe that will change in future, but at the moment it looks like a waste of time.

      Clojure's native data types are much closer to JSON and relational tuples, so fewer transformations are needed.

    7. Yes, pickling is a specialized form of serialization+deserialization which lets you retain concrete instance types. This is the central idea about pickling and what differentiates it from serialization/deserialization. I agree that if you don't need this information, then you are better off using a serializer for your format of choice.

      If what you are saying is "Clojure has native data structures and literals for these, much like JavaScript" then yes, that's hard to disagree with :) But I don't agree that Clojure's data types are more similar to JavaScript's than, for example, Scala's.

      Deserializing JSON to Clojure data types would consist of the same steps as Scala, save for custom type providers which are not necessary/possible. Platform-specific data types such as Ratio, Vector/List etc would also be lost in the process, unless a Clojure-specific format (and matching "pickler") was used.

    8. Alexander Zolotko and Mar - Pickling serializes to JSON or binary at the moment; I currently use it to serialize to JSON files.

      As the website says "can be Language-Neutral if you want it to be. Changing the format of your serialized data is as easy as importing the correct implicit pickle format into scope. Out of the box, we currently support a fast Scala binary format, as well as JSON. Support is currently planned for other formats. Or, you can even roll your own custom pickle format!".

    9. Alexander, you can change the format of the JSON serialization very easily!

  3. This is a great write-up and echoes conversations I have had with other Scala and Clojure developers. As I read it, I found myself agreeing with you that had I written this in Scala I would be steered psychologically towards the OO approach. But as a counterexample: what if I didn't like the shuffle algorithm provided by Clojure sequences? (Disclaimer: I have never used Clojure, just read about it here and there.) What if I want to do other things when drop or take is called? The OO approach always seems like overkill plumbing if all you are doing is putting a facade over an existing datatype, but you're putting a contextual facade over that datatype: leaving out methods that don't apply to a deck of cards, and allowing you to do more than just the underlying datatype's actions. Are there actions you can do on a sequence in Clojure that don't apply to a deck of cards, or that you wouldn't want someone to be able to do? Python has a similar mindset: don't keep the programmer from doing something dumb programmatically; it's up to them to know what they're doing. That's great for rapid development in a team that's all about the same skill level, but I think there are places where an OO approach is more applicable. What do you think?

  4. > the speed that comes from static typing

    Is this really an advantage? I thought the speed advantage Scala programs sometimes exhibit over Clojure programs just comes from the reduced reluctance to use destructive updates.

    I haven't used static types on the JVM, but from talking to people who use it, they seem to care more about offering correctness guarantees than speed. Static types certainly get a speed boost over Groovy or JRuby programs which usually rely heavily on reflection, but this is rarely true (and easy to avoid) in Clojure programs.

  5. You've got a very good point there, and to an extent I'll agree with you.

    But.

    There is at least one pitfall that goes along with what you call "lightweight data modelling" (great term, by the way): An excessive fascination with the representation of data, as opposed to its structure. It hit me hard a while back when I was reading Land of Lisp and found the first function from my blog post. So, I came up with a modified version of the program that uses the second function---they do exactly the same task.

    I realize that the second version is somewhat more verbose and that you don't have any context to understand what it's doing (which is not helped at all by the fact that I kept the same function names---they're fiddling with game search trees). But I submit that the second might be preferable anyway.

    A compulsion to over-model everything in sight is one of the major sins of object-oriented programming, and I get to struggle with it quite a lot at work lately. And my personal feeling is that Scala's syncretism is rapidly heading towards Perl's write-only-ness---it certainly doesn't seem to be guiding anyone to the simplest solutions. But going too far the other way isn't going to be any better.

  6. Actually, there _is_ something in Scala like clojure's convenient keywords: symbols. They just don't get much use (outside the compiler itself).

    Replies
    1. They don't get used at all (that I can think of) in the compiler itself. The thing called "Symbol" in the compiler has no relationship.

  7. You don't need to create a whole new class for a Deck. I would model it as:

    sealed trait Suit
    case object Spades extends Suit
    case object Clubs extends Suit
    case object Hearts extends Suit
    case object Diamonds extends Suit
    type Deck = List[(Int, Suit)]

    This construct, acting like a type alias in this case, makes it easy for you to change Deck to a class if needed later. You shouldn't need it though, as you can add any extra methods to your Deck through extension methods.

    Replies
    1. Exactly.

      And if you google "scala deck of cards" and look at what people do, "over-modeling" doesn't seem to be an issue. (One solution appears to be inspired more by "C" than Java, but again, "over-modeled" it ain't).

      The Scala community is definitely heterogeneous, but baroque object models are pretty generally frowned upon in my experience.

  8. The problem is that both approaches have their pros and cons. The "clojure" approach is the dynamic approach. What a (good) python/perl/... programmer would do. The "scala" approach is the general approach a (good) Java/C++/... programmer would take. And you're right, for simple operations the dynamic approach would be best, simply because lots of operations are loosely defined because you're using very basic classes.

    However, when the data becomes more complex, all these functions start to have side effects. There are plenty of lists you can't "take" from at random places, and shuffle may not be possible for the data structure you've implemented. God forbid the list is a list that actually exists in a remote database.

    So the problem here is small versus large programs. The dynamic approach you describe is unbeatable for small programs, and a complete disaster for large ones. You implicitly apply a large, very-high-up class to your data and lots of stuff just works.

    When your program grows you're going to find that it doesn't always do the right thing and you need to either:

    1) reimplement the semantic equivalents of take, shuffle, and give them some weird name. Then live with the fact that anybody in your team can just call the wrong functions and you'd never find it.

    2) switch to the Scala static approach.

    Replies
    1. Large programs almost always end up in complete disaster: compile times and test run times slowly creep up until maintenance and adding new features become a nightmare.

      Refactoring should be done at an application level as well to ensure programs stay small, it shouldn't matter which language they are written in.

      Better to keep programs small, pick the right language for the job, continually refactor, and focus on end to end tests so components can be easily replaced.

  9. Integers would be a great starting point and they could represent everything that you need as long as you are OK with codifying the constraints in a function.

  10. Great post; however, it tends to describe Scala as an OOP language, which is not totally correct. The way of modeling the problem described for Scala is not the most popular one. Of course, if you come from a Java background this is what you tend to do, but if you know ad hoc polymorphism, monads, etc., you would use a totally different approach. For example, a deck would never be a new class but simply:
    type Deck = List[Card]
    val deck: Deck = List(...)
