Wednesday, December 25, 2013

Frustrations with namespaces in Clojure

I have a long laundry list of things about Clojure I find irritating. That shouldn’t be too surprising; I have a similar list for every language that I have used to a serious degree. Most of the things on my list are relatively minor gripes and, overall, Clojure remains one of the most enjoyable programming languages I’ve worked with. Namespaces, however, are one of the aspects of Clojure that cause me the greatest pain.

Namespaces are Clojure’s tool for preventing name collisions between files. The most common way to organize your files is with a one-namespace-per-file policy, where the namespace matches the folder-file hierarchy. For example, puzzler.sudoku.grid would be the namespace for the grid.clj file in a puzzler/sudoku directory on the classpath. A namespace can require other namespaces, which ensures that the code in those namespaces is loaded and accessible.
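As a minimal sketch of that convention, here is roughly what the top of grid.clj might look like. The missing-symbols function and the choice of clojure.set as the required namespace are hypothetical, picked only to show a require and an alias in action.

```clojure
;; File: puzzler/sudoku/grid.clj (somewhere on the classpath)
(ns puzzler.sudoku.grid
  (:require [clojure.set :as set]))   ; load clojure.set and alias it as `set`

;; Hypothetical helper: which symbols are not yet used in a row?
(defn missing-symbols
  [symbols row]
  (set/difference (set symbols) (set row)))
```

Other namespaces would then refer to this function as puzzler.sudoku.grid/missing-symbols, or through whatever alias they choose in their own require.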

It’s a simple enough scheme, but in practice, I find there are several problems with namespaces, as implemented in Clojure.

No circular dependencies

Most programs start off small. At the beginning, it’s pretty easy to break the program cleanly into a few files with a clear, acyclic dependency hierarchy. But then your program grows. And when that happens, Clojure’s inability to handle namespaces with circular dependencies will cause you pain, as you struggle to refactor your entire codebase.

Here are the most common scenarios where I’ve been bitten by this:

  1. Namespace b depends on namespace a. I’m adding a function foo to a and realize that I would really benefit from using a function bar that already exists in b. What to do? One option is to move the function from b into a. If I do this, I have to make sure to also move over everything in b that bar uses in its implementation. Next, I need to check every one of my namespaces that are downstream from b and use any of the moved functions, and add a dependency on a, and update any references to b/bar (or other moved functions) to a/bar.

    Another option is to try to figure out if there’s a way to move portions of code from b and a into a new namespace c, where both b and a will now depend on c. If you do this, you might be able to get away with moving only the underlying private helper functions. If you can avoid moving your public functions, that at least saves you the hassle of changing all the downstream consumers of the API to point to the new locations.

    Keep in mind that none of the Clojure IDEs are sophisticated enough to help with intelligently handling these sorts of refactorings. The really sad part is that sometimes the hassle factor is so huge, I feel tempted to just copy and paste function bar over to a so it exists in both locations. It makes me want to curse at Clojure just for making me consider such a horrible thing.

  2. Records and protocols cause tremendous problems with the circular dependency restriction. Naturally, your records and protocols are something you want to use throughout your project. So it makes sense to have a namespace such as myproject.types to put these common records and protocols in one place. For efficiency, it is common to implement some of the protocols inline in the record definitions. However, implementing the protocols can at times be very complex, so in those inline implementations, we want to be able to call helper functions that handle the complex implementation details. Namespaces are all about organization, so it’s reasonable to want to implement those helper functions in another namespace. But therein lies the problem. myproject.types depends on myproject.protocol-implementation-details which in turn depends on myproject.types (for example, if you want to implement push on a Stack you need to be able to return a new Stack, so you need that Stack record in scope).
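To make the Stack scenario concrete, here is a rough sketch of one possible way to break the cycle: the helper receives the record constructor as an argument, so the helper namespace never has to name the record. This is only an illustration under stated assumptions. All of the names (IStack, push-impl, peek-top) are hypothetical, and the three sections are shown in one file purely so the snippet runs as-is; the comments mark which namespace each part would live in. Note that it works only by changing the helper's signature, which is exactly the kind of contortion the restriction forces.

```clojure
;; --- would live in a protocols namespace (hypothetical) ---
(defprotocol IStack
  (push [this x])
  (peek-top [this]))

;; --- would live in myproject.protocol-implementation-details ---
;; The constructor is passed in, so this code does not depend on the record.
(defn push-impl
  [make-stack items x]
  (make-stack (conj items x)))

;; --- would live in myproject.types ---
;; The inline implementation hands its own constructor to the helper.
(defrecord Stack [items]
  IStack
  (push [_ x] (push-impl ->Stack items x))
  (peek-top [_] (first items)))

;; Usage: items is a list, so conj adds to the front.
(def s (push (push (->Stack '()) 1) 2))
```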

A number of Clojure programmers have responded to this constraint by simply keeping the bulk of their code in one monolithic namespace. It is possible to split one namespace across multiple files, but then you’ve lost out on one of the desirable properties of namespaces — to know where to look for the definition of a function from the namespace/function reference.

No standard mechanism for re-providing required dependencies

The need for this manifests in several contexts:

  1. Creating a public API relying on functions spread across other namespaces. The simplest technique is to just create new vars that refer to the ones in the other namespaces, for example, (def public-foo private/foo). The main problem is that public-foo metadata won’t match the metadata of private/foo.

    The best library for dealing with this is potemkin, which handles the details of migrating the metadata over to the new var. This is an important enough issue that there should be a standard solution built into Clojure.

  2. When dealing with a complex project built out of multiple libraries and namespaces, it’s easy to end up with massively long headers, listing out dozens of dependencies. We need some way to group related dependencies into something that can be conveniently required as a single entity.

    In the early days of Clojure, Konrad Hinsen did some interesting work building some ns libraries supporting the notion of “namespace cloning” and other related ideas. I don’t know whether these tools are maintained these days, but again, it would be great to see a solution incorporated into Clojure.

  3. In the discussion above about breaking up circular dependencies, I talked about how much of the pain revolves around needing to update all the references to a function that has been moved from one namespace to another. If there were a convenient way to leave behind a link to the new location, it wouldn’t be quite so painful. Again, potemkin is currently the way to deal with this.
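The metadata problem from the first item is easy to see at the REPL. The sketch below uses only clojure.core; foo stands in for private/foo, and the manual alter-meta! copy is just a hand-rolled illustration of the idea, not how potemkin itself is implemented.

```clojure
;; Stand-in for private/foo (hypothetical).
(defn foo
  "Doubles x."
  [x]
  (* 2 x))

;; The simple technique: a new var pointing at the same function.
(def public-foo foo)

;; The function works, but the docstring and arglists did not come along.
(def doc-lost? (nil? (:doc (meta #'public-foo))))

;; Copying the interesting metadata over by hand.
(def public-foo2 foo)
(alter-meta! #'public-foo2 merge
             (select-keys (meta #'foo) [:doc :arglists]))
```

After the manual copy, (doc public-foo2) shows the original docstring and argument list, which is the tedious bookkeeping a built-in mechanism would take care of.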

No parameterized namespaces

My first two complaints shouldn’t come as any surprise to anyone who has built a complex project in Clojure. But this last point is more subtle.

To illustrate the point, let’s imagine your boss tasks you with writing a program to generate sudoku puzzles. As you code your sudoku generator, it’s obvious that certain numbers show up again and again. Remembering the rule-of-thumb “no magic constants”, you decide to create some definitions at the top of your namespace:

;; Constants
(def section-size 3)
(def symbols (range 1 10)) ; might want to use something other than numbers
(def grid-size (* section-size section-size)) ; derived constant

Imagine that these constants are used throughout the many functions in your sudoku namespace.

Later your boss comes along and says he wants you to generate a bunch of 4x4 sudokus, using letters instead of numbers. No problem, you just change the constants and recompile, right?

But now, let’s imagine that you need to do a task that involves a mixture of generating 9x9 and 4x4 sudokus. Suddenly, you have a major problem.

Probably most Clojurians would quickly rework the main constants as follows:

;; Constants
(def ^:dynamic section-size 3)
(def ^:dynamic symbols (range 1 10)) 

And what to do about the derived constant? Possibly you could use a symbol macro (see clojure.tools.macro), or maybe rework it into a function and change all uses of the constant to a function call.

(defn grid-size [] (* section-size section-size))
; Change all uses of grid-size to (grid-size)

Then, you use your generator to generate five of each type of puzzle as follows:

(concat
  (sudoku/generate-puzzles 5)
  (binding [sudoku/section-size 2, sudoku/symbols [\A \B \C \D]]
     (sudoku/generate-puzzles 5)))

The problem here is that binding almost always leads towards more pain. Inevitably, you forget that, for example, generate-puzzles returns a lazy sequence, so when the sequence is realized, the binding no longer holds, and the vars revert back to their defaults, so you just get ten 9x9 puzzles.

binding simply isn’t a reliable way to parameterize your namespace, i.e., to customize your namespace’s “global variables” that all the functions rely upon.
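The laziness trap can be reproduced in a few lines. In this sketch, generate-sizes is a hypothetical stand-in for generate-puzzles that lazily reports the grid size each "puzzle" would have; everything else follows the definitions above.

```clojure
(def ^:dynamic section-size 3)

(defn grid-size [] (* section-size section-size))

;; Hypothetical stand-in for generate-puzzles: map returns a lazy sequence.
(defn generate-sizes
  [n]
  (map (fn [_] (grid-size)) (range n)))

;; The lazy sequence leaves the binding form unrealized...
(def lazy-result
  (binding [section-size 2]
    (generate-sizes 3)))

;; ...whereas doall forces realization while the binding is still in effect.
(def eager-result
  (binding [section-size 2]
    (doall (generate-sizes 3))))

lazy-result   ; => (9 9 9), realized after the binding was popped
eager-result  ; => (4 4 4)
```

Sprinkling doall at every boundary does work, but it depends on remembering which functions are lazy, and it gives up laziness in exactly the places you might have wanted it.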

How do other languages deal with this?

Classes can be thought of as a very richly-featured namespace. Even if you don’t use inheritance and mutable state, classes make great namespaces. In an OO language, the sudoku namespace could be a class, with section-size and symbols as properties/fields set by the class constructor. The functions in the class all “see” those fields.

This means that you can think of objects as special instances of this parameterized namespace. For example, (ab)using a mixture of java and clojure syntax, imagine:

sudoku4 = new Sudoku(2, [\A \B \C \D])
sudoku9 = new Sudoku(3, (range 1 10))
(concat (sudoku4.generate-puzzles 5) (sudoku9.generate-puzzles 5))

There’s no good way to do this in Clojure. Really, your only option is to explicitly pass these parameters in and out of every function in your namespace. To do this, you’d need to change every single one of your functions to take an additional input of the form:

{section-size :section-size symbols :symbols :as grid-options}

and pass grid-options along in every function call.

This also changes the calling convention for all downstream consumers of this API.
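Here is a small sketch of what that threading looks like in practice. The functions blank-row and blank-grid are hypothetical; the point is only that every function, including ones that never look at the options themselves, has to accept grid-options and hand it on.

```clojure
;; The derived constant becomes a function of the options map.
(defn grid-size
  [{section-size :section-size :as grid-options}]
  (* section-size section-size))

;; Hypothetical helper: needs grid-options only to compute the row length.
(defn blank-row
  [grid-options]
  (vec (repeat (grid-size grid-options) nil)))

;; Hypothetical helper: uses grid-options and also passes it down to blank-row.
(defn blank-grid
  [grid-options]
  (vec (repeat (grid-size grid-options) (blank-row grid-options))))

(def options-4 {:section-size 2 :symbols [\A \B \C \D]})
(def options-9 {:section-size 3 :symbols (range 1 10)})

(count (blank-grid options-4)) ; => 4
(count (blank-grid options-9)) ; => 9
```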

This is a disaster. The problem is that projects usually start off just like this, with a few “global definitions” that later turn into actual parameters as the project evolves. binding is not an adequate solution because it interacts poorly with laziness. Threading the values through the functions can be a major ordeal. We need some way to produce a namespace that is derived from another namespace with certain parameters set in a given way.

I’m not an ML expert, but I believe that ML functors are an example of parameterized namespaces in the functional language world. Perhaps Clojure could draw inspiration from this.
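For a feel of what a "namespace derived from parameters" might look like, the closest approximation I can sketch in plain Clojure is a function that returns a map of closures. This is an assumption-laden illustration, not a proposal: make-sudoku and generate-rows are hypothetical names, and generate-rows merely yields shuffled rows rather than real puzzles.

```clojure
;; Hypothetical constructor: every function in the returned map
;; closes over section-size, symbols and the derived grid-size.
(defn make-sudoku
  [section-size symbols]
  (let [grid-size  (* section-size section-size)
        random-row (fn [] (vec (take grid-size (shuffle symbols))))]
    {:grid-size     grid-size
     ;; Lazy, yet still correct: the parameters are captured, not dynamically bound.
     :generate-rows (fn [n] (repeatedly n random-row))}))

(def sudoku4 (make-sudoku 2 [\A \B \C \D]))
(def sudoku9 (make-sudoku 3 (range 1 10)))

(def rows
  (concat ((:generate-rows sudoku4) 2)
          ((:generate-rows sudoku9) 2)))

(map count rows) ; => (4 4 9 9)
```

The catch is that all the functions have to be written inside one enclosing form and called through map lookups, so you lose ordinary vars, docstrings and the usual namespace tooling. That is a large part of why this feels like something the language should support directly.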

Summary

All of these issues have one thing in common: part of Clojure’s appeal is that it creates an environment where you can dive in and start creating without having every detail of your final design planned out. Clojure lets you start simply with small projects and evolve a more complex structure as necessary. However, the details of how namespaces work mean that as my project evolves, I inevitably hit a point where new additions become painful to make due to extensive refactoring. I find that when this happens, I’m psychologically steered towards avoiding those new additions, and that’s no good.

In order to create an environment where projects scale more seamlessly from small to large, Clojure’s namespaces are desperately in need of richer features.

20 comments:

  1. Your thoughts are good, though I'll point out another approach to your first example. Zach Tellman has a macro called import-vars, of which he says:

    "Clojure namespaces conflate the layout of your code and your API. For larger libraries, this generally means that you either have large namespaces (e.g. clojure.core) or a large number of namespaces that have to be used in concert to accomplish non-trivial tasks (e.g. Ring). The former approach places an onus on the creator of the library; the various orthogonal pieces of his library all coexist, which can make it difficult to keep everything straight. The latter approach places an onus on the consumers of the library, forcing them to remember exactly what functionality resides where before they can actually use it. import-vars allows functions, macros, and values to be defined in one namespace, and exposed in another. This means that the structure of your code and the structure of your API can be decoupled."

    You can see how he uses it here:

    https://github.com/ztellman/aleph/blob/perf/src/aleph/netty.clj

    That namespace is simply made up of things imported from other namespaces.

    Replies
    1. Yes, Zach Tellman's library is the "potemkin" option I refer to a couple times in my post.

  2. I would disagree with your first point that the system is "simple enough".

    I'd say, it's hard! Try explaining the difference between names vs symbols vs vars vs namespaces vs values vs objects, or even between load vs require vs refer vs use. Let's not kid ourselves: this is a subtle system. As a result it permits complex interaction and subtle mistakes. Maybe the complexity is worth the power it offers, or maybe it's just a bit of a mess. I am not sure. But I am sure it is an impediment for beginners.

    Also, the situation with namespaces is compounded by the lack of mature tool support. Specifically, I think none of the IDEs or emacs modes support basic refactorings like safe renaming. But good names are essential since names are the most crucial piece of documentation, which should be able to evolve along with our understanding of the code. So if renaming is difficult, then names get stale and code gets obscure. I understand automated refactoring is generally difficult in dynamic languages, but I'd bet there could be more.

    Replies
    1. As a newbie, I feel this same difficulty. My ns definitions are evolved as Frankensteins from past incarnations with a mess of uses and requires. However, I don't think it is a subtle system: the teaching/learning path is flawed.

      Configuration and environment is difficult to grasp because it's a just-in-time skill, where you only go further if you need to. Other skills are just-in-case, which are interesting to learn by themselves even if you don't need them. John Cook exposes it better: http://www.johndcook.com/blog/2010/03/03/just-in-case-versus-just-in-time/

      I feel the same difficulty with git, learning by trial-error-StackOverflow because no tutorial guides you through a concrete use case that explores all functionalities, which would need to emulate some team organization. However, a namespaces exposition could be done with a project evolution that exemplified all tools and why they're needed, preferably free of "foo" and "bar". To this moment I haven't found a comprehensive guide.

    2. As a clojure noob, namespaces were the reason I very nearly gave up on the language. I ran into a brick wall when I transitioned from "copy line by line from a book/tutorial" projects to "find some libs that do what I want and write the glue code to make them do it" projects. Couldn't get a darn thing to compile, despite my best efforts to require all the files. Reminded me of past pains in embedded C land, fighting with header include path problems, except that _most_ C libs at least give you one, or a couple 'master' headers that include the rest. If I hadn't discovered slamhound, and the fact that it "fixes" the namespace problems, I'd be back on ruby now. And the funny/sad thing is that I found it digging through someone else's project, not in any book or tutorial, despite the fact that without it I'd consider clojure to be borderline unusable by a beginner.

    3. Chiming in a little late here, I just found this article.

      I agree that IDE support is currently pretty lacking for Clojure. I'm the developer of Cursive and I'm working on that problem, but there's still quite a long way to go - the language really doesn't help tooling implementations much. Cursive does offer reasonable renaming of both local bindings and global vars as well as classes generated via defrecord/deftype, namespaces etc.

      I don't support move yet, unfortunately, but I'm planning to have it work much as Mark described it - select functions to move, get presented with a list of things they depend on, get a warning if the move results in a circular dependency, and so forth. That's relatively straightforward but there are some tricky corners - say you move something that calls a record constructor, then you need to move the record definition which means you need to rename the class, and so forth. But it's all very solvable, and the IntelliJ infrastructure is really great for this stuff.

  3. It would be nice if clojure namespaces just held data (e.g. like node.js). It would make the language a lot easier to learn and might enable a nicer solution to the parameterisation problem where you just `assoc` your parameters onto the module and get back a new module. Thanks to structural sharing that should be just as efficient as the class based approach. But I don't know enough about clojure to know whether or not the new module's functions will see the new parameters or not? I guess they probably wouldn't.

  4. What would a "parameterized namespace" be? Does any other language have them? If you are going to "parameterize" a namespace, you are asking for a function which creates namespaces, in other words, anonymous _name_spaces, which seems absurd. Your project would be better served by config/input files in this scenario. Just because you (def ...) your magic numbers, they are still magic numbers hard-coded behind the curtain so to speak.

    I agree circular dependencies are a hassle, but there are ways to design through and around them. Common wisdom also holds that circular depends = design flaw. One technique is to keep your protocols defined in separate namespaces from the records which implement them, and you can eliminate all your circular dependencies rising from modules that only ought to know the relevant protocol and do not need to construct records.

  5. I don't think my parameterized namespace point was as clear as I'd hoped. Please look at this gist and see if it's any clearer:
    https://gist.github.com/Engelberg/8141352

    Replies
    1. Hi Mark,

      Great article. Adds real meat to the discussion about namespaces, which is very much needed. Eloquently expressed points, and accurate information.

      I have some comments written in code:
      https://gist.github.com/timothypratley/8157438

      Where I argue that "you’d need to change every single one of your functions to take an additional input" is actually straightforward and beats the alternative. I do agree that this is a common usage trap and something worth discussing further. To me the solution is "Ok fine, roll up my sleeves and add that extra argument" and I much prefer doing that in order to preserve functions + data vs classes and objects.

      Regards,
      Timothy

    2. Interesting observations. I think if there were better tooling for tracking down all uses of functions, it would be a little easier to "roll up one's sleeves". You do make it look easy, but this small example doesn't really capture the pain of finding all references from all namespaces. It also doesn't tackle the "contamination" aspect. If I ever decide in any function that I need to call one of these functions that takes a grid, then that function also needs to be rewritten to take a grid and all of its callers need to be changed, and so on. That kind of thing puts psychological pressure to avoid making certain useful changes to your code, because you're concerned about the work it will cause.

      I like your analysis of the "magic macro" version. I agree that that line of thinking seems appealing, but it's hard to figure out how to make it perfect and beneficial without downsides.

    3. Hi Mark (and Timothy),

      If one of the goals is to parameterise existing code while minimising changes to the functions that depend on your vars, why not this as a starting point:

      https://gist.github.com/optevo/8384327

      This could be improved by changing the wrapping defn to a macro which returns all wrapped defns, amongst other things.

    4. I'm not sure how that would help. Inner defns still create a global var.

    5. Hi Mark,

      To expand on my earlier comment, here's what I was thinking of with closures/macros to make lightweight OO-like namespaces with minimal disruption to existing code:

      https://gist.github.com/optevo/8449076

      It doesn't really matter that the defn creates a var as it isn't used. To avoid polluting the namespace with vars or the inner defns, the macros in the gist above could be modified to replace defn with an anonymous function definition or alternatively could be redefined to throw an error which states that the functions can only be called in the context of the use-instance macro.

      Regards,
      Richard

  6. Hey Mark, first I'd like to thank you for not just saying "namespaces are hard" but instead expending considerable effort to lay out your pain points. It is truly valuable. A few thoughts...

    Circular dependencies
    #1 - personally, I like that circular dependencies force errors. In general I find that acyclic dependency graphs fall out of good factoring as a by-product and I rarely encounter them.
    #2 - is indeed a thing I have run into particularly when inlining protocol impls into records. I have found that factoring protocol impls away from records can break these cycles, but that is indeed unfortunate.

    
    Redeclaring deps
    #1 - I don't really like this method of pulling vars from many ns'es together into a single api namespace - it seems unnecessarily complicated to me. I typically structure my APIs in the form of a namespace with functions over an SPI protocol (or no protocol if there's really one impl). If there are enough functions, then I would break it up into multiple namespaces.

    #2 - I agree that this is tedious and I would love to have more support, or conventions, or tooling for this.

    #3 - do not want (although I'd love to have magical tooling everywhere that fixed up those references for me)

    Parameterized namespaces
    - I don't agree with the design path that gets you into this problem in the first place. Once you start to parameterize your "constants", you are into the realm of configuration. Having tried many approaches to configuration and dependency injection, I have grown comfortable with the "no magic" solution of explicit config. To me, config is part of the dependency set for a function impl (just like other system resources). I'm going to write up one technique I use for this elsewhere.

  7. Module systems are a large topic. Yes, ML's module system is one good place to start, when thinking about real module systems. OCaml in particular has already implemented many theoretically and practically useful extensions to the original module system: http://stackoverflow.com/questions/15584848/whats-the-difference-if-any-between-standard-mls-module-system-and-ocaml-mod

    In particular, higher-order modules, first-class modules, recursive modules. These things really come up in practice.

    Clojure made the same mistake as Haskell in not providing a real module system, only a primitive namespacing mechanism.

    But there is hope. There has been work on building on top of weak module systems, retrofitting as it were: http://plv.mpi-sws.org/backpack/

  8. This comment has been removed by the author.

  9. Circular dependencies are a design flaw in any language. By having a circular dependency, you make it impossible to reuse the module. You essentially have to deal with them as a whole and thus might as well have just one bigger module.

  10. You might not want parametrized namespaces. E.g., the grid dimensions are a property of the sudoku you are making and not a shared property of all of the namespace's functions. Have a constructor that makes you a custom sudoku and have functions deal with sudokus with non predefined dimensions and non predefined set of symbols.
