Friday, April 27, 2012

IFn vs __call__

I'm going to coin a phrase: "Those who don't understand the work of Rich Hickey are doomed to reinvent it, poorly". I have found this to be true so many times, I should write it on a post-it note and stick it on my workstation.

Unlike the CLR, the Python VM is a very different beast from the Java VM. For instance, in Java, there is no way to make an arbitrary object act like a function. To code around this issue, Clojure implements IFn, which looks something like this:

interface IFn
    object invoke();
    object invoke1(object a);
    object invoke2(object a, object b);
    object invoke3(object a, object b, object c);

At compile time, Clojure then figures out which invoke to call and dispatches accordingly.

However, no such restrictions exist in the Python VM (hereafter the PVM). Instead, Python defines the __call__ method:

class MyObject(object):
    def __call__(self, arg):
        return arg + 1

assert MyObject()(1) == 2

For quite some time now, Clojure-Py has handled multiple-arity functions thusly:

def foo(*args):
    if len(args) == 0:
        print("no arg")
    elif len(args) == 2:
        print("two args")
    elif len(args) >= 3:
        print("more than two args")

The best aspect of handling multiple-arity functions this way is that it's more "pythonic": Clojure functions are Python functions and act exactly the same. More recently, however, I encountered a form like this while attempting to translate core.match to Clojure-Py:

(deftype Foo []
     IFn
     (invoke [this x y] (+ x y)))

Well, that's no good. We can add some ugly compilation conditionals, move this deftype into a separate file that uses __call__ instead of invoke, or switch to using IFn in Clojure-Py as well.

The more I thought about it, the more the use of IFn made sense. Consider this function:

(defn sum
   ([] 0)
   ([x] x)
   ([x y] (+ x y))
   ([x y & more] (reduce sum (+ x y) more)))

If we call (sum 1 2), the VM executes the following:

1) wrap 1 and 2 into a tuple
2) call sum with the tuple as an argument
3) run the tuple through 3 length checks to find the appropriate block to execute
4) assign the local variables x and y to tuple[0] and tuple[1]
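The four steps above can be sketched in plain Python. This is a handwritten approximation of the varargs scheme described earlier, not actual Clojure-Py compiler output; the name sum_ is made up here to avoid shadowing Python's built-in sum.

```python
from functools import reduce

def sum_(*args):                 # steps 1-2: the arguments arrive packed in a tuple
    n = len(args)
    if n == 0:                   # step 3: length checks select the arity branch
        return 0
    elif n == 1:
        return args[0]
    elif n == 2:
        x, y = args              # step 4: unpack the tuple into locals
        return x + y
    else:
        x, y = args[0], args[1]
        more = args[2:]
        # Mirrors (reduce sum (+ x y) more): fold the rest into the running total.
        return reduce(sum_, more, x + y)
```

Every call pays for the tuple and the chain of comparisons, even when the call site already knows it is passing exactly two arguments.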

However, if we were to use IFn, the compiler would do that arity resolution once, at compile time, and emit a direct call to sum.invoke2(1, 2). This saves us a memory allocation (the tuple creation), several if statements, and a few other checks. Well, this should run much faster!
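As a rough sketch of what that could look like on the Python side, here is the same function with one method per arity. The class name Sum and the method name invoke_variadic are assumptions made for illustration; only the invokeN naming comes from the interface shown earlier.

```python
from functools import reduce

class Sum(object):
    # One method per fixed arity, following the invokeN naming used in this post.
    def invoke(self):
        return 0

    def invoke1(self, x):
        return x

    def invoke2(self, x, y):
        return x + y

    # Hypothetical name for the [x y & more] case.
    def invoke_variadic(self, x, y, *more):
        return reduce(self.invoke2, more, x + y)

    # Fallback for plain Python callers that do not know the arity up front.
    def __call__(self, *args):
        n = len(args)
        if n == 0:
            return self.invoke()
        elif n == 1:
            return self.invoke1(*args)
        elif n == 2:
            return self.invoke2(*args)
        return self.invoke_variadic(*args)

sum_fn = Sum()

# What a compiler could emit for (sum 1 2): no tuple, no length checks.
result = sum_fn.invoke2(1, 2)
```

The length checks have not disappeared; they have moved into __call__, where only callers that cannot resolve the arity ahead of time pay for them.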

In my testing, switching to IFn looks like it will give us about a 10x speedup for Clojure-Py code running on PyPy. On CPython it won't make much of a difference one way or the other, but it will allow for cleaner porting of Clojure-JVM programs. And we can still keep our old compatibility with Python:

class MyFn(IFn):
     def invoke2(self, x, y):
         return x + y
     def __call__(self, x, y):
         return self.invoke2(x, y)

But what about Python functions? We can't do tuple.invoke2(x, y)! The tuple function doesn't have an invoke2 method! In ClojureScript, Clojure simply adds invoke to each and every function in the system. In the PVM, we can't modify built-in functions like tuple this way. We could infer the information at compile time, but that's no good in the cases where a function could be replaced at run time.
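A quick way to see the restriction is to try attaching an invoke2 attribute to a few callables. The helper can_attach_invoke2 and the sample function add are made up for this sketch.

```python
def can_attach_invoke2(fn):
    """Return True if fn accepts a new invoke2 attribute, False otherwise."""
    try:
        fn.invoke2 = lambda x, y: fn(x, y)
        return True
    except (AttributeError, TypeError):
        return False

def add(x, y):                              # an ordinary function defined in Python
    return x + y

builtin_ok = can_attach_invoke2(len)        # built-in function
type_ok = can_attach_invoke2(tuple)         # built-in type
python_ok = can_attach_invoke2(add)         # Python-level function
```

On CPython the first two attempts fail and the third succeeds. Functions defined in Python accept new attributes, but built-ins and built-in types do not, so patching invoke onto every callable cannot be done uniformly.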

Ah! So this is why Clojure-JVM never allows direct invocation of static methods. Instead users must use the . form. So, instead of doing:

(py/tuple 1 2)

we will now require users to use the standard Clojure form of:

(. py (tuple 1 2))

This tells the compiler not to try an invoke2 call, but instead to invoke tuple via an interop call.

Users could get a handle to a Python function through several means, perhaps as a returned callback, or by doing a getattr call on a module. In these cases, users will have to use the wrap-fn function:

(let [x (wrap-fn (.-tuple py))]
      (x 1 2))
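On the Python side, wrap-fn could be little more than an adapter that gives any callable the invokeN methods. The sketch below is a guess at the shape, not the actual Clojure-Py implementation; WrappedFn and wrap_fn are hypothetical names.

```python
class WrappedFn(object):
    """Hypothetical adapter: exposes invokeN methods on top of a plain Python callable."""

    def __init__(self, fn):
        self._fn = fn

    def invoke(self):
        return self._fn()

    def invoke1(self, a):
        return self._fn(a)

    def invoke2(self, a, b):
        return self._fn(a, b)

    def invoke3(self, a, b, c):
        return self._fn(a, b, c)

    # Still callable from ordinary Python code.
    def __call__(self, *args):
        return self._fn(*args)

def wrap_fn(fn):
    return WrappedFn(fn)

# max is used here because Python's built-in tuple accepts a single iterable argument.
wrapped_max = wrap_fn(max)
largest = wrapped_max.invoke2(1, 2)
```

The wrapper looks up the underlying callable at call time through self._fn, so compiled code can use the fast invoke2 path while the wrapped function stays an ordinary Python object.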

All this to say, I've found (once again) that making Clojure-Py conform tightly to the standard Clojure way of implementing functions not only simplifies porting Clojure-JVM code, but also often makes the resulting code much faster.

In the future, maybe I should research why Clojure was written the way it was before blindly fitting it into the Python mold. Now, where's that post-it note...there it is....*starts writing*


  1. «In the future, maybe I should research why Clojure was written the way it was before blindly fitting it into the Python mold. Now, where's that post-it note...there it is....*starts writing*»

    If only all this knowledge/wisdom were written down and shared for all to learn from, as in literate programming!
    Instead, each of us is bound to try to second-guess the creator ☹

  2. Is this project still alive? I would love to see the project make headway. Python is a much more preferable inter-op target than Java!