Friday, April 27, 2012

IFn vs __call__

I'm going to coin a phrase: "Those who don't understand the work of Rich Hickey are doomed to reinvent it, poorly." I have found this to be true so many times that I should write it on a post-it note and stick it to my workstation.

Unlike the CLR, the Python VM is a very different beast from the Java VM. In Java, for instance, there is no way to make an arbitrary object act like a function. To work around this restriction, Clojure implements IFn, which looks something like this:

interface IFn
{
    object invoke();
    object invoke1(object a);
    object invoke2(object a, object b);
    object invoke3(object a, object b, object c);
    // etc.
}

At compile time, Clojure then figures out which invoke to call and dispatches accordingly.

However, no such restriction exists in the Python VM (hereafter the PVM). Instead, Python defines the __call__ method:

class MyObject(object):
    def __call__(self, arg):
        return arg + 1

assert(MyObject()(1) == 2)

For quite some time now, Clojure-Py has handled multiple-arity functions like this:

def foo(*args):
    if len(args) == 0:
        print "no args"
    elif len(args) == 2:
        print "two args"
    elif len(args) >= 3:
        print "more than two args"

The best aspect of handling multiple-arity functions this way is that it's more "pythonic": Clojure functions are Python functions and act exactly the same. More recently, however, I encountered a form like this while attempting to translate core.match to Clojure-Py:

(deftype Foo []
  IFn
  (invoke [this x y] (+ x y)))


Well, that's no good. We can either add some ugly conditional-compilation hacks, move this deftype into a separate file that uses __call__ instead of invoke, or switch to using IFn in Clojure-Py as well.

The more I thought about it, the more the use of IFn made sense. Consider this function:

(defn sum
   ([] 0)
   ([x] x)
   ([x y] (+ x y))
   ([x y & more] (reduce sum (+ x y) more)))

If we call (sum 1 2), the VM executes the following:

1) wrap 1 and 2 into a tuple
2) call sum with the tuple as an argument
3) run the tuple through 3 length checks to find the appropriate block to execute
4) assign the local variables x and y to tuple[0] and tuple[1]

However, if we were to use IFn, the compiler could resolve the arity at compile time and emit a direct call to sum.invoke2(1, 2) instead. This saves us a memory allocation (the tuple creation), several if statements, and a few other checks. This should run much faster!
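To make the difference concrete, here is a hand-written Python sketch of the two call strategies side by side. Neither class is the compiler's actual output; SumArgs and SumIFn are purely illustrative names:

class SumArgs(object):
    # *args style: every call packs its arguments into a tuple,
    # then branches on the tuple's length.
    def __call__(self, *args):
        if len(args) == 0:
            return 0
        elif len(args) == 1:
            return args[0]
        elif len(args) == 2:
            return args[0] + args[1]
        else:
            return reduce(lambda acc, x: acc + x, args)

class SumIFn(object):
    # IFn style: one method per arity; the compiler can call the right
    # one directly when it knows the arity at the call site.
    def invoke0(self):
        return 0
    def invoke1(self, x):
        return x
    def invoke2(self, x, y):
        return x + y

assert SumArgs()(1, 2) == 3         # builds the tuple (1, 2), then runs length checks
assert SumIFn().invoke2(1, 2) == 3  # direct method call, no tuple, no branching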

The reality is, from my testing it looks like switching to IFn will give us about a 10x speedup for Clojure-Py code running on PyPy. On CPython it won't make much of a difference one way or the other, but it will allow for cleaner porting of Clojure-JVM programs. And we can still keep our old compatibility with Python:

class MyFn(IFn):
    def invoke2(self, x, y):
        return x + y
    def __call__(self, x, y):
        return self.invoke2(x, y)

But what about Python functions? We can't do tuple.invoke2(x, y)! The tuple function doesn't have an invoke2 method! In ClojureScript, Clojure simply adds invoke to each and every function in the system. In the PVM, we can't modify built-ins like tuple. We could infer the information at compile time, but that's no good in the cases where the function could be replaced at run time.

Ah! So this is why Clojure-JVM never allows direct invocation of static methods; instead, users must use the . form. So, instead of writing:

(py/tuple 1 2)

we will now require users to use the standard Clojure form of:

(. py (tuple 1 2))

This tells the compiler not to attempt an invoke2 call, but instead to invoke tuple via an interop call.

Users could get a handle to a Python function through several means, perhaps as a returned callback, or by doing a getattr call on a module. In these cases, users will have to use the wrap-fn function:

(let [x (wrap-fn (.-tuple py))]
  (x 1 2))
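wrap-fn isn't written yet, but a minimal Python sketch of the idea might look like this; the WrappedFn class and the number of invokeN arities are purely illustrative, not the real implementation:

class WrappedFn(object):
    # Gives a plain Python callable the invokeN methods the compiler
    # expects, while __call__ keeps it usable as an ordinary function.
    def __init__(self, f):
        self.f = f
    def invoke0(self):
        return self.f()
    def invoke1(self, a):
        return self.f(a)
    def invoke2(self, a, b):
        return self.f(a, b)
    def __call__(self, *args):
        return self.f(*args)

wrapped_tuple = WrappedFn(tuple)
assert wrapped_tuple.invoke1([1, 2]) == (1, 2)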

All this to say, I've found (once again) that making Clojure-Py conform tightly to the standard Clojure way of implementing functions not only simplifies porting Clojure-JVM code, but also often makes the resulting code much faster.

In the future, maybe I should research why Clojure was written the way it was before blindly fitting it into the Python mold. Now, where's that post-it note...there it is....*starts writing*


Wednesday, April 18, 2012

Clojure-Py and Distributed Concurrency (Part 2)

In the last part of this series on Clojure-Py and distributed concurrency we discussed how tasklets are created. In this post, we will discuss the scheduler that manages message passing between tasklets. In the following articles we will use "process" and "tasklet" interchangeably; when "process" is mentioned, we don't mean an OS process, but a share-nothing Python tasklet based on a generator.

Using the Python yield bytecode we are able to get data both into and out of tasklets. Whenever a tasklet yields, it returns a Python tuple in the following format:

(state-kw & args)

There are several states that each tasklet can take:

::spawn - creates a new tasklet
::send - sends a message to a tasklet (via a pid)
::recv - receives a message from a tasklet
::become - keeps the same pid, but swaps out the function being executed by the process
::next-msg - the message just delivered via ::recv was unhandled; re-queue it and try the next message
::die - kills the process
::yield - simply reschedules this tasklet for a new time-slice, useful for inside loops that would otherwise block
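In raw Python terms, a tasklet is just a generator that yields these tuples. A hypothetical hand-written tasklet (bypassing the receive macro entirely, and with plain strings standing in for the keywords) might look like this:

def echo_tasklet():
    # Yield a command tuple; the scheduler's reply comes back as the
    # value of the yield expression.
    while True:
        msg = yield ("recv",)          # ask the scheduler for a message
        if msg == "stop":
            yield ("die",)             # never resumed after this
        else:
            print "echo:", msg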

The scheduler, then, is simply a loop that iterates over all the tasklets. If a ::recv command was given last, the scheduler looks for a message to hand to the tasklet. If the tasklet returns ::next-msg, the scheduler immediately passes in the next message, then the next, and so on, until a different tasklet state is returned. This allows tasklets to search the message queue. If no message can be found, the tasklet is put into a sleep state until another process sends it a message to awaken it.
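To make the shape of that loop concrete, here is a heavily simplified Python sketch of a scheduler. It handles only spawn, send, recv, and die (strings again stand in for the keywords), and none of the names match the real scheduler's API:

from collections import deque

class Scheduler(object):
    def __init__(self):
        self.runnable = deque()   # (pid, generator, value-to-send) triples
        self.mailboxes = {}       # pid -> deque of pending messages
        self.waiting = {}         # pids blocked in recv -> generator
        self.next_pid = 0

    def spawn(self, gen):
        pid = self.next_pid
        self.next_pid += 1
        self.mailboxes[pid] = deque()
        self.runnable.append((pid, gen, None))
        return pid

    def run(self):
        while self.runnable:
            pid, gen, value = self.runnable.popleft()
            try:
                cmd = gen.send(value)          # give the tasklet its time-slice
            except StopIteration:
                continue                       # tasklet finished
            state, args = cmd[0], cmd[1:]
            if state == "spawn":
                child = self.spawn(args[0])
                self.runnable.append((pid, gen, child))
            elif state == "send":
                target, msg = args
                self.mailboxes[target].append(msg)
                if target in self.waiting:     # wake a sleeping tasklet
                    self.runnable.append((target, self.waiting.pop(target),
                                          self.mailboxes[target].popleft()))
                self.runnable.append((pid, gen, None))
            elif state == "recv":
                if self.mailboxes[pid]:
                    self.runnable.append((pid, gen, self.mailboxes[pid].popleft()))
                else:
                    self.waiting[pid] = gen    # sleep until a message arrives
            elif state == "die":
                continue                       # drop the tasklet
            else:                              # "yield": just reschedule
                self.runnable.append((pid, gen, None))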

The ::become state is useful for creating what the Erlang people call "universal servers". For example:

(defn universal-server []
  (recv [fn args]
    (become fn args)))

A universal server is a server that simply waits for the first message it receives, pulls a fn and args out of that message, and then transforms itself into a server that runs that fn. The astute reader will recognize this as a form of tail-call optimization: if Clojure-Py had TCO, we could simply call (fn args) instead of (become fn args), but since Clojure-Py (like Clojure on the JVM) is limited by the VM it runs on, we must implement this using yield bytecodes.

From here, the other tasklet states should be rather clear. The scheduler is really nothing more than a loop that manages the mini state machines we know as tasklets. Putting it all together:

(defn crazy-server []
  (receive [:ping]           (! spid :pong)                ; spid is implicitly supplied by receive, yields (::send spid :pong)
           [:clone]          (! spid (spawn crazy-server)) ; yields (::spawn crazy-server)
           [:become fn args] (become fn args)              ; yields (::become fn args)
           [:die]            (die)))                       ; yields (::die)

Normally there will be one scheduler for each running instance of the Python VM. However, if in the future, PyPy gets full STM support, multiple VMs could be started.

In the next part, we will discuss fault tolerance, network communication, and OS process management.


Tuesday, April 10, 2012

Clojure-Py and Distributed Concurrency (Part 1)

As most users of Python are well aware, Python has a GIL. At a basic level, this means that no two bytecodes can be executing at the same time: when one thread is executing bytecodes, another thread cannot be accessing the internals of the VM. For a language like Clojure, which embraces concurrency at such a basic level, this presents a problem.

To start with, we should state that there is absolutely no problem with implementing Clojure's STM primitives in Python; they simply won't be of much use. More precisely, they will work, they just won't allow for much of a performance improvement over code that doesn't use STM.

So what is the grand plan for concurrency in Clojure-Py? In short, we plan on bringing the ideas behind Erlang into the Clojure world: merging concurrency-oriented programming with Lisp. As an introduction, I suggest the reader watch this talk by one of the co-creators of Erlang, Joe Armstrong: http://www.infoq.com/presentations/erlang-software-for-a-concurrent-world

There are three main concepts behind Erlang:

Share-nothing light-weight processes
Message Passing
Fault Tolerance

---------

The Erlang VM implements some very lightweight "green threads". To properly understand this, the reader should know that threads in Python are OS-level threads. That is, the OS kernel allocates a separate stack for each thread, and these threads are quite heavy: on Linux, each thread reserves around 4-8MB of stack...and no, Windows is not much better. In addition, switching threads requires a jump into and back out of kernel space, and this context switch costs a rather high number of CPU cycles.

If we want to match the level of concurrency that Erlang provides, this simply won't do. We're not looking for hundreds of "threads"...we're looking at thousands. Even that is conservative; we're really looking at millions. So how do we do this in Clojure-Py? This is where generators come into play.

The best way, perhaps, to understand generators is to see them in action:


>>> def gen_test():
...  yield 1
...  yield 2
...  yield 3
...
>>> z = gen_test()
>>> z
<generator object gen_test at 0x022987D8>
>>> z.next()
1
>>> z.next()
2
>>> z.next()
3
>>> z.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

The use of the yield statement is what turns a Python function into a generator. When a yield executes, the generator's stack frame is saved onto the heap, and a generator object pointing to that data is handed to the calling function. From there, invoking .next() on the generator runs to the next yield, or until the function terminates, at which point the generator raises a StopIteration exception to signal termination. By itself, this really does nothing for us. However, in Python 2.5 yield was converted from a statement into an expression. Since yield now returns a value, we can do something like this:


>>> def print_stuff():
...  while True:
...   x = yield "Ready"
...   if x == "Stop":
...    return
...   else:
...    print x
...
>>> z = print_stuff()
>>> z.next()
'Ready'
>>> z.send(1)
1
'Ready'
>>> z.send(1)
1
'Ready'
>>> z.send(4)
4
'Ready'
>>> z.send(5)
5
'Ready'
>>> z.send("Stop")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

This pattern is what's known as coroutines. Many readers will recognize it as a form of cooperative multitasking. With the yield expression, it is possible to switch quickly between coroutines without ever jumping into kernel space or paying the penalty of a context switch.
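A toy round-robin driver (plain Python 2, not Clojure-Py code) shows the idea: two coroutines sharing one OS thread, switched entirely in user space:

def worker(name):
    # A coroutine that hands control back to the driver on every step.
    count = 0
    while True:
        count += 1
        yield "%s step %d" % (name, count)

def round_robin(coroutines, steps):
    # Give each coroutine a time-slice in turn; no threads, no kernel.
    for _ in range(steps):
        for c in coroutines:
            print c.next()

round_robin([worker("a"), worker("b")], steps=3)
# a step 1
# b step 1
# a step 2
# b step 2
# a step 3
# b step 3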

This is how we plan to solve the problem of concurrency in Clojure-Py. Using macros, we can construct code like this:

(defn log-info [file]
  (loop [file (py/open file "w")]
    (receive [:log message] (do (.write file message)
                                (recur file))
             [:stop] (die))))


Clojure-Py will introduce the receive form. Receive's bindings will follow the semantics of core.match and will actually be built on that library. Receive will generate a yield expression, take the value returned by yield, and perform pattern matching on the message.

(defn log-hello-world []
  (let [file "file.log"
        pid (spawn log-info file)]
    (! pid [:log "Hello"])
    (! pid [:log "World"])
    (! pid [:stop])))

(schedule log-hello-world)

This code creates a parent process, spawns a child log-info process, then sends messages to it. The function ! is taken from Erlang (and may change in name at some point). Unlike Stackless Python, the scheduler puts messages into a queue for each process; from there the processes are free to take from their queues at their leisure. Since all communication between processes happens via message passing, it is entirely possible to introduce a bit of serialization and networking to allow processes to send messages across a network.
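Because a message is just plain data, getting it onto the wire can be as simple as serializing it first. Here's a minimal sketch using the standard pickle module; the actual wire format and transport are still undecided:

import pickle

# A message is plain data, so it can be serialized and sent over a socket
# to a scheduler running in another interpreter or on another machine.
# The string ":log" stands in for the keyword here.
message = ("send", 42, (":log", "Hello"))
wire_bytes = pickle.dumps(message)

# ...transmit wire_bytes over a socket...

assert pickle.loads(wire_bytes) == message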

(spawn fnc & args) ; spawns a task inside this python interpreter
(spawn-on fnc & args) ; spawns a task on this machine, but in (possibly) a different interpreter
(spawn-on machine-name fnc & args) ; spawns a task on a different machine

Using these functions we can very effectively get around the GIL. spawn-on with no machine argument will automatically load-balance the task between any child Python interpreters on the same physical box. Each interpreter will have its own GIL, but these GILs will work independently of each other.
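Nothing like spawn-on exists yet, but the "one GIL per interpreter" idea itself is easy to demonstrate with the standard multiprocessing module (this is plain Python, not the planned Clojure-Py API):

from multiprocessing import Process, Queue

# Each Process below is a separate Python interpreter with its own GIL,
# so the two workers can burn CPU truly in parallel.

def worker(inbox, outbox):
    task = inbox.get()                 # message passing, not shared memory
    outbox.put(sum(xrange(task)))

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    procs = [Process(target=worker, args=(inbox, outbox)) for _ in range(2)]
    for p in procs:
        p.start()
    for n in (10 ** 6, 2 * 10 ** 6):
        inbox.put(n)
    print outbox.get(), outbox.get()
    for p in procs:
        p.join()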

To sum up the first part of this series on distributed programming in Clojure-Py: we aim to use coroutines to allow for efficient distributed computing from within a Clojure environment. These generator-based "tasklets" are very small and efficient; on a modern laptop, one million tasklets were created in about 3 seconds under Python 2.7. Through the use of core.match and serialization, it is possible to provide network transparency in the message-passing framework.

Anyone interested in learning more about this should read up on Erlang a bit more. Clojure-Py's vision for concurrency oriented programming borrows heavily from Erlang.

In the next part we will discuss a bit more about how the scheduler will work, and what fault-tolerance will mean to the system.



Clojure-Py 0.2 Released!

We are proud to announce the release of Clojure-Py 0.2.


There are numerous changes in this version, a few highlights:

* implemented deftype with protocol support
* implemented defprotocol and extend
* implemented reify
* implemented defrecord
* binding is supported
* implemented defmulti and defmethod
* support for derive, supers, bases, etc.
* try/catch, try/finally, with-open
* python generators are seq'able (better interop)
* over 290 tests are implemented
* countless bug fixes

Known issues:

There is currently a bug in the compiler: full try/catch/finally blocks
are not supported. try/catch and try/finally work just fine; it's only
the combination of all three that fails. This will be fixed in the
next release.

Source:

https://github.com/halgari/clojure-py

Also via easy_install:

easy_install clojure-py

Roadmap:

version 0.3 -
  Greatly improved compiler, rewritten for speed and code simplicity.
  Python 3 support
  Full implementation of clojure.core (including STM features)

version 1.0 -
  Fault-tolerant, distributed programming via Erlang-style concurrency.
  This will allow us to "code around the GIL" and at the same time have
  distributed capabilities not seen in any other Clojure dialect.

Thanks to the many developers who have been working on this project
the past few weeks. It's exciting to see the clojure-py community
continue to grow!