Tuesday, April 10, 2012

Clojure-Py and Distributed Concurrency (Part 1)

As most users of Python are well aware, Python has a GIL (Global Interpreter Lock). At a basic level, this means that no two threads can be executing Python bytecode at the same time: while one thread is executing bytecode, no other thread can touch the internals of the VM. For a language like Clojure, which embraces concurrency at such a basic level, this presents a problem.

To start with, we should state that there is absolutely no problem with implementing Clojure's STM primitives in Python; they simply won't be of much use. More precisely, they will work, but they won't allow for much of a performance improvement over code that doesn't use STM, since the GIL prevents true parallelism within a single interpreter.

So what is the grand plan for concurrency in clojure-py? In short, we plan on bringing the ideas behind Erlang into the Clojure world: merging concurrency-oriented programming with Lisp. As an intro, I suggest that the reader watch this talk by Joe Armstrong, one of the co-creators of Erlang: http://www.infoq.com/presentations/erlang-software-for-a-concurrent-world

There are three main concepts behind Erlang:

Share-nothing light-weight processes
Message Passing
Fault Tolerance

---------

The Erlang VM implements some very lightweight "green threads". To properly understand this, the reader should know that threads in Python are OS-level threads. That is, the OS kernel allocates a separate stack for each thread, and these threads are quite heavy: on Linux, each thread reserves around 4-8 MB of stack...and no, Windows is not much better. In addition to this, switching threads requires a jump into and back out of kernel space, and this context switch costs a rather large number of CPU cycles.

If we want to match the level of concurrency that Erlang provides, this simply won't do. We're not looking for hundreds of "threads"; we're looking at thousands. Even that is conservative: ultimately, we're looking for millions. So how do we do this in Clojure-py? This is where generators come into play.

The best way, perhaps, to understand generators is to see them in action:


>>> def gen_test():
...  yield 1
...  yield 2
...  yield 3
...
>>> z = gen_test()
>>> z
<generator object gen_test at 0x022987D8>
>>> z.next()
1
>>> z.next()
2
>>> z.next()
3
>>> z.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

The use of the yield statement is what triggers the creation of a generator in Python. When execution reaches a yield, the generator's stack frame is saved on the heap, and control (along with the yielded value) returns to the caller. From there, invoking .next() on the generator runs it to the next yield, or until the function terminates, at which point the generator raises a StopIteration exception to signal that it is finished. By itself, this really does nothing for us. However, in Python 2.5 (PEP 342) yield was converted from being a statement to being an expression. Since yield can now receive a value via .send(), we can do something like this:


>>> def print_stuff():
...  while True:
...   x = yield "Ready"
...   if x == "Stop":
...    return
...   else:
...    print x
...
>>> z = print_stuff()
>>> z.next()
'Ready'
>>> z.send(1)
1
'Ready'
>>> z.send(1)
1
'Ready'
>>> z.send(4)
4
'Ready'
>>> z.send(5)
5
'Ready'
>>> z.send("Stop")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

These are called coroutines. Many readers may recognize this pattern as a form of cooperative multitasking. With the yield expression, it is possible to switch between coroutines very quickly, without ever jumping into kernel space or paying the penalty of a context switch.
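To make the cooperative-multitasking idea concrete, here is a minimal round-robin scheduler sketched in plain Python (the names `worker` and `run` are ours, not part of Clojure-py): each tasklet is just a generator, and every yield is an explicit switch point back to the scheduler.

```python
from collections import deque

log = []

def worker(name, count):
    # A tasklet: record one unit of work, then yield control back
    # to the scheduler at an explicit switch point.
    for i in range(count):
        log.append((name, i))
        yield

def run(tasklets):
    # Round-robin scheduler: resume each live tasklet in turn.
    ready = deque(tasklets)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the tasklet up to its next yield
            ready.append(task)  # still alive, so reschedule it
        except StopIteration:
            pass                # the tasklet finished; drop it

run([worker("a", 2), worker("b", 2)])
# log is now [("a", 0), ("b", 0), ("a", 1), ("b", 1)]: the two
# tasklets interleave without any OS-level context switch.
```

Note that nothing here touches the kernel; "switching tasks" is just resuming a different generator.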

This is how we plan to solve the problem of concurrency in Clojure-Py. Using macros, we can construct some code like this:

(defn log-info [file]
  (loop [file (py/open file "w")]
    (receive [:log message] (do (.write file message)
                                (recur file))
             [:stop] (die))))


Clojure-py will introduce the receive form. Receive's bindings will follow the semantics of core.match, and will actually be built on that library. Receive will expand into a yield expression, take the value returned by yield, and perform pattern matching on the message.
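As a rough illustration (in Python, with hypothetical names; the real expansion will go through core.match), the log-info process above might expand into a generator along these lines, with messages modeled as tuples:

```python
def log_info(f):
    # Hypothetical expansion of the (receive ...) form: a generator
    # that suspends at yield and dispatches on whatever message the
    # scheduler sends in.
    while True:
        msg = yield               # wait for the next message
        if msg[0] == "log":       # the [:log message] clause
            f.write(msg[1])
        elif msg[0] == "stop":    # the [:stop] clause
            return                # ends the generator (StopIteration)
```

Priming the generator with next() runs it to the first yield; after that, each .send() delivers one message, exactly as the scheduler would.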

(defn log-hello-world []
  (let [file "file.log"
        pid (spawn log-info file)]
    (! pid [:log "Hello"])
    (! pid [:log "World"])
    (! pid [:stop])))

(schedule log-hello-world)

This code creates a parent process, spawns a child log-info process, then sends messages to it. The function ! is taken from Erlang (and may change in name at some point). Unlike Stackless Python, the scheduler will put messages into a queue for each process; from there, the processes are free to take from these queues at their leisure. Since all communication between processes happens via message passing, it is entirely possible to introduce a bit of serialization and networking to allow processes to send messages across machines.
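A toy version of this mailbox design can be sketched in Python (all names here are ours; the real scheduler will be more involved): each process is a generator paired with a queue, send models !, and the scheduler delivers queued messages one at a time.

```python
from collections import deque

class Scheduler(object):
    def __init__(self):
        self.procs = {}        # pid -> (generator, mailbox)
        self.next_pid = 0

    def spawn(self, fn, *args):
        gen = fn(*args)
        next(gen)              # prime: run the process to its first yield
        pid = self.next_pid
        self.next_pid += 1
        self.procs[pid] = (gen, deque())
        return pid

    def send(self, pid, msg):  # models the ! function
        self.procs[pid][1].append(msg)

    def run(self):
        # Deliver queued messages round-robin until every mailbox is empty.
        delivered = True
        while delivered:
            delivered = False
            for pid, (gen, mbox) in list(self.procs.items()):
                if mbox:
                    delivered = True
                    try:
                        gen.send(mbox.popleft())
                    except StopIteration:
                        del self.procs[pid]  # the process terminated

def echo(out):
    # A trivial process: append each message to `out`, stop on "stop".
    while True:
        msg = yield
        if msg == "stop":
            return
        out.append(msg)

sched = Scheduler()
out = []
pid = sched.spawn(echo, out)
sched.send(pid, "Hello")
sched.send(pid, "World")
sched.send(pid, "stop")
sched.run()
# out is now ["Hello", "World"], and the echo process has been removed.
```

The important property is that the sender never touches the receiver directly; it only appends to a queue, which is exactly the seam where serialization and networking can later be slotted in.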

(spawn fnc & args) ; spawns a task inside this python interpreter
(spawn-on fnc & args) ; spawns a task on this machine, but in (possibly) a different interpreter
(spawn-on machine-name fnc & args) ; spawns a task on a different machine

Using these functions we can very effectively get around the GIL. Spawn-on with no machine argument will automatically load-balance the task between the child interpreter processes on the same physical box. Each OS process will have its own GIL, and these GILs will work independently of each other.
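Under the hood, something like Python's standard multiprocessing module could back spawn-on (this is a sketch of the idea, not the actual implementation): each worker is a separate interpreter with its own GIL, so CPU-bound work genuinely runs in parallel.

```python
from multiprocessing import Pool

def cpu_bound(n):
    # A CPU-bound task. With OS threads this would serialize on one GIL;
    # here each worker process is a separate interpreter with its own GIL.
    return sum(i * i for i in range(n))

def run_tasks():
    pool = Pool(processes=4)   # roughly one OS process per CPU
    try:
        # Farm four tasks out to the worker interpreters in parallel.
        return pool.map(cpu_bound, [100000] * 4)
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    print(run_tasks())
```

The arguments and results cross the process boundary by serialization, which is the same discipline the message-passing layer imposes anyway.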

To sum up this first part of the series on distributed programming in Clojure-py: we aim to use coroutines to allow for efficient distributed computing from within a Clojure environment. These generator-based "tasklets" are very small and efficient; on a modern laptop, 1 million of them were created in about 3 seconds under Python 2.7. Through the use of core.match and serialization, it is possible to make the message-passing framework network-transparent.
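A sketch of such a benchmark is easy to write (timings will vary by machine; the shape is what matters):

```python
import time

def tasklet():
    # The smallest possible tasklet: a generator with one suspend point.
    yield

start = time.time()
tasklets = [tasklet() for _ in range(1000000)]
elapsed = time.time() - start
print("created %d tasklets in %.2f seconds" % (len(tasklets), elapsed))
```

Each entry in the list is a fully suspended, resumable "thread of control" costing a few hundred bytes, which is what makes millions of them practical.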

Anyone interested in learning more about this should read up on Erlang a bit more. Clojure-Py's vision for concurrency oriented programming borrows heavily from Erlang.

In the next part we will discuss a bit more about how the scheduler will work, and what fault-tolerance will mean to the system.



9 comments:

  1. So processes are lighter than threads and each process has its own GIL? What are the advantages of threads then? There must be some trade-offs between the two 8)

  2. I should have clarified the terms a bit more. There are three types of threading involved here:

    1) OS Processes - Created via the OS, can have multiple threads
    2) OS Threads - multiple threads per OS process; in CPython all threads share the same GIL, so only one thread runs bytecode at a time per OS process
    3) Tasklets (or lightweight processes) - Python generators that require explicit yields to switch from one tasklet to another.

    In the end, Clojure-Py will create the above structures in the following way:

    1) One OS Process per CPU on the system
    2) One Thread per tasklet scheduler.
    3) Unlimited tasklets per scheduler (they only require about 300 bytes each so use as many as you wish)

    Yes, there is a tradeoff: the programmer is required to explicitly yield control to switch tasklets. For example:

    (defn bad-tasklet [pid]
      (recur)
      (! pid :oops))

    This tasklet will deadlock the OS thread that runs the tasklet's scheduler. However, unlike OS threads and locks, this will not deadlock "sometimes"; it will deadlock "every time". So you give up the power of automatic task switching, but gain the ability to have your programs be much, much more deterministic.

  3. Only found out about the clojure-py project a few hours ago; very excited about it. I love the ideas behind Erlang, but I don't like the syntax.

  4. When I first saw clojure-py, the first thing I thought was "what about the GIL?". That solution seems reasonable; I'm looking forward to its implementation.

  5. Perhaps you shouldn't discount STM so easily. Armin Rigo has been working on replacing PyPy's GIL with an STM (replace GIL acquire/release with transaction begin/end).

    It's not merged yet and may not be for a while, but already shows promising parallelism.

  6. I've been following Armin's work very closely. Once it's actually in a more usable state, it will be trivial to extend Clojure-py's STM structures to support it. However, this will not help the large portion of Clojure-py users who cannot run PyPy for their projects. So, for them (and for any distributed app) the ideas behind Erlang offer many benefits.

  7. May I suggest having a look at gevent for supporting concurrency? My experience is with eventlet, but gevent is faster and seems to have more community behind it. The advantage of gevent is that most of the yielding (for example when waiting on IO) is taken care of for you, and it will give you many more tools to build your concurrency primitives with.

    Also if you're looking at making this truly distributed it might be worth thinking about the message passing piece now too. zeromq is my go-to tool for this kind of thing, the main benefit for clojure-py being that you can use exactly the same semantics for passing a message inter-process, intra-process (same machine) and intra-machine.

    Some very knowledgeable folk have been building an erlang like framework for haskell that might yield some insight too. The wiki is here: https://github.com/haskell-distributed/distributed-process/wiki

    1. Sorry, I got my intra and inters mixed up there! Suffice to say that zeromq is awesome :-)

  8. Hi, is this a similar approach to F# agents:

    http://theburningmonk.com/2012/03/f-how-many-messages-can-you-post-to-a-f-agent-in-one-second/

    I've never used Clojure agents, but I'd like to know what the differences are between Clojure agents and this... is it something related to JVM limitations/advantages?
