Wednesday, March 28, 2012

Problems Creating .pyc Files

Having clojure-py create .pyc files should greatly improve start-up times, hopefully bringing them into the 0.2 sec range. To start with, let me begin by explaining how python generates .pyc files.

pyc files are basically marshaled code objects. As such all data contained in code.co_consts must conform to the restrictions of marshal. This means that constants can only be ints, floats, lists, tuples, etc. No user classes are allowed! However, this is not a restriction in the bytecode compiler. For instance, examine the following clojure code:

(py/print '(1 2))

And similar code in Python:

print (1 2)

Here's how Python compiles the code:


And how clojure-py compiles the code:


So the idea is that Clojure builds the tuple every single time it is assigned to a variable. They have to do this because some objects like lists or dicts are mutable. So you actually want each invocation of the above code to return a completely different object. However, in clojure-py we can save thousands of clock cycles simply by creating the list/hashmap/vector at compile-time and then loading it in a single instruction.Sometimes when we know the function to invoke at compile-time we even load functions via LOAD_CONST, this makes our code much faster, but also has one major problem...the code is now non-marshallable.

So what about using Pickle? Well Pickle doesn't support code objects....we could extend Pickle (cPickle can't be extended in the way we need it to be), but that's going to be very slow.

One path I tried going down is to store all consts in globals. That is, we generate code to create const values and then store the results in globals, loading them with LOAD_GLOBAL instead of LOAD_CONST. This kindof worked, but when I was done, it was riddled with bugs, and the compiler had become way more complex. Compilers are already complex, making it worse seemed like a bad idea.

So, now I should write some performance notes. From my testing compiling core.clj is pretty evenly split between three modules:   -- reads lisp objects    -- creates bytecode from lisp objects    -- compiles bytecode to actual binary data that Python understands

Now, the output of lispreader can be quite easily run through cPickle. This is the "low hanging fruit" so to speak. With few tweaks, we can pickle our source code and completely remove from the equation.

Secondly I think I'll look into performance improvements in the compiler. There has to be some room for improvement there.

Finally, I'm not actually sure we'll ever be able to compile to .pyc files. And macros are a major reason. Imagine that we have a macro in a.clj and that macro is used in b.clj. If we change the macro, we want to trigger clojure-py to re-create the .pyc files for b.clj. This means we'd have to create some sort of dependency graph for every file, that tracks every module it's dependent on, and does automatic re-compilation.

So anyway, this is why we still don't have .pyc files...and why it's taking some time.

Tuesday, March 13, 2012

How to use protocols

A few nights ago I finally got protocol support up and running in clojure-py. Most of the source code for what we're about to discuss can be found in clojure/lang/

To start with, I should say that the meta-programming abilities of Python continue to astound me. I found this nifty method called __subclasses__() that resides in every type defined by Python. Using this, it's quite trivial to transfers entire hierarchies of classes and comes in quite handy for use with protocols.

So to begin with. Let's examine the following use-case. In clojure we have a interface called Seqable. This abstract class (or interface as we call them at my day job programming C#), is responsible for turning a given class into a sequence. For some classes like PersistentList, calling .seq() on them is basically a no-op. A PersistentList is a seq, so no action is needed. However for something like a PersistentVector, we actually need to generate an IndexedSeq and return this new object instead.

Having the Seqable interface works just fine for any classes we define within our program. But what about tuples, stirings, or unicode? It would be nice to be able to .seq() those. Well, the IndexedSeq will work just fine  as the actual implementation, but we actually need a function to dispatch on the input type and create the needed seq objects.

In the past, we defined seq in the module thusly:

def seq(obj):
    if instance(obj, Seqable):
        return obj.seq()
    elif instance(obj, tuple):
        return IndexedSeq(obj, 0)

The problem with this code is it's not extensible. If, later on, we want to extend seq with some other closed class, we're sunk. This is where protocols help.

Protocols consist of two types of classes:

Protocol - This class has pointers to the functions defined by the protocol, as well as a list of all classes that extend this protocol

ProtocolFn - this is an actual protocol function. It dispatches on the type of the first argument passed to it. has some quick-and-easy helper functions:

protocolFromType(ns, tp) - this function takes the typle passed in as tp and enumerates the public functions it defines (any function starting with _ is ignored). Then it creates a ProtocolFn for each and every method in the type, these functions are then added to the namespace passed in via ns. Finally this function tags the interface as being a protocol.

extendForAllSubclasses(tp) - if tp points to a interface that is tagged as a protocol, the protocol is automatically extended for tp and all subclasses of tp.

With these two functions it is then trivial to add seq support to clojure.core:

def _bootstrap_protocols():
    global protocols, seq
    from clojure.lang.protocol import protocolFromType, extendForAllSubclasses
    from clojure.lang.iseq import ISeq as iseq
    from clojure.lang.seqable import Seqable as seqable
    protocolFromType("clojure.protocols", seqable)

def _extendSeqableForManuals():
    from clojure.lang.indexableseq import create as createIndexableSeq
    from clojure.lang.persistentvector import PersistentVector
    protocols.seq.extendForTypes([tuple, type([]), str, unicode],  lambda obj: createIndexableSeq(obj))
    protocols.seq.extend(type(None), lambda x: None)

Later on, when we define LazySeq within clojure.core we simply have to have it inherit from Seqable, then call

(clojure.lang.protocol/extendForAllSubclasses clojure.lang.seqable/Seqable)

And any new classes (LazySeq included) will be added to the protocol. 

Now granted, this doesn't quite follow the way protcols are created on the JVM, but in the next few days I'll wrap this all up in macros so the two will be identical.

Wednesday, March 7, 2012

Thoughts on Protocols

In the past week or so, I've been researching how to implement protocols in Clojure-Py. In order to implement protocols, we need a efficient way to dispatch a function on a given type. A simple solution would be something like this:

(def impls {py/int print-int
                      py/float print-float
                      mytype print-mytype})

(defn print [x & args]
      (apply (get impls (py/type x)) x & args))

While this may look fairly simple, we have one problem. Our implementation will be quite slow on PyPy compared to calling a normal bound method (e.g. This is because PyPy assumes that we will not often modify a variable's type, and it optimizes class attributes accordingly. Therefore this would be much better:

(defn install-fn [type name fn]
     (setattr type (str "__proto__" name) fn))

(install-fn py/int "print" print-int)
(install-fn py/float "print" print-float)
(install-fn mytype "print" print-mytype)

(defn print [x & args]
     (apply  (.-__proto__print (py/type x)) x & args))

This method will be extremely fast, but has one flaw: we can't modify built-in classes. So the solution is to do a hybrid. We will attempt to modify the class with setattr, if that fails, we'll save the function to a hash-map. This way PyPy can optimize away for all our custom types, and everything else will have to hit the hashmap. From my benchmarks, this hybrid approach is, about 2x faster than using a straight hashmap.