Tuesday, March 13, 2012

How to use protocols

A few nights ago I finally got protocol support up and running in clojure-py. Most of the source code for what we're about to discuss can be found in clojure/lang/protocol.py

To start with, I should say that the meta-programming abilities of Python continue to astound me. I found this nifty method called __subclasses__() that resides in every type defined by Python. Using this, it's quite trivial to transfers entire hierarchies of classes and comes in quite handy for use with protocols.

So to begin with. Let's examine the following use-case. In clojure we have a interface called Seqable. This abstract class (or interface as we call them at my day job programming C#), is responsible for turning a given class into a sequence. For some classes like PersistentList, calling .seq() on them is basically a no-op. A PersistentList is a seq, so no action is needed. However for something like a PersistentVector, we actually need to generate an IndexedSeq and return this new object instead.

Having the Seqable interface works just fine for any classes we define within our program. But what about tuples, stirings, or unicode? It would be nice to be able to .seq() those. Well, the IndexedSeq will work just fine  as the actual implementation, but we actually need a function to dispatch on the input type and create the needed seq objects.

In the past, we defined seq in the rt.py module thusly:

def seq(obj):
    if instance(obj, Seqable):
        return obj.seq()
    elif instance(obj, tuple):
        return IndexedSeq(obj, 0)

The problem with this code is it's not extensible. If, later on, we want to extend seq with some other closed class, we're sunk. This is where protocols help.

Protocols consist of two types of classes:

Protocol - This class has pointers to the functions defined by the protocol, as well as a list of all classes that extend this protocol

ProtocolFn - this is an actual protocol function. It dispatches on the type of the first argument passed to it.

protcol.py has some quick-and-easy helper functions:

protocolFromType(ns, tp) - this function takes the typle passed in as tp and enumerates the public functions it defines (any function starting with _ is ignored). Then it creates a ProtocolFn for each and every method in the type, these functions are then added to the namespace passed in via ns. Finally this function tags the interface as being a protocol.

extendForAllSubclasses(tp) - if tp points to a interface that is tagged as a protocol, the protocol is automatically extended for tp and all subclasses of tp.

With these two functions it is then trivial to add seq support to clojure.core:

def _bootstrap_protocols():
    global protocols, seq
    from clojure.lang.protocol import protocolFromType, extendForAllSubclasses
    from clojure.lang.iseq import ISeq as iseq
    from clojure.lang.seqable import Seqable as seqable
    protocolFromType("clojure.protocols", seqable)

def _extendSeqableForManuals():
    from clojure.lang.indexableseq import create as createIndexableSeq
    from clojure.lang.persistentvector import PersistentVector
    protocols.seq.extendForTypes([tuple, type([]), str, unicode],  lambda obj: createIndexableSeq(obj))
    protocols.seq.extend(type(None), lambda x: None)

Later on, when we define LazySeq within clojure.core we simply have to have it inherit from Seqable, then call

(clojure.lang.protocol/extendForAllSubclasses clojure.lang.seqable/Seqable)

And any new classes (LazySeq included) will be added to the protocol. 

Now granted, this doesn't quite follow the way protcols are created on the JVM, but in the next few days I'll wrap this all up in macros so the two will be identical.

No comments:

Post a Comment