Saturday, February 25, 2012

Vars and Calling Methods

Vars in Clojure are a simple way to name some piece of information. We create vars with the def special form:

(def answer 42)

Now, a Clojure beginner may think that we've simply assigned the name "answer" to the number 42. However, this is not exactly true. Instead, we've created a var, or more simply a "box", into which we've put the number 42, and given this box the name "answer".

Initially, translating this to Python seems quite straightforward: we simply put vars into the globals dictionary of a Python module, and then we can refer to them by name. This is exactly the way our current compiler goes about it.
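For illustration, here is roughly what that translation looks like on the Python side (a minimal sketch; the module layout is hypothetical):

# a.py -- a hypothetical module standing in for a clojure-py namespace.
# (def answer 42) becomes, in effect, a plain entry in the module's
# globals dictionary:
answer = 42

# ...which is equivalent to writing:
globals()["answer"] = 42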

However, this presents a problem when we start talking about two other ideas behind vars: consistent referencing and binding.

Consistent referencing:
There is a small bug in our current clojure-py implementation. Let's say we define answer in a module called "a" and then import it into a different module called "b". The problem we've created is that we now have an inconsistent way of referencing this value: if module "a" ever redefines answer, "b" will never see the change. We've copied the value instead of a reference to the value.
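The same problem can be reproduced in plain Python (the module names mirror the example above):

# a.py
answer = 42

# b.py
from a import answer   # copies a's current binding into b's globals

import a
a.answer = 43          # "a" redefines the value...
print(answer)          # ...but b still prints 42: b's copy never updates
print(a.answer)        # 43 -- only an attribute lookup on the module sees it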

In clojure-jvm this bug doesn't appear, since all variables are stored in "Var" objects. References to these Var objects are shared with other namespaces, and the Clojure compiler automatically dereferences a var whenever the var's data is accessed. So this code:

(def answer-plus-one (inc answer))

is really compiled into something much more like this:

(def answer-plus-one ((.deref inc) (.deref answer)))
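In Python terms, each var could be a little box object with a deref method. A minimal sketch (this Var class is illustrative only, not clojure-py's actual implementation):

class Var:
    """A mutable box holding a var's root value."""
    def __init__(self, root):
        self.root = root

    def deref(self):
        return self.root

answer = Var(42)

# the compiler would insert the .deref() calls for us:
answer_plus_one = Var(answer.deref() + 1)

# redefining the var mutates the shared box, so every namespace holding
# a reference to it sees the new value:
answer.root = 43
print(answer.deref())   # 43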

We could do this in Python, and this might be a possible answer...but let's look at the next problem:

Bindings:
Clojure allows us, on a per-thread basis, to re-assign the value of certain vars:

(def p (binding [answer 0] answer))
;; p = 0

The way this is achieved in Clojure is by defining the var's deref method as something similar to this:

(deref [this]
    (if (bindings-exist)
        (let [v (get bindings this-var-name)]
            (if v v this-var-value))
        this-var-value))

Ouch! So when we have a binding, we're forced to do a dictionary look-up? That sounds painfully slow. But perhaps there are ways to optimize this. Firstly, we must realize that vars don't often change. Most vars are "static" and should never be mutated; in fact, it may be possible to restrict them from ever changing unless they are defined as "dynamic" vars. Secondly, we don't have to implement bindings as dictionaries; instead, we could implement them as dynamic objects and use getattr/setattr on them. In PyPy, using objects instead of dicts in situations like this can be much, much faster.
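As a rough sketch of that second idea (the class and helper names here are made up, not clojure-py internals), thread-local bindings could live as attributes on a plain object:

import threading

class BindingFrame(threading.local):
    """Per-thread binding storage; bound vars become attributes."""

bindings = BindingFrame()

class DynamicVar:
    def __init__(self, name, root):
        self.name = name
        self.root = root

    def deref(self):
        # an attribute probe instead of a dict lookup; PyPy's JIT handles
        # fixed-shape objects far better than dictionaries
        return getattr(bindings, self.name, self.root)

answer = DynamicVar("answer", 42)
print(answer.deref())            # 42: no binding in this thread
setattr(bindings, "answer", 0)   # roughly what (binding [answer 0] ...) does
print(answer.deref())            # 0

So...how about some performance metrics: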


In this example, we'll have some code count from 0 up to 40 million, and do this 10 times. We'll be looking at different ways of supplying the increment function to the loop to see which results in the fastest times:

(def highend 40000000)
(def iterations 10)

(defn our-inc [x]
    ;; add 1 via the raw BINARY_ADD bytecode rather than the + function
    (py.bytecode/BINARY_ADD x 1))



Method 1 - Global defs
Here we take the Python approach and simply look the function up as a global, just as pure Python code would. Since this is the way the clojure-py compiler currently acts, this will be a good baseline.


(defn load-global-test [highend]
    (loop [x 0]
        (if (< x highend)
            (recur (our-inc x)))))


(print "load-global-test")
(time (dotimes [x iterations] (load-global-test highend)))

Result: 6053.08794975 msecs



Method 2 - Local defs
What if we passed these values in as arguments, and then referred to them as local variables instead of global variables?

(defn with-local-test [f highend]
    (loop [x 0]
        (if (< x highend)
            (recur (f x)))))

(print "with-local-test")
(time (dotimes [x iterations] (with-local-test our-inc highend)))

Result: 6053.04908752 msecs

Well, that did exactly nothing. Let's try something else.

Method 3 - Vars without bindings
In this case, let's box the function up in a var and call it with .deref the way Clojure does:

(deftype V [itm]
    (deref [self] itm))

(def VInc (V our-inc))

(defn with-var-test [v highend]
    (loop [x 0]
        (if (< x highend)
            (recur ((.deref v) x)))))

(print "with-var-test")
(time (dotimes [x iterations] (with-var-test VInc highend)))

Result: 8084.71488953 msecs


Method 4 - Vars with bindings
So if vars aren't much slower, what sort of hit do we take by using bindings? This will basically mean putting the value into an object that is itself put into a var. A bit ugly, but we need it if we are going to support the binding form:

(deftype V2 [itm bindings]
    ;; always consult the bindings box, simulating a thread-binding lookup
    (deref [self] (.deref bindings)))

;; the root value (itm) goes unused here; the binding always wins
(def VInc2 (V2 nil (V our-inc)))

(defn with-binding-test [v highend]
    (loop [x 0]
        (if (< x highend)
            (recur ((.deref v) x)))))

(print "with-binding-test")

(time (dotimes [x iterations] (with-binding-test VInc2 highend)))

Result: 14154.5939445 msecs

Using bindings is actually about 75% slower than a bare var, and well over twice as slow as the global-lookup baseline. We may want to be a bit careful how we go about defining dynamic vars, then.

Method 5 - Methods as constants

If we say that non-dynamic vars will never change, why not just load their contents as constants via the bytecode "LOAD_CONST"?

(defn with-const-test [highend]
    (loop [x 0]
        (if (< x highend)
            (recur ((py.bytecode/LOAD_CONST (fn [i] (py.bytecode/BINARY_ADD i 1))) x)))))

(print "with-const-test")

(time (dotimes [x iterations] (with-const-test highend)))

Result: 6053.23505402 msecs

Not much faster at all.

Method 6 - Inlined functions
Finally, let's try simply inlining the call as a direct use of the BINARY_ADD bytecode:

(defn with-inline-test [highend]
    (loop [x 0]
        (if (< x highend)
            (recur (py.bytecode/BINARY_ADD x 1)))))

(print "with-inline-test")

(time (dotimes [x iterations] (with-inline-test highend)))

Result: 6048.96688461 msecs

Once again, practically no difference.

So why are some of the results so similar? First of all, I performed these benchmarks on PyPy. PyPy implements a tracing JIT, and by design tracing JITs tend to unroll loops and inline code. So if the JIT automatically inlines all function calls, there's not much use in inlining them ourselves.


What is the takeaway from this, then? Quite simply: static vars should be handled as global variables, and dynamic vars should be boxed. The compiler rules will then be defined as follows:

1) any vars not marked as {:dynamic true} are considered static. They cannot be redefined (the compiler will try to enforce this), and cannot have bindings applied to them.
2) any vars marked as {:dynamic true} are considered dynamic. They can be redefined and, in general, will be about 30% slower than static vars. Once bindings are applied, they will be over 100% slower than static vars.

When compiling code, the compiler will attempt to resolve symbols in this order (a rough sketch in Python follows the list):

1) the compiler will check to see if the symbol is a bytecode
2) the compiler will look for a global. If a global is found and is not a var, a standard LOAD_GLOBAL will be used. If it is a var, the var will be loaded with LOAD_GLOBAL, and then a LOAD_ATTR "deref" will be emitted and called on the var.
3) if no global is found, and the symbol is not a special form, an error will be thrown.
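In rough Python pseudocode (every helper name here is hypothetical, not the actual clojure-py compiler API), the resolution might look like:

def resolve_symbol(compiler, sym):
    # 1) bytecode ops like py.bytecode/BINARY_ADD compile directly
    if compiler.is_bytecode(sym):
        compiler.emit_bytecode(sym)
        return
    # 2) globals: plain values get a LOAD_GLOBAL; vars additionally
    #    get their deref method loaded and called
    glob = compiler.find_global(sym)
    if glob is not None:
        compiler.emit("LOAD_GLOBAL", sym)
        if compiler.is_var(glob):
            compiler.emit("LOAD_ATTR", "deref")
            compiler.emit("CALL_FUNCTION", 0)
        return
    # 3) not a bytecode, not a global, not a special form: error out
    if not compiler.is_special_form(sym):
        raise Exception("Unable to resolve symbol: %s" % sym)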

2 comments:

  1. Had you considered using cells? Python manages closures using "cell" objects, which simply hold a value and can be read or written via LOAD_DEREF, STORE_DEREF, etc. bytecodes.

  2. I did briefly. But it doesn't work, for three reasons:

    1) they can't be named with a namespace. That is, I need to know where a given var came from. If I refer to cons in the user namespace, I need it to say internally "I am clojure.core/cons".

    2) Is there a non-hackish way to create a cell in Python code? I suppose I could create some sort of factory function, but that gets a bit odd.

    3) Vars are also thread-local variables...doing that with a cell gets a bit interesting.
