Serializing and Deserializing Clojure Fns with Nippy

We use Nippy at Red Planet Labs whenever we need general purpose serialization / deserialization of Clojure objects. It’s easy to use, fast, and covers virtually everything you could ever want to serialize in a Clojure program.

One notable exception is Clojure fns. There are several places in the distributed computing application we’re developing where storing or transmitting a serialized fn is really the best option. It’s worth talking about what we mean when we say serialized fn. We aren’t looking to serialize and distribute code — we already have a good mechanism for that. (It’s called a jar.) Instead, what we want to do is transmit fn instances between processes as a way to capture state and intention. Clojure already provides the primitives needed to treat a fn declared in a lexical context as an object; we just want to extend that so it works across process boundaries.

To be clear, this isn’t exactly a deficiency in Nippy. There are a lot of good reasons why you should never do this! The semantics of serializing a fn instance at one point in time and deserializing it again later, possibly in another process or even on another machine, leaves a lot of opportunities for mistakes to be made. Suffice it to say that your use cases should make it easy to ensure that fn implementations — the actual code — are always available when and where you deserialize an instance, and that they remain consistent across time. In our case, these characteristics are relatively easy to guarantee, so we can use this strategy safely.

One of the best things about Nippy is how easy it is to extend via the extend-freeze and extend-thaw APIs. Using this extension mechanism, we added transparent support for serializing and deserializing fn instances. Once you require the com.rpl.nippy-serializable-fn namespace, nippy/freeze! and nippy/thaw! will handle fn instances, either at the top level or nested within other objects, without any additional syntax:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

(ns example
(:require [taoensso.nippy :as nippy]
[com.rpl.nippy-serializable-fn]))

(defn my-fn [a b c]
(+ a b c))

(def thawed-myfn
(nippy/thaw! (nippy/freeze! my-fn)))
(thawed-myfn 1 2 3) ; 6

(let [x 10
afn (fn [a b c] (+ a b c x))
fn-bytes (nippy/freeze! afn)
thawed-fn (nippy/thaw! fn-bytes)]
(thawed-fn 1 2 3) ; 16
)

And that’s it! In our experiments, performance of freezing and thawing fns was comparable to freezing and thawing Clojure Records with similar amounts of “stuff” in them. You can find this extension on the Red Planet Labs GitHub, or add it as a dependency via Clojars.

A look under the hood

From the programmer perspective, there are (broadly) two “kinds” of fns in Clojure: declared fns (via defn) and anonymous fns (via (fn [] …)). But from an implementation perspective, what we really end up caring about is two different factors: whether the fn has anything in its closure, and whether it has metadata associated. At the highest level, to serialize a fn, we need to capture the “type” of it, anything in the closure, and any associated metadata. If we have all those things serialized in the byte stream, then we can reconstruct that fn instance at a later point in time.

Let’s take a quick side track into how Clojure actually implements fns. At compile time, Clojure generates a Java class for each fn. The most obvious part of this class is the invoke method, but there’s also a constructor, and possibly some fields (more on these later). When you interact with a fn instance in regular Clojure code, you are interacting with an instance of this generated Java class. When you invoke a Clojure fn instance, you are literally calling the invoke method with the provided arguments.

When a fn instance has nothing in it’s closure, then it’s really “static” in the Java sense of the term. Invocation relies only on arguments to the invoke method, not on any additional context. Whether you’re dealing with a declared fn or an anonymous fn, all you need to serialize in order to capture the essence of this kind of fn is the generated Java class name. It can be deserialized just by reading the name out of the byte stream and instantiating the fn by class name. We go a little farther here and make sure that when you deserialize a declared fn, you actually get the singleton instance stored in the named global var.

What about when you have a fn with a lexical closure? This is where the generated constructor and fields come into play. The Clojure compiler determines referenced local vars at compile time, and then adds a constructor argument and a corresponding protected field for each one. When the resulting fn is instantiated at runtime, the constructor is called with the values of local vars, capturing them for future invocation. Their invoke method implementation refers to both the invoking arguments and the constructing arguments, providing all the needed context to complete computation.

To serialize fns with a closure, we need to serialize the type along with all the values stored in the protected fields. Luckily, detecting fields used for storing closure values is easy and reliable, so all we need to do is get ahold of those values and recursively serialize them. The tricky part is that, naively, the only way to do this is via Java Reflection, which is a big performance no-no. The good news is that the JVM has continued to improve in this department, and now offers a much more sophisticated and high performance API for doing “reflective” operations called MethodHandles. A full discussion of MethodHandles is outside the scope of this post, but the key detail is that a MethodHandle has virtually identical performance to regular method invocation or direct field access, but can be constructed from Java Reflection primitives. We use this combination of APIs to dynamically analyze a fn instance’s closure fields and then generate an efficient serializer function, so the Java Reflection costs are only paid once per unique fn type encountered.

The other top-level case we mentioned above is fns with metadata. This is something most Clojure programmers encounter primarily through protocol fns. The implementation of these fns is different yet again — all fns with metadata are an instance of a special wrapper class “clojure.lang.AFunction$1”, which does nothing except hold a metadata map alongside the actual fn instance, and dispatches invocations down to the actual fn. When we try to freeze a fn like this, all we have to do is serialize the metadata and the unwrapped fn in sequence using Nippy and we’re done!

3 thoughts on “Serializing and Deserializing Clojure Fns with Nippy”

the-alchemist says:

January 7, 2020 at 1:27 pm

Very cool, can’t wait to try it!

Aaron D says:

January 10, 2020 at 11:58 pm

Very cool – thanks for releasing the code & for the write up. I assume you’re deserializing back into the self-same JVM/loaded ns that you serialized from (for the the anonymous fns) as the class names that Clojure generates for anonymous functions are non-deterministic – or are you able to send anon. fns over the wire somehow?

1. bryanduxbury says:
  
  January 11, 2020 at 9:21 am
  
  Good question — we actually do deserialize in other JVMs, but we guarantee stable class names by using ahead-of-time compilation. All JVM instances that could end up deserializing a fn instance are required to be launched from the same precompiled jar.

A look under the hood

3 thoughts on “Serializing and Deserializing Clojure Fns with Nippy”

Leave a ReplyCancel reply

Discover more from Blog