Lazy Evaluation

By Michael Kay on June 28, 2015 at 11:15p.m.

We've seen some VM dumps recently that showed evidence of contention problems when multiple threads (created, for example, using <xsl:for-each> with the saxon:threads attribute) were attempting lazy evaluation of the same local variable. So I've been looking at the lazy evaluation code in Saxon to try and understand all the permutations of how it works. A blog posting is a good way to try and capture that understanding before I forget it all again. But I won't go into the extra complexities of parallel execution just yet: I'll come back to that at the end.

Lazy evaluation applies when a variable binding, for example "let $v := //x[@y=3]" isn't evaluated immediately when the variable declaration is encountered, but only when the variable is actually referenced. This is possible in functional languages because evaluating an expression has no side-effects, so it doesn't matter when (or how often) it is done. In some functional languages such as Scheme, lazy evaluation happens only if you explicitly request it. In others, such as Haskell, lazy evaluation is mandated by the language specification (which means that a variable can hold an infinite sequence, so long as you don't try to process its entire value). In XSLT and XQuery, lazy evaluation is entirely at the discretion of the compiler, and in this post I shall try to summarize how Saxon makes use of this freedom.

Internally, when a local variable is evaluated lazily, Saxon instead of putting the variable's value in the relevant slot on the stack, will instead put a data structure that contains all the information needed to evaluate the variable: that is, the expression itself, and any part of the evaluation context on which it depends. In Saxon this data structure is called a Closure. The terminology isn't quite right, because it's not quite the same thing as the closure of an inline function, but the concepts are closely related: in some languages, lazy evaluation is implemented by storing, as the value of the variable, not the variable's actual value, but a function which delivers that value when invoked, and the data needed by this function to achieve that task is correctly called a closure. (If higher-order functions had been available in Saxon a few years earlier, we might well have implemented lazy evaluation this way.)

We can distinguish two levels of lazy evaluation. We might use the term "deferred evaluation" to indicate that a variable is not evaluated until it is first referenced, and "incremental evaluation" to indicate that when it is referenced, it is only evaluated to the extent necessary. For example, if the first reference is the function call head($v), only the first item in the sequence $v will be evaluated; remaining items will only be evaluated if a subsequent reference to the variable requires them.

Lazy evaluation can apply to global variables, local variables, parameters of templates and functions, and return values from templates and functions. Saxon handles each case slightly differently.

We should mention some static optimizations which are not directly related to lazy evaluation, but are often confused with it. First, a variable that is never referenced is eliminated at compile-time, so its initializing expression is never evaluated at all. Secondly, a variable that is only referenced once, and where the reference is not in any kind of loop, is inlined: that is, the variable reference is replaced by the expression used to initialize the variable, and the variable itself is then eliminated. So when someone writes "let $x := /a/b/c return $x[d=3]", Saxon turns this into the expression "(/a/b/c)[d=3]". (Achieving this of course requires careful attention to the static and dynamic context, but we won't go into the details here.)

Another static optimization that interacts with variable evaluation is loop-lifting. If an expression within a looping construct (for example the content of xsl:for-each, or of a predicate, or the right-hand-side of the "/" operator) will have the same value for every iteration of the loop, then a new local variable bound to this expression is created outside the loop, and the original expression is replaced by a reference to the variable. In this situation we need to take care that the expression is not evaluated unless the loop is executed at least once (both to avoid wasted evaluation cost, and to give the right behaviour in the event that evaluating the expression fails with a dynamic error.) So lazy evaluation of such a variable becomes mandatory.

The combined effect of these static optimizations, together with lazy evaluation, is that the order of evaluation of expressions can be quite unintuitive. To enable users to understand what is going on when debugging, it is therefore normal for some of these rewrites to be suppressed if debugging or tracing are enabled.

For global variables, Saxon uses deferred evaluation but not incremental evaluation. A global variable is not evaluated until it is first referenced, but at that point it is completely evaluated, and the sequence representing its value is held in memory in its entirety.

For local variables, evaluation is generally both deferred and incremental. However, the rules are quite complex.

If none of these special circumstances apply, lazy evaluation is chosen. There is one more choice to be made: between a Closure and a MemoClosure. The common case is a MemoClosure, and in this case, as the variable is incrementally evaluated, the value is saved for use when evaluating subsequent variable references. A (non-memo) closure is used when it is known that the value will only be needed once. Because most such cases have been handled by variable inlining, the main case where a non-memo closure is used is for the return value of a function. Functions, like variables, are lazily evaluated, so that the value returned to the caller is not actually a sequence in memory, but a closure containing all the information needed to materialize the sequence. (Like most rules in this story, there is an important exception: tail-call optimization, where the last thing a function does is to call itself, takes precedence over lazy evaluation).

So let's look more closely at the MemoClosure. A MemoClosure is a data structure that holds the following information:

Many variable references, for example count($v), or index-of($v, 'z') result in the variable being evaluated in full. If this is the first reference to the variable, that is if the state is UNREAD, the logic is essentially

inputIterator = expression.iterate(savedContext);

for item in inputIterator {
state = ALL_READ;
return new SequenceExtent(reservoir);

(However, Saxon doesn't optimize this case, and it occurs to me on writing this that it could.)

Other variable references, such as head($v), or $v[1], or subsequence($v, 1, 5), require only partial evaluation of the expression. In such cases Saxon creates and returns a ProgressiveIterator, and the requesting expression reads as many items from the ProgressiveIterator as it needs. Requests to get items from the ProgressiveIterator fetch items from the reservoir to the extent they are available; on exhaustion of the reservoir, they then attempt to fetch items from the InputIterator until either enough items are available, or the InputIterator is exhausted. Items delivered from the InputIterator are copied to the reservoir as they are found.

So far so good. This has all been in place for years, and works well. We have no evidence that it is in any way optimal, but it has been carefully tweaked over the years to deal with particular cases where it was performing badly. What has changed recently is that local variables can be referenced from multiple threads. There are two particular cases where this happens today: when xsl:result-document is used in Saxon-EE, it executes by default asynchronously in a new thread; and when the extension attribute saxon:threads is used on xsl:for-each, the items selected by the xsl:for-each are processed in parallel rather than sequentially.

The effect of this is that the MemoClosure object needs to be thread-safe: multiple requests to access the variable can come simultaneously from different threads. To achieve this a number of methods are synchronized. One of these is the next() method of the ProgressiveIterator: if two threads reference the variable at the same time, each gets its own ProgressiveIterator, and the next() method on one of these iterators is forced to wait until the other has finished.

This works, but it is risky. Brian Goetz in his excellent book Java Concurrency in Practice recommends that a method should not be synchronized unless (a) its execution time is short, and (b) as the author of the method, you know exactly what code will execute while it is active. In this case neither condition is satisfied. The next() method of ProgressiveIterator calls the next() method of the InputIterator, and this may perform expensive computation, for example retrieving and parsing a document using the doc() function. Further, we have no way of analyzing exactly what code is executed: in the worst case, it may include user-written code (for example, an extension function or a URIResolver). The mechanism can't deadlock with itself (because there cannot be a cycle of variable references) but it is practically impossible to prove that it can't deadlock with other subsystems that use synchronization, and in the face of maliciously-written used code, it's probably safe to assume that deadlock can occur. We haven't seen deadlock happen in practice, but it's unsatisfactory that we can't prove its impossibility.

So what should we do about it?

I think the answer is, add yet another exception to the list of cases where lazy evaluation is used: specifically, don't use it for a variable that can be referenced from a different thread. I'm pretty sure it's possible to detect such cases statically, and they won't be very common. In such cases, use eager evaluation instead.

We must be careful not to do this in the case of a loop-lifted variable, where the correct error semantics depend on lazy evaluation. So another tweak to the rules is, don't loop-lift code out of a multithreaded execution block.

This investigation also suggests a few other refinements we might make.