Lazy Evaluation

We've seen some VM dumps recently that showed evidence of contention problems when multiple threads (created, for example, using <xsl:for-each> with the saxon:threads attribute) were attempting lazy evaluation of the same local variable. So I've been looking at the lazy evaluation code in Saxon to try and understand all the permutations of how it works. A blog posting is a good way to try and capture that understanding before I forget it all again. But I won't go into the extra complexities of parallel execution just yet: I'll come back to that at the end.

Lazy evaluation applies when a variable binding, for example "let $v := //x[@y=3]" isn't evaluated immediately when the variable declaration is encountered, but only when the variable is actually referenced. This is possible in functional languages because evaluating an expression has no side-effects, so it doesn't matter when (or how often) it is done. In some functional languages such as Scheme, lazy evaluation happens only if you explicitly request it. In others, such as Haskell, lazy evaluation is mandated by the language specification (which means that a variable can hold an infinite sequence, so long as you don't try to process its entire value). In XSLT and XQuery, lazy evaluation is entirely at the discretion of the compiler, and in this post I shall try to summarize how Saxon makes use of this freedom.

Internally, when a local variable is evaluated lazily, Saxon instead of putting the variable's value in the relevant slot on the stack, will instead put a data structure that contains all the information needed to evaluate the variable: that is, the expression itself, and any part of the evaluation context on which it depends. In Saxon this data structure is called a Closure. The terminology isn't quite right, because it's not quite the same thing as the closure of an inline function, but the concepts are closely related: in some languages, lazy evaluation is implemented by storing, as the value of the variable, not the variable's actual value, but a function which delivers that value when invoked, and the data needed by this function to achieve that task is correctly called a closure. (If higher-order functions had been available in Saxon a few years earlier, we might well have implemented lazy evaluation this way.)

We can distinguish two levels of lazy evaluation. We might use the term "deferred evaluation" to indicate that a variable is not evaluated until it is first referenced, and "incremental evaluation" to indicate that when it is referenced, it is only evaluated to the extent necessary. For example, if the first reference is the function call head($v), only the first item in the sequence $v will be evaluated; remaining items will only be evaluated if a subsequent reference to the variable requires them.

Lazy evaluation can apply to global variables, local variables, parameters of templates and functions, and return values from templates and functions. Saxon handles each case slightly differently.

We should mention some static optimizations which are not directly related to lazy evaluation, but are often confused with it. First, a variable that is never referenced is eliminated at compile-time, so its initializing expression is never evaluated at all. Secondly, a variable that is only referenced once, and where the reference is not in any kind of loop, is inlined: that is, the variable reference is replaced by the expression used to initialize the variable, and the variable itself is then eliminated. So when someone writes "let $x := /a/b/c return $x[d=3]", Saxon turns this into the expression "(/a/b/c)[d=3]". (Achieving this of course requires careful attention to the static and dynamic context, but we won't go into the details here.)

Another static optimization that interacts with variable evaluation is loop-lifting. If an expression within a looping construct (for example the content of xsl:for-each, or of a predicate, or the right-hand-side of the "/" operator) will have the same value for every iteration of the loop, then a new local variable bound to this expression is created outside the loop, and the original expression is replaced by a reference to the variable. In this situation we need to take care that the expression is not evaluated unless the loop is executed at least once (both to avoid wasted evaluation cost, and to give the right behaviour in the event that evaluating the expression fails with a dynamic error.) So lazy evaluation of such a variable becomes mandatory.

The combined effect of these static optimizations, together with lazy evaluation, is that the order of evaluation of expressions can be quite unintuitive. To enable users to understand what is going on when debugging, it is therefore normal for some of these rewrites to be suppressed if debugging or tracing are enabled.

For global variables, Saxon uses deferred evaluation but not incremental evaluation. A global variable is not evaluated until it is first referenced, but at that point it is completely evaluated, and the sequence representing its value is held in memory in its entirety.

For local variables, evaluation is generally both deferred and incremental. However, the rules are quite complex.

If the static type shows that the value will be a singleton, then it will be evaluated eagerly. [It's not at all clear that this rule makes sense. Certainly, incremental evaluation makes no sense for singletons. But deferred evaluation could still be very useful, for example if the evaluation is expensive and the variable is only referenced within a branch of a conditional, so the value is not always needed.]
Eager evaluation is used when the binding expression is very simple: in particular when it is a literal or a reference to another variable.
Eager evaluation is used for binding expressions that depend on position() or last(), to avoid the complexities of saving these values in the Closure.
There are some optimizations which take precedence over lazy evaluation. For example if there are variable references using predicates, such as $v[@x=3], then the variable will not only be evaluated eagerly, but will also be indexed on the value of the attribute @x. Another example: if a variable is initialized to an expression such as ($v, x) - that is, a sequence that appends an item to another variable - then we use a "shared append expression" which is a data structure that allows a sequence to be constructed by appending to an existing sequence without copying the entire sequence, which is a common pattern in algorithms using head-tail recursion.
Lazy evaluation (and inlining) need special care if the variable is declared outside a try/catch block, but is referenced within it. In such a case a dynamic error that occurs while evaluating the initialization expression must not be caught by the try/catch; it is logically outside its scope. (Writing this has made me realise that this is not yet implemented in Saxon; I have written a test case and it currently fails.)

If none of these special circumstances apply, lazy evaluation is chosen. There is one more choice to be made: between a Closure and a MemoClosure. The common case is a MemoClosure, and in this case, as the variable is incrementally evaluated, the value is saved for use when evaluating subsequent variable references. A (non-memo) closure is used when it is known that the value will only be needed once. Because most such cases have been handled by variable inlining, the main case where a non-memo closure is used is for the return value of a function. Functions, like variables, are lazily evaluated, so that the value returned to the caller is not actually a sequence in memory, but a closure containing all the information needed to materialize the sequence. (Like most rules in this story, there is an important exception: tail-call optimization, where the last thing a function does is to call itself, takes precedence over lazy evaluation).

So let's look more closely at the MemoClosure. A MemoClosure is a data structure that holds the following information:

The Expression itself (a pointer to a node in the expression tree). The Expression object also holds any information from the static context that is needed during evaluation, for example namespace bindings.
A copy of the dynamic context at the point where the variable is bound. This includes the context item, and values of any local variables referenced by the expression.
The current evaluation state: one of UNREAD (no access to the variable has yet been made), MAYBE_MORE (some items in the value of the variable are available, but there may be more to come), ALL_READ (the value of the variable is fully available), BUSY (the variable is being evaluated), or EMPTY (special case of ALL_READ in which the value is known to be an empty sequence).
An InputIterator: an iterator over the results of the expression, relevant when evaluation has started but has not finished
A reservoir: a list containing the items delivered by the InputIterator so far.

Many variable references, for example count($v), or index-of($v, 'z') result in the variable being evaluated in full. If this is the first reference to the variable, that is if the state is UNREAD, the logic is essentially

inputIterator = expression.iterate(savedContext);

for item in inputIterator {
  reservoir.add(item);
}
state = ALL_READ;
return new SequenceExtent(reservoir);

(However, Saxon doesn't optimize this case, and it occurs to me on writing this that it could.)

Other variable references, such as head($v), or $v[1], or subsequence($v, 1, 5), require only partial evaluation of the expression. In such cases Saxon creates and returns a ProgressiveIterator, and the requesting expression reads as many items from the ProgressiveIterator as it needs. Requests to get items from the ProgressiveIterator fetch items from the reservoir to the extent they are available; on exhaustion of the reservoir, they then attempt to fetch items from the InputIterator until either enough items are available, or the InputIterator is exhausted. Items delivered from the InputIterator are copied to the reservoir as they are found.

So far so good. This has all been in place for years, and works well. We have no evidence that it is in any way optimal, but it has been carefully tweaked over the years to deal with particular cases where it was performing badly. What has changed recently is that local variables can be referenced from multiple threads. There are two particular cases where this happens today: when xsl:result-document is used in Saxon-EE, it executes by default asynchronously in a new thread; and when the extension attribute saxon:threads is used on xsl:for-each, the items selected by the xsl:for-each are processed in parallel rather than sequentially.

The effect of this is that the MemoClosure object needs to be thread-safe: multiple requests to access the variable can come simultaneously from different threads. To achieve this a number of methods are synchronized. One of these is the next() method of the ProgressiveIterator: if two threads reference the variable at the same time, each gets its own ProgressiveIterator, and the next() method on one of these iterators is forced to wait until the other has finished.

This works, but it is risky. Brian Goetz in his excellent book Java Concurrency in Practice recommends that a method should not be synchronized unless (a) its execution time is short, and (b) as the author of the method, you know exactly what code will execute while it is active. In this case neither condition is satisfied. The next() method of ProgressiveIterator calls the next() method of the InputIterator, and this may perform expensive computation, for example retrieving and parsing a document using the doc() function. Further, we have no way of analyzing exactly what code is executed: in the worst case, it may include user-written code (for example, an extension function or a URIResolver). The mechanism can't deadlock with itself (because there cannot be a cycle of variable references) but it is practically impossible to prove that it can't deadlock with other subsystems that use synchronization, and in the face of maliciously-written used code, it's probably safe to assume that deadlock can occur. We haven't seen deadlock happen in practice, but it's unsatisfactory that we can't prove its impossibility.

So what should we do about it?

I think the answer is, add yet another exception to the list of cases where lazy evaluation is used: specifically, don't use it for a variable that can be referenced from a different thread. I'm pretty sure it's possible to detect such cases statically, and they won't be very common. In such cases, use eager evaluation instead.

We must be careful not to do this in the case of a loop-lifted variable, where the correct error semantics depend on lazy evaluation. So another tweak to the rules is, don't loop-lift code out of a multithreaded execution block.

This investigation also suggests a few other refinements we might make.

It seems worth optimizing for the case where the entire value of a variable is needed, since this case is so common. The problem is, it's not easy to detect this case: a calling expression such as count($v) will ask for an iterator over the variable value, without giving any indication that it intends to read the iterator to completion.
We need to reassess the rule that singleton local variables are evaluated eagerly.
We currently avoid using lazy evaluation for expressions with certain dependencies on the dynamic context (for example, position() and last()). But in the course of implementing higher-order functions, we have acquired the capability to hold such values in a saved copy of the dynamic context.
We could look at a complete redesign that takes advantage of higher-order functions and their closures. This might be much simpler than the current design; but it would discard the benefits of years of fine-tuning of the current design.
I'm not convinced that it makes sense for a MemoClosure to defer creation of the InputIterator until the first request for the variable value. It would be a lot simpler to call inputIterator = Expression.iterate(context) at the point of variable declaration; in most cases the implementation will defer evaluation to the extent that this makes sense, and this approach saves the cost of the elaborate code to save the necessary parts of the dynamic context. It's worth trying the other approach and making some performance measurements.