The streaming facilities in Saxon-SA have been proving very popular with those users who have very large documents to process. However, at present the only thing you can do in streaming mode is to split the document into a flat sequence of subtrees, and then process each subtree independently. That meets many needs, but not all. There are many simple tasks that are intrinsically streamable even though they don't fit this model: for example, renaming all the elements in a document, or deleting all the NOTE elements. So I've started implementing "streaming templates", where the document is processed hierarchically in classic XSLT style by applying template rules to each node at every level.
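To illustrate, the NOTE-deletion task needs nothing more than a pair of template rules along these lines (a sketch only: the mode name is mine, and I'm not showing the mechanism by which the mode is declared streamable):

    <!-- copy each element and its attributes, visiting the subtree once -->
    <xsl:template match="*" mode="stream">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates mode="stream"/>
      </xsl:copy>
    </xsl:template>

    <!-- an empty rule: NOTE elements and their subtrees produce nothing -->
    <xsl:template match="NOTE" mode="stream"/>

Text nodes are copied by the built-in rule. Because each rule reads the subtree below the current node at most once, the whole transformation can be driven directly from the parser in a single pass.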
We've done a great deal of analysis specifying such a feature in the XSL Working Group, though it's not yet ready for publication; what I've implemented so far is a relatively small subset of what's possible, but the WG analysis has proved extremely useful (in turn, some of this comes from previous experience with STX). Streamability is a property of a mode, and for a document to be streamed, all the templates in that mode have to conform to strict rules. In the Saxon implementation the template must contain at most one instruction that processes the subtree rooted at the current node (typically <xsl:apply-templates/>, but it could also be <xsl:value-of select="."/> or <xsl:copy-of select="node()"/>). Other instructions must not navigate downwards or sideways, though they are allowed to access attributes of the current node and of its ancestors. The patterns used in the template rules are similarly restricted.
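To make the restrictions concrete, here is a rule that stays inside the subset (the element names are invented for the example):

    <xsl:template match="item" mode="stream">
      <!-- attributes of the current node and of its ancestors are allowed -->
      <item id="{@id}" section="{ancestor::section/@num}">
        <!-- the one instruction that processes the subtree -->
        <xsl:apply-templates mode="stream"/>
      </item>
    </xsl:template>

A template containing both <xsl:value-of select="title"/> and <xsl:apply-templates/>, by contrast, would need two visits to the children, so it falls outside the subset.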
It was a bit of a mental challenge getting this working, because it required an inversion of the usual logic whereby each XSLT instruction reads the data it needs. Instead, each instruction needs to be activated ("pushed") when the data becomes available from the SAX parser. In fact "inversion" is precisely what is being done here, in the sense that Michael Jackson uses the term in Jackson Structured Programming (JSP - a popular programming methodology in the 1970s).
There are a lot of good ideas in JSP and I've been re-reading about it recently. It was developed as a way of structuring batch programs written to handle hierarchic data files stored on magnetic tape. A key notion is that the structure of the program is closely related to the structure of the input and output files. When the input and output have the same hierarchic structure, this is fairly trivial, and a pure single-pass streaming transformation is possible. However, JSP goes beyond this by categorizing the various kinds of structural mismatches that can occur between input and output, many of which still allow the processing to be done in a single pass through the data. Often, however, this involves a kind of pipeline in which data is passed through a sequence of stages, each of which is itself purely sequential. (Starting to look familiar?) The problem is that only one of these stages (unless you use multiple threads) can own the main control loop and thus have a structure that is isomorphic with the data. Jackson devised the idea that each process can be written as if it owns the main loop, and can then be mechanically inverted so that it is driven by events rather than exercising control itself: you get the usability benefits of writing to a pull model, combined with the component reusability benefits that come from a push model. The more declarative the language, the easier it is to do the analysis and automate the program inversion.
In the 1980s, in ICL, we had a rather nice declarative 4GL for writing both batch and TP systems, under the name of QuickBuild. You could write an interactive application in a conversational style -- send a message to the user and wait for a response -- and the compiler would invert this into an event-driven application suitable for running under a transaction processing monitor. The variables needed to retain state (or partial results) between user interactions were automatically identified by the compiler, greatly easing the task of writing transactional applications. The compiler here was performing a Jackson inversion on the application.
The generation of a bottom-up parser from a top-down BNF description of a language can also be seen as an inversion. In this case the state that needs to be retained between parser-generated events is represented explicitly using a finite-state automaton, and perhaps some counters for certain kinds of grammar.
Similarly, the process of generating code for an XSLT stylesheet in such a way that it can be driven by events generated by the XML parser can be seen as an inversion.
Initially, in the Saxon implementation, I'm keeping things very simple. The streamed execution is treated as a depth-first traversal of the tree in which each template rule processes one node. The processing of a template is divided into three phases: an operation that happens before processing the children, an operation that recurses to process the children and descendants, and then an operation that happens after processing the subtree. The pre- and post-recursion operations have access to the element and its properties, to its attributes and the attributes of its ancestors, but not to the subtree rooted at the element, because that can only be visited once. The (inverted) compiled code for a template rule has two entry points corresponding to the pre- and post-recursion phases, and automatically maintains the modest amount of state information that needs to be retained between phases.
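In stylesheet terms (again with invented names), a template is split at the point of the recursive descent:

    <xsl:template match="chapter" mode="stream">
      <!-- entry point 1, on the parser's startElement event: anything
           computable from the element, its attributes, and its
           ancestors' attributes -->
      <div class="chapter" title="{@title}">
        <!-- the recursive descent: children are processed as their
             events arrive -->
        <xsl:apply-templates mode="stream"/>
      </div>
      <!-- entry point 2, on endElement: the closing </div> tag is
           written once the whole subtree has gone past -->
    </xsl:template>

Here the only state that has to survive between the two entry points is the knowledge that a </div> end tag is still owed.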
I've got enough of this working already to satisfy quite a few interesting use cases. I'd like to generalize it as much as possible, but there's definitely an 80:20 rule in play here - meeting 80% of the requirement is only 20% of the effort. Now that the basic design works for templates that are streamable, the trickiest remaining part is the static analysis that allows templates that aren't streamable to be rejected with a meaningful explanation of why not.