At first sight, nothing seems more eminently streamable than the expression <xsl:value-of select="//section/head"/>. The sections are in document order, the headings are in document order, the order
         of the output is the same as the order of the input, so where is the problem? The
         fact is, however, as I hinted in my previous post, there is a lot more to this expression than meets the eye.
The problem is that if we adopt the nested-loop semantics for path expressions, interpreting this as "for each descendant::section for each child::head", the results are not guaranteed to be in document order. OK, they're in document order if the head element is the first thing in each section; but we don't know that. Let's imagine that some perverse document author has put the head element at the end. Then the XSLT processor has to cope with an input document like this:
<section nr="1">
  <section nr="1.1">
    <head>section 1.1</head>
  </section>
  <section nr="1.2">
                <head>section 1.2</head>
              </section>
  <head>section 1</head>
</section>
And while a path expression //section/head will output the headings in document order "section 1.1 section 1.2 section 1", a nested for-each
<xsl:for-each select="//section"><xsl:value-of select="head"/></xsl:for-each>
will output them as "section 1 section 1.1 section 1.2". Here the output is not in the same order as the input, therefore by definition the construct is not fully streamable - some kind of buffering is needed, either of the input or the output.
In the current internal draft of the XSLT 2.1 specification, we've therefore stated
         that such expressions are not streamable. This has the unfortunate consequence that
         //head isn't streamable either, because of course //head means /descendant-or-self::node()/head, which has the same structure, and has the feature that a pure nested loop evaluation
         delivers results that aren't in document order.
This is unfortunate because we want the rules to be intuitive. Something that looks
         streamable should actually be streamable. And the reason it looks streamable is that
         99% of the time it is: most of the time the elements selected by //employee aren't actually going to be recursively nested, and even if they are (as in the //section case), most of the time the headings are going to be at the start rather than at
         the end. So banning these constructs from streaming stylesheets is not a very friendly
         thing to do.
One solution might be to say that for path expressions, we abandon the nested-loop
         formulation of the semantics, and go instead with a "pattern-like" interpretation:
         to evaluate the expression .//section/head, you search all the descendants of the context node, and return those that match
         the pattern //section/head. That works unless the user actually does want a nested loop. And supporting streaming
         of the nested loop case (which also of course includes nested apply-templates calls)
         is probably just as important as supporting streaming of the path expression. It's
         also useful to be consistent between the two - users won't appreciate it if one is
         streamable and the other isn't, given that in 99% of cases the nested loop formulation
         actually returns results in document order.
In fact ever since the earliest days of streaming in Saxon, such constructs have been streamable. In the first incarnation of "streaming copy" you could write
<xsl:copy-of select="doc('book.xml')//section/head" saxon:read-once="yes"/>
and Saxon would sort it out - even if not only the sections, but their headings are
         nested (who said that a <head> element can't contain a <section> element?)
The way Saxon does this is to activate a watch when a node matching //section/head is found; and while this watch is active, two things are happening. Events from the
         parser are being directed to a tree builder that constructs a copy of the selected
         element. At the same time, the watch is still watching, so if another matching element
         is found, another tree builder is instantiated, and the relevant events will go to
         both. The only complication is that the trees then need to be returned in the right
         order: the first one started is the last to finish, but is also the first one that
         needs to be delivered in the output, which means that any trees that are started while
         another tree is already under construction need to be held back in a buffer, while
         the outermost tree can be streamed straight through to the serializer if that's where
         it's going. This turns out to work pretty well: in the 99% case, doing //employee does not find any nested employees, so the code to write an inner buffered employee
         never gets activated, and all the output is fully streamed. But if nested elements
         are found, the buffering swings into action, and the right result is still returned,
         but requires a bit more memory.
Originally written for xsl:copy-of, this same logic is now working for finding the string-value and typed-value of nested
         nodes, for doing aggregate functions such as sum(//section/length) (yes, even if the length element contains a section element...), and for nested apply-templates
         calls. Nested xsl:for-each isn't coded yet, but should fall out easily.
Now we'll have to see if we can get some of this thinking into the XSLT 2.1 draft. (If we can't it's not a great problem because we've always recognized that the spec will define a minimum set of things that can be streamed. But there's a lot of synergy to be gained by keeping the spec and what is in effect the reference implementation in sync with each other.)