Arrow Expressions

By Michael Kay on November 19, 2020 at 10:20a.m.


When I proposed the arrow operator to the XQuery/XSLT working groups, I thought of it as minor syntactic sugar. It's just a convenience: instead of substring-before(substring-after(X, '['), ']') you can write X => substring-after('[') => substring-before(']') which helps you to avoid going cross-eyed. If you're the kind of person who can play the piano with your hands crossed over, you probably don't need it, but for the rest of us, it makes life just a tiny bit easier.
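
For a concrete comparison, here is a small sketch in XQuery 3.1 as it stands today. The input string bound to $x is made up for illustration; both forms should produce the same value.

```xquery
xquery version "3.1";

(: $x is a hypothetical input string, chosen only for illustration :)
let $x := "see [chapter 3] for details"
return (
  (: nested calls, read inside-out :)
  substring-before(substring-after($x, '['), ']'),
  (: the same computation with arrows, read left to right :)
  $x => substring-after('[') => substring-before(']')
)
(: both items are "chapter 3" :)
```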

So I was a bit surprised at XML Prague 2020 that Juri Leino managed to construct an entire presentation around the arrow operator (Shooting Arrows Fast and Accurately). Not only that, he also developed a whole library of functions, called XBow, to increase their power.

Now, XBow actually reveals a bit of a weakness in the construct: you can construct a pipeline of functions, but you can't include arbitrary expressions in the pipeline unless each of the expressions is made available via a function. Moreover, the value output by one step in the pipeline can only be used as the first argument in the next function: you can do X => concat('$') to add a "$" at the end of a string, but there's no simple way of adding a "$" at the front, except by defining a new prepend function that does this for you (or hoping that XBow will have anticipated your requirement).
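
The first-argument restriction can be seen in a short XQuery 3.1 sketch. Note that local:prepend is a hypothetical helper written for this example, not a function from XBow or from the standard library; the third line shows the inline-function workaround that 3.1 already permits, which works but is hardly concise.

```xquery
xquery version "3.1";

(: local:prepend is a hypothetical helper, defined here only for illustration :)
declare function local:prepend($s as xs:string, $prefix as xs:string) as xs:string {
  concat($prefix, $s)
};

(
  "100" => concat('$'),                          (: "100$": appending is easy :)
  "100" => local:prepend('$'),                   (: "$100": needs a helper function :)
  "100" => (function($s) { concat('$', $s) })()  (: "$100": inline, but verbose :)
)
```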

Now, of course you can do X ! concat('$', .). But that doesn't always fit the bill. Firstly, it only works when you're processing single items (or mapping a sequence to multiple items). Secondly, (to use the current jargon) the optics are wrong: it breaks the pipeline visually.

So my first suggestion is that we allow inline expressions to appear in a pipeline. Something like this: X => {~ + 1}, or X => {concat('$', ~)}. I'm using '~' here as a variable to refer to the implicit argument, that is, the value passed down the pipeline. I would have used '_', as Scala does, but unfortunately '_' is a legal element name so it already has a meaning. And '~' seems to work quite nicely.

The next thing that's been requested is implicit mapping, so you can use something like arrow notation to do X ! substring-after(., '$') ! number(.) => sum(). (Actually, the main obstacle in getting the arrow operator accepted by the XQuery Working Group was that some people wanted it to have this meaning.)

For that I propose we use a "thin arrow": X -> substring-after('$') -> number() => sum(). The effect of the thin arrow is that instead of passing the value of the LHS to the function on the RHS en bloc, we pass it one item at a time. Of course, if the value on the LHS is a single item, then it doesn't matter which kind of arrow we use, both have the same effect.
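
Since the thin arrow is only a proposal, the sketch below shows what it would correspond to in XQuery 3.1 today, assuming the item-at-a-time semantics described above. The $prices sequence is hypothetical sample data.

```xquery
xquery version "3.1";

(: $prices is hypothetical sample data :)
let $prices := ("$10", "$20.50")
return (
  (: simple map operator: each step runs once per item, then sum() sees the whole sequence :)
  $prices ! substring-after(., '$') ! number(.) => sum(),
  (: the same mapping spelled out with fn:for-each and a partial function application :)
  sum(for-each(for-each($prices, substring-after(?, '$')), number#1))
)
(: both items are 30.5 :)
```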

If you're a fan of map-reduce terminology, then you'll recognize this instantly as a map-reduce pipeline. The -> operations are doing a mapping, and the final => does a reduce. If you're more into functional thinking, you probably think of it more in terms of function composition.

Of course thin arrows can also be used with arbitrary expressions, just like thick arrows: (0 to 3) -> {~ + 1} -> format-integer('a') => string-join('.') returns "a.b.c.d".
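
For comparison, the nearest equivalent that XQuery 3.1 accepts today uses the simple map operator for the item-by-item steps; this is a sketch of the intended semantics rather than a definition of the proposed syntax.

```xquery
xquery version "3.1";

(: each "!" step is applied per item; the final "=>" passes the whole sequence to string-join :)
(0 to 3) ! (. + 1) ! format-integer(., 'a') => string-join('.')
(: "a.b.c.d" :)
```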

And now I'd like to pull one more rabbit out of the hat. What if I want a function that applies the above pipeline to any input sequence? I could write function($x){$x -> {~ + 1} -> format-integer('a') => string-join('.')} but that seems clunky. I'm looking for a nice way to supply functions as arguments to higher-order functions like sort, where other languages have shown that a concise notation for anonymous functions (like a => a+1 in JavaScript) can make code a lot simpler, less verbose, and more readable.

So my proposal is this: just remove the left-hand expression, so you have something starting with -> or =>, and use this as an anonymous arity-1 function.

So you can now do: //employee => sort((), ->{~/@salary}) to sort employees by salary, or //employee => sort((), ->{~/@salary}->substring-after('$')->number()) if you need to do a bit more processing.
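
To show what the proposed shorthand would replace, here is the second sort written with an explicit inline function in XQuery 3.1. The staff document and the name attribute are hypothetical sample data; only employee and @salary come from the example above.

```xquery
xquery version "3.1";

(: hypothetical sample data; the name attribute is invented for this sketch :)
let $staff :=
  <staff>
    <employee name="Ann" salary="$52000"/>
    <employee name="Bob" salary="$48000"/>
    <employee name="Cy" salary="$61000"/>
  </staff>
(: the key function strips the leading "$" and converts the rest to a number :)
let $sorted :=
  $staff/employee => sort((), function($e) {
    $e/@salary => substring-after('$') => number()
  })
return $sorted ! string(@name)
(: "Bob", "Ann", "Cy" :)
```

The explicit function($e) { ... } wrapper and the repeated $e are the boilerplate that the leading-arrow notation is meant to remove.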

As another little refinement, in the case of ->, the implicit argument is always a single item, so we can bind it to the context item. So ->{~/@salary} can be simplified to ->{@salary}. Basically, within curly braces on the RHS of ->, . and ~ mean the same thing.

I believe that all these constructs can be added to the grammar without introducing ambiguity or backwards incompatibility, but I haven't proved it conclusively yet.

Postscript

The ~ construct seems to be the missing ingredient for enabling pipelines in XSLT. Consider:

<xsl:pipeline>
  <xsl:apply-templates select="/" mode="m1"/>
  <xsl:apply-templates select="~" mode="m2"/>
  <xsl:for-each select="~">
    <e><xsl:copy-of select="."/></e>
  </xsl:for-each>
</xsl:pipeline>

Here "~" is acting as an implicit variable to pass the result of one instruction to be the input for the next: basically eliminating the clunky xsl:variable declarations needed to do this today. The instructions that form the children of the xsl:pipeline element are effectively connected to each other with an implicit => operator.