Arrow Expressions
When I proposed the arrow operator to the XQuery/XSLT working groups, I thought of it as minor syntactic sugar.
It's just a convenience: instead of substring-before(substring-after(X, '['), ']') you can write
X => substring-after('[') => substring-before(']'), which helps you avoid going cross-eyed.
If you're the kind of person who can play the piano with your hands crossed over, you probably don't need it,
but for the rest of us, it makes life just a tiny bit easier.
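For readers who think more easily in another language, here is a rough Python model of what the arrow does. The names arrow, substring_after and substring_before are illustrative stand-ins, not a real library, and the two string helpers only approximate the XPath functions of the same name.

```python
def substring_after(text, sep):
    # Rough model of fn:substring-after: text after the first sep, or "" if sep is absent.
    if sep == "":
        return text
    return text.partition(sep)[2]


def substring_before(text, sep):
    # Rough model of fn:substring-before: text before the first sep, or "" if sep is absent.
    if sep == "":
        return ""
    head, found, _ = text.partition(sep)
    return head if found else ""


def arrow(value, func, *args):
    # Model of "value => func(args)": the left-hand value becomes the first argument.
    return func(value, *args)


x = "see [chapter 3] for details"

# Nested form: substring-before(substring-after(X, '['), ']')
nested = substring_before(substring_after(x, "["), "]")

# Pipeline form: X => substring-after('[') => substring-before(']')
piped = arrow(arrow(x, substring_after, "["), substring_before, "]")

print(nested, piped)
```

Both forms compute the same value; the pipeline just reads left to right.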
So I was a bit surprised at XML Prague 2020 that Juri Leino managed to construct an entire presentation around the arrow operator (Shooting Arrows Fast and Accurately). Not only that, he also developed a whole library of functions, called XBow, to increase their power.
Now, XBow actually reveals a bit of a weakness in the construct: you can build a pipeline of functions,
but you can't include arbitrary expressions in the pipeline unless each of the expressions is made available
via a function. Moreover, the value output by one step in the pipeline can only be used as the first
argument in the next function: you
can do X => concat('$') to add a "$" at the end of a string, but there's no simple way of adding
a "$" at the front, except by defining a new prepend function that does this for you (or hoping
that XBow will have anticipated your requirement).
Now, of course you can do X ! concat('$', .). But that doesn't always fit the bill. Firstly,
it only works when you're processing single items (or mapping a sequence to multiple items). Secondly,
(to use the current jargon) the optics are wrong: it breaks the pipeline visually.
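The first-argument restriction is easy to see in a small Python model. This is only a sketch: arrow and concat mimic the XPath constructs, and prepend is the kind of hypothetical helper you would have to write yourself.

```python
def concat(*parts):
    # Rough model of fn:concat: join the string values of all arguments.
    return "".join(str(p) for p in parts)


def arrow(value, func, *args):
    # Model of "value => func(args)": value is always the FIRST argument.
    return func(value, *args)


def prepend(value, prefix):
    # Hypothetical helper: needed only because the arrow cannot target the second argument.
    return concat(prefix, value)


appended = arrow("100", concat, "$")     # models X => concat('$')
prepended = arrow("100", prepend, "$")   # models X => prepend('$')

# Models X ! concat('$', .): works, but item by item and outside the arrow pipeline.
mapped = [concat("$", item) for item in ["100", "250"]]

print(appended, prepended, mapped)
```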
So my first suggestion is that we allow inline expressions to appear in a pipeline. Something like this:
X => {~ + 1}, or X => {concat('$', ~)}.
I'm using '~' here as a variable to refer to the
implicit argument, that is, the value passed down the pipeline. I would have used '_', as Scala does, but unfortunately
'_' is a legal element name so it already has a meaning. And '~' seems to work quite nicely.
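In Python terms, an inline step behaves roughly like a one-argument lambda whose parameter plays the role of '~'. The sketch below is only a model of the proposed semantics; the parameter name tilde is chosen purely for illustration.

```python
def arrow(value, step, *args):
    # Model of "value => step": the step receives the whole left-hand value.
    return step(value, *args)


# Models X => {~ + 1}
incremented = arrow(41, lambda tilde: tilde + 1)

# Models X => {concat('$', ~)}: the value no longer has to be the first argument.
priced = arrow("100", lambda tilde: "$" + tilde)

print(incremented, priced)
```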
The next thing that's been requested is implicit mapping, so you can use something like arrow notation
to do X ! substring-after(., '$') ! number(.) => sum(). (Actually, the main obstacle in getting
the arrow operator accepted by the XQuery Working Group was that some people wanted it to have this meaning.)
For that I propose we use a "thin arrow": X -> substring-after('$') -> number() => sum().
The effect of the thin arrow is that instead of passing the value of the LHS to the function on the RHS
en bloc, we pass it one item at a time. Of course, if the value on the LHS is a single item, then
it doesn't matter which kind of arrow we use: both have the same effect.
If you're a fan of map-reduce terminology, then you'll recognize this instantly as a map-reduce
pipeline. The -> operations are doing a mapping, and the final => does a reduce.
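The difference between the two arrows can be modelled in a few lines of Python, treating a sequence as a list. This is a sketch of the proposed semantics, not an implementation: thick, thin and substring_after are illustrative names, and float stands in for number().

```python
def thick(value, func, *args):
    # Model of "=>": pass the whole left-hand value to the function in one go.
    return func(value, *args)


def thin(seq, func, *args):
    # Model of "->": call the function once per item and concatenate the results.
    out = []
    for item in seq:
        result = func(item, *args)
        # A list result is spliced in, roughly mirroring XPath sequence flattening.
        out.extend(result if isinstance(result, list) else [result])
    return out


def substring_after(text, sep):
    # Rough model of fn:substring-after; assumes sep is non-empty.
    return text.partition(sep)[2]


prices = ["$10", "$20.50", "$4"]

# Models X -> substring-after('$') -> number() => sum()
amounts = thin(thin(prices, substring_after, "$"), float)  # the "map" steps
total = thick(amounts, sum)                                # the "reduce" step

print(amounts, total)
```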
If you're more into functional thinking, you probably think of it more in terms of function composition.
Of course thin arrows can also be used with arbitrary expressions, just like thick arrows:
(0 to 3) -> {~ + 1} -> format-integer('a') => string-join('.') returns
"a.b.c.d".
And now I'd like to pull one more rabbit out of the hat. What if I want a function that applies the
above pipeline to any input sequence? I could write function($x){$x -> {~ + 1} ->
format-integer('a') => string-join('.')} but that seems clunky. I'm looking for a nice way
to supply functions as arguments to higher-order functions like sort, where other languages have
shown that a concise notation for anonymous functions (like a -> a+1 in Java or a => a+1 in JavaScript) can
make code a lot simpler, less verbose, and more readable.
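For comparison, here is the explicit-function version of that pipeline modelled in Python. It is only a sketch: format_integer handles just the 'a' picture for 1 to 26, and all helper names are stand-ins for the XPath functions they imitate.

```python
def thin(seq, func, *args):
    # Model of "->": apply the function to each item in turn.
    return [func(item, *args) for item in seq]


def thick(value, func, *args):
    # Model of "=>": pass the whole value at once.
    return func(value, *args)


def format_integer(n, picture):
    # Very partial model of fn:format-integer: only picture 'a' and values 1..26.
    if picture != "a" or not 1 <= n <= 26:
        raise ValueError("this sketch only supports picture 'a' for 1..26")
    return chr(ord("a") + n - 1)


def string_join(items, sep):
    # Rough model of fn:string-join.
    return sep.join(items)


def label(x):
    # Models function($x){$x -> {~ + 1} -> format-integer('a') => string-join('.')}
    incremented = thin(x, lambda tilde: tilde + 1)
    letters = thin(incremented, format_integer, "a")
    return thick(letters, string_join, ".")


result = label(range(0, 4))  # models applying the function to (0 to 3)
print(result)
```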
So my proposal is this: just remove the left-hand expression, so you have something starting with
-> or =>, and use this as an anonymous arity-1 function.
So you can now do: //employee => sort((), ->{~/@salary}) to sort employees
by salary, or //employee => sort((), ->{~/@salary}->substring-after('$')->number())
if you need to do a bit more processing.
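A Python analogy may help here: a pipeline with no left-hand side behaves like a function built by composing its steps. In the sketch below, headless is a hypothetical helper, the employee records are invented sample data, and dictionaries stand in for elements with a salary attribute.

```python
from functools import reduce


def headless(*steps):
    # Model of a pipeline with no left-hand side: an arity-1 function
    # that feeds its argument through each step in order.
    return lambda item: reduce(lambda value, step: step(value), steps, item)


# Invented sample data standing in for //employee
employees = [
    {"name": "Ann", "salary": "$5200"},
    {"name": "Bob", "salary": "$980"},
    {"name": "Cy", "salary": "$12000"},
]

# Models ->{~/@salary}->substring-after('$')->number()
salary_key = headless(
    lambda tilde: tilde["salary"],   # {~/@salary}
    lambda s: s.partition("$")[2],   # substring-after('$')
    float,                           # number()
)

# Models //employee => sort((), <that function>)
by_salary = sorted(employees, key=salary_key)
names = [e["name"] for e in by_salary]
print(names)
```

Sorting on the raw strings would put "$12000" before "$5200", which is why the extra steps in the key function matter.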
As another little refinement, in the case of ->, the implicit argument is
always a single item, so we can bind it to the context item. So ->{~/@salary}
can be simplified to ->{@salary}. Basically, within curly braces on the RHS of ->,
. and ~ mean the same thing.
I believe that all these constructs can be added to the grammar without introducing ambiguity or backwards incompatibility, but I haven't proved it conclusively yet.
Postscript
The ~ construct seems to be the missing ingredient for enabling pipelines in XSLT.
Consider:
<xsl:pipeline>
<xsl:apply-templates select="/" mode="m1"/>
<xsl:apply-templates select="~" mode="m2"/>
<xsl:for-each select="~">
<e><xsl:copy-of select="."/></e>
</xsl:for-each>
</xsl:pipeline>
Here "~" is acting as an implicit variable to pass the result of one instruction to be the input for
the next: basically eliminating the clunky xsl:variable declarations needed to do this today.
The instructions that form the children of the xsl:pipeline element are effectively
connected to each other with an implicit => operator.
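The chaining idea can be modelled in Python as a fold over a list of steps. This is only a sketch of the intended data flow: run_pipeline is a hypothetical name, and the three step functions are invented stand-ins for the two apply-templates calls and the for-each.

```python
from functools import reduce


def run_pipeline(source, *steps):
    # Each step receives the previous step's result, the role "~" plays in the pipeline.
    return reduce(lambda tilde, step: step(tilde), steps, source)


# Invented stand-ins for the three instructions in the pipeline
def mode_m1(items):
    return [s.strip() for s in items]


def mode_m2(items):
    return [s.upper() for s in items]


def wrap_each(items):
    return ["<e>" + s + "</e>" for s in items]


source = [" a ", " b"]

# Pipeline style: no intermediate names needed.
piped = run_pipeline(source, mode_m1, mode_m2, wrap_each)

# Today's style, analogous to a chain of xsl:variable declarations.
v1 = mode_m1(source)
v2 = mode_m2(v1)
explicit = wrap_each(v2)

print(piped, explicit)
```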