Arrow Expressions
When I proposed the arrow operator to the XQuery/XSLT working groups, I thought of it as minor syntactic sugar.
It's just a convenience: instead of substring-before(substring-after(X, '['), ']') you can write X => substring-after('[') => substring-before(']'), which helps you avoid going cross-eyed.
If you're the kind of person who can play the piano with your hands crossed over, you probably don't need it, but for the rest of us, it makes life just a tiny bit easier.
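To make the "syntactic sugar" point concrete, here is a small Python model (not XPath) of the thick arrow: X => f(a) simply means f(X, a), so a chain of arrows is a left-to-right rewrite of nested calls. The functions substring_after and substring_before are Python stand-ins written for this sketch, and pipe is a hypothetical helper, not part of any library.

```python
# A Python model of XPath's thick arrow: `X => f(a, b)` means `f(X, a, b)`.
# substring_after / substring_before are stand-ins for the XPath functions of the
# same names (both return "" when a non-empty delimiter is not found).

def substring_after(s: str, sep: str) -> str:
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""

def substring_before(s: str, sep: str) -> str:
    i = s.find(sep)
    return s[:i] if i >= 0 else ""

def pipe(value, *steps):
    """Model of X => f1(a...) => f2(b...): each step is (func, *args), and the
    running value is always passed as the FIRST argument."""
    for func, *args in steps:
        value = func(value, *args)
    return value

x = "price[42]"
# substring-before(substring-after(X, '['), ']')  -- read inside-out
nested = substring_before(substring_after(x, "["), "]")
# X => substring-after('[') => substring-before(']')  -- read left to right
piped = pipe(x, (substring_after, "["), (substring_before, "]"))
print(nested, piped)  # 42 42
```

Both forms compute the same thing; the only difference is the order in which a reader meets the operations.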
So I was a bit surprised at XML Prague 2020 that Juri Leino managed to construct an entire presentation around the arrow operator (Shooting Arrows Fast and Accurately). Not only that, he also developed a whole library of functions, called XBow, to increase its power.
Now, XBow actually reveals a bit of a weakness in the construct: you can build a pipeline of functions, but you can't include arbitrary expressions in the pipeline unless each of them is wrapped in a function. Moreover, the value output by one step in the pipeline can only be used as the first argument of the next function: you can write X => concat('$') to add a "$" at the end of a string, but there's no simple way of adding a "$" at the front, except by defining a new prepend function that does this for you (or hoping that XBow will have anticipated your requirement).
Now, of course you can write X ! concat('$', .). But that doesn't always fit the bill. Firstly, the "!" operator evaluates its right-hand side once for each item, so it only works when you're processing a single item (or genuinely want to map each item of a sequence separately). Secondly, (to use the current jargon) the optics are wrong: it breaks the pipeline visually.
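The difference between the two workarounds can be sketched with another Python model (again, not XPath). Sequences are modeled as tuples; concat is a string-only stand-in for fn:concat, and prepend is the hypothetical helper mentioned above, assumed here purely for illustration.

```python
# Python model contrasting `=>` with `!`. Sequences are modeled as tuples.

def concat(*parts: str) -> str:
    """Stand-in for fn:concat, restricted to string arguments."""
    return "".join(parts)

def prepend(s: str, prefix: str) -> str:
    """Hypothetical helper: lets the piped value end up in the second position."""
    return concat(prefix, s)

def arrow(value, func, *args):
    """X => func(args...): the whole value becomes func's FIRST argument."""
    return func(value, *args)

def bang(seq, body):
    """X ! body: evaluate body once per item (each result here is one item)."""
    return tuple(body(item) for item in seq)

x = "42"
appended = arrow(x, concat, "$")                   # X => concat('$')
prefixed = arrow(x, prepend, "$")                  # X => prepend('$')
mapped = bang((x,), lambda dot: concat("$", dot))  # X ! concat('$', .)

# On a multi-item sequence, `!` maps item by item, whereas a whole-sequence
# step such as string-join needs `=>`; the two operators are not interchangeable.
prices = ("1", "2")
each = bang(prices, lambda dot: concat("$", dot))
joined = arrow(prices, lambda seq, sep: sep.join(seq), ", ")
print(appended, prefixed, mapped, each, joined)
```

Here appended is "42$", prefixed is "$42", and each is ("$1", "$2"), while joined treats the sequence as a whole and gives "1, 2".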
So my first suggestion is that we allow inline expressions to appear in a pipeline. Something like this: X => {~ + 1}, or X => {concat('$', ~)}.
I'm using '~' here as a variable to refer to the implicit argument, that is, the value passed down the pipeline. I would have used '_', as Scala does, but unfortunately '_' is a legal element name, so it already has a meaning. And '~' seems to work quite nicely.
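One way to picture the proposed inline steps is as anonymous one-argument functions whose parameter plays the role of '~'. The Python sketch below models that reading; pipe is a hypothetical helper, and nothing here claims to be how an XPath processor would actually implement the syntax.

```python
from typing import Any, Callable

def pipe(value: Any, *steps: Callable[[Any], Any]) -> Any:
    """Model of X => {...} => {...}: each step is a one-argument function whose
    parameter stands for the proposed implicit variable `~`."""
    for step in steps:
        value = step(value)
    return value

plus_one = pipe(41, lambda tilde: tilde + 1)          # X => {~ + 1}
with_prefix = pipe("42", lambda tilde: "$" + tilde)   # X => {concat('$', ~)}

# Named functions and inline expressions can be mixed freely in one pipeline,
# and the piped value can appear in any argument position, not just the first.
chained = pipe(" 41 ", str.strip, int, lambda tilde: tilde + 1)
print(plus_one, with_prefix, chained)  # 42 $42 42
```

Because the step is an arbitrary expression, the "prepend" problem disappears: the value goes wherever '~' is written.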
The next thing that's been requested is implicit mapping, so you can use something like arrow notation to do X ! substring-after(., '$') ! number(.) => sum(). (Actually, the main obstacle in getting the arrow operator accepted by the XQuery Working Group was that some people wanted it to have this meaning.)
For that I propose we use a "thin arrow": X -> substring-after('$') -> number() => sum().
The effect of the thin arrow is that instead of passing the value of the LHS to the function on the RHS en bloc, we pass it one item at a time. Of course, if the value on the LHS is a single item, then it doesn't matter which kind of arrow we use: both have the same effect.
If you're a fan of map-reduce terminology, then you'll recognize this instantly as a map-reduce pipeline. The -> operations perform a mapping, and the final => does a reduce. If you're more into functional thinking, you probably think of it more in terms of function composition.
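Both readings can be checked against a small Python model. Here thin models the proposed "->" (apply a function to each item), thick models "=>" (apply a function to the whole sequence), and substring_after is a stand-in for the XPath function; the input prices are made up for the example.

```python
from functools import reduce

def substring_after(s: str, sep: str) -> str:
    """Stand-in for fn:substring-after (returns "" if sep is absent)."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""

def thin(seq: tuple, func) -> tuple:
    """Proposed `seq -> func`: apply func to each item separately (the map)."""
    return tuple(func(item) for item in seq)

def thick(seq: tuple, func):
    """`seq => func`: pass the whole sequence to func at once."""
    return func(seq)

prices = ("$10", "$32")
strip_dollar = lambda s: substring_after(s, "$")

# X -> substring-after('$') -> number() => sum()
total = thick(thin(thin(prices, strip_dollar), float), sum)

# Map-reduce reading: two maps followed by a reduce.
total_mr = reduce(lambda acc, n: acc + n, map(float, map(strip_dollar, prices)), 0.0)

# Composition reading: two thin arrows equal one thin arrow over the composed function.
def compose(f, g):
    return lambda item: g(f(item))

step_by_step = thin(thin(prices, strip_dollar), float)
composed = thin(prices, compose(strip_dollar, float))
print(total, total_mr, step_by_step == composed)  # 42.0 42.0 True
```

float stands in for number(), which returns an xs:double.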
Of course thin arrows can also be used with arbitrary expressions, just like thick arrows: (0 to 3) -> {~ + 1} -> format-integer('a') => string-join('.') returns "a.b.c.d".
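Stepping through that example in the same Python model shows where the per-item and whole-sequence stages fall. format_integer_a is a stand-in written for this sketch that mimics format-integer(n, 'a') for positive n only.

```python
def thin(seq, func):
    """Proposed `->`: apply func to each item."""
    return tuple(func(item) for item in seq)

def thick(seq, func):
    """`=>`: apply func to the whole sequence."""
    return func(seq)

def format_integer_a(n: int) -> str:
    """Stand-in for format-integer(n, 'a'), valid here only for n >= 1:
    1 -> 'a', 26 -> 'z', 27 -> 'aa'."""
    letters = []
    while n > 0:
        n, r = divmod(n - 1, 26)
        letters.append(chr(ord("a") + r))
    return "".join(reversed(letters))

seq = tuple(range(0, 4))                    # (0 to 3)
seq = thin(seq, lambda tilde: tilde + 1)    # -> {~ + 1}           gives (1, 2, 3, 4)
seq = thin(seq, format_integer_a)           # -> format-integer('a') gives ('a', 'b', 'c', 'd')
result = thick(seq, lambda s: ".".join(s))  # => string-join('.')
print(result)  # a.b.c.d
```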
And now I'd like to pull one more rabbit out of the hat. What if I want a function that applies the above pipeline to any input sequence? I could write function($x){$x -> {~ + 1} -> format-integer('a') => string-join('.')}, but that seems clunky. I'm looking for a nice way to supply functions as arguments to higher-order functions like sort, where other languages have shown that a concise notation for anonymous functions (like a => a + 1 in JavaScript) can make code a lot simpler, less verbose, and more readable.
So my proposal is this: just remove the left-hand expression, so you have something starting with -> or =>, and use this as an anonymous arity-1 function.
So you can now do //employee => sort((), ->{~/@salary}) to sort employees by salary, or //employee => sort((), ->{~/@salary}->substring-after('$')->number()) if you need to do a bit more processing.
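Why the extra processing matters can be seen in a Python model where a pipeline with no left-hand side becomes a one-argument key function for sorting. arrow_fn is a hypothetical helper, the employee records are invented stand-ins for //employee, and substring_after is a stand-in for the XPath function. In this model the first key compares salaries as strings and the second as numbers.

```python
def substring_after(s: str, sep: str) -> str:
    """Stand-in for fn:substring-after (returns "" if sep is absent)."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""

def arrow_fn(*steps):
    """Model of a pipeline with its left-hand side removed, such as
    `->{~/@salary} -> substring-after('$') -> number()`: an arity-1 function
    that feeds its argument through the steps in order."""
    def apply(item):
        for step in steps:
            item = step(item)
        return item
    return apply

# Hypothetical data: salaries are strings, as attribute values would be.
employees = [
    {"name": "Ann", "salary": "$52000"},
    {"name": "Bob", "salary": "$48000"},
    {"name": "Cy", "salary": "$9500"},
]

# sort((), ->{~/@salary}): string keys, so "$9500" sorts after "$52000"
by_text = sorted(employees, key=arrow_fn(lambda e: e["salary"]))
# sort((), ->{~/@salary}->substring-after('$')->number()): numeric keys
by_number = sorted(employees, key=arrow_fn(lambda e: e["salary"],
                                           lambda s: substring_after(s, "$"),
                                           float))
text_order = [e["name"] for e in by_text]
number_order = [e["name"] for e in by_number]
print(text_order, number_order)  # ['Bob', 'Ann', 'Cy'] ['Cy', 'Bob', 'Ann']
```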
As another little refinement, in the case of ->, the implicit argument is always a single item, so we can bind it to the context item. So ->{~/@salary} can be simplified to ->{@salary}. Basically, within curly braces on the RHS of ->, '.' and '~' mean the same thing.
I believe that all these constructs can be added to the grammar without introducing ambiguity or backwards incompatibility, but I haven't proved it conclusively yet.
Postscript
The ~ construct seems to be the missing ingredient for enabling pipelines in XSLT. Consider:
<xsl:pipeline>
  <xsl:apply-templates select="/" mode="m1"/>
  <xsl:apply-templates select="~" mode="m2"/>
  <xsl:for-each select="~">
    <e><xsl:copy-of select="."/></e>
  </xsl:for-each>
</xsl:pipeline>
Here "~" acts as an implicit variable that passes the result of one instruction as the input to the next, eliminating the clunky xsl:variable declarations needed to do this today.
The instructions that form the children of the xsl:pipeline element are effectively connected to each other with an implicit => operator.
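Read that way, xsl:pipeline is a left fold: each stage is applied to the result of the previous one. The Python sketch below models only that data flow. run_pipeline, mode_m1, mode_m2 and wrap_each are hypothetical stand-ins for the three instructions above, and documents are simplified to lists of strings, which real XSLT processing is not.

```python
from functools import reduce

def run_pipeline(initial, *stages):
    """Model of xsl:pipeline: each stage receives the previous stage's result
    (the role `~` plays), as if the instructions were joined by `=>`."""
    return reduce(lambda acc, stage: stage(acc), stages, initial)

def mode_m1(doc):
    """Hypothetical stand-in for apply-templates mode="m1": trim each line."""
    return [line.strip() for line in doc]

def mode_m2(items):
    """Hypothetical stand-in for apply-templates mode="m2": drop empty items."""
    return [s for s in items if s]

def wrap_each(items):
    """Stand-in for the xsl:for-each that wraps each item in an <e> element."""
    return [f"<e>{s}</e>" for s in items]

result = run_pipeline(["  a ", "", " b"], mode_m1, mode_m2, wrap_each)
print(result)  # ['<e>a</e>', '<e>b</e>']
```

No intermediate names are needed between stages, which is exactly the saving over chained xsl:variable declarations.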