If you saw the pattern para except appendix//para in an XSLT stylesheet, I can guess your reaction: "I haven't seen a pattern like that before. Cool, a neat way of matching paragraphs that aren't in an appendix. Must remember that and use it myself."
Sadly, it doesn't do what you think. Consider this input document:
<appendix id="A">
<section id="A.1">
<para>Ipsum lorem.</para>
</section>
</appendix>
You'd probably be as surprised as I was to see that the Ipsum lorem paragraph in this example matches the pattern para except appendix//para.
To see why this is true, go to the spec, section 5.5.3:
An item N matches a pattern P if the following applies, where EE is the equivalent expression to P: N is a node, and the result of evaluating the expression root(.)//(EE) with a singleton focus based on N is a sequence that includes the node N.
So, this is saying that a node matches the pattern if it is selected by the expression root(.)//(para except appendix//para).
Assuming that we're in a tree rooted at a document node, that means it must be selected by the expression /descendant-or-self::node()/(para except appendix//para).
Now, in our example document, one of the nodes selected by /descendant-or-self::node() is the section element; and when we evaluate (para except appendix//para) starting at the section element, the first operand (para) selects our paragraph, and the second operand (appendix//para) doesn't select it, because the section has no appendix child. So the expression as a whole selects the paragraph, and therefore the paragraph matches the pattern.
That's totally counter-intuitive, and it's certainly not what the Working Group intended. It's a nasty bug. So the question is, what can we do about it, given that this is a published spec and there are implementations out there, and user applications that depend on it?
Is there anything we can do about it?
Perhaps we should start by asking: what would we like the spec to say, if we had the opportunity to change it?
Given that we already have a special rule for patterns with a top-level union operator (see §6.5 rule 2), we could add a special rule for patterns with a top-level intersect or except operator: a pattern of the form A except B matches an item if pattern A matches the item and pattern B does not. (And analogously for intersect.)
If that's what we think we need to do, that leaves two challenges:
- Changing the spec (given there is no longer a Working Group to maintain it).
- Changing the Saxon implementation.
Starting with the second point, there are several possibilities:
- Just do it, and hope we don't break any existing applications.
- Support both the old and new semantics concurrently, with some mechanism for selecting which to use. (Which should be the default? We want new users not to fall into the elephant trap, but we also don't want to break working applications.)
- Deprecate the syntax, and provide new syntax for the new semantics (e.g. operators spelled and-also or but-not). Note, however, that most applications currently using except in a pattern are likely to be using unproblematic patterns like @* except @code.
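Why is @* except @code unproblematic? An attribute is only ever reached from its own parent element, so both operands are always evaluated from the same context, and evaluating the except at each context gives the same answer as matching each side separately. The Python sketch below checks this on one small, made-up document; it is an illustration, not a proof. It models attribute nodes as (element, name) pairs, which is our own modelling choice, because ElementTree keeps attributes in a plain dict rather than as nodes.

```python
import xml.etree.ElementTree as ET

# "#document" stands in for the document node; the sample input is made up.
root = ET.Element("#document")
root.append(ET.fromstring(
    '<doc><item code="x" id="1"><sub code="y" lang="en"/></item></doc>'))

def attrs(ctx, name=None):
    """@* when name is None, otherwise @name, relative to ctx.
    Attribute nodes are modelled as (owner element, attribute name) pairs."""
    return [(ctx, a) for a in ctx.attrib if name is None or a == name]

def selected(expr):
    """Everything root(.)//(expr) selects, i.e. expr applied at every node."""
    return [x for ctx in root.iter() for x in expr(ctx)]

# Current rule: the whole 'except' expression is evaluated at each context node.
current = selected(lambda ctx: [x for x in attrs(ctx) if x not in attrs(ctx, "code")])

# Proposed rule: '@*' must match and '@code' must not.
all_attrs = selected(attrs)
code_attrs = selected(lambda ctx: attrs(ctx, "code"))
proposed = [x for x in all_attrs if x not in code_attrs]

print(sorted(name for _, name in current))  # ['id', 'lang']
print(current == proposed)                  # True
```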
The third option seems the most satisfactory. And that suggests a route forward for the spec: in XSLT 4.0, if and when we manage to get it defined, deprecate the except and intersect operators at the top level of a pattern, and replace them with new operators that have the expected intuitive semantics.