Java Generics revisited

In Saxon 9.9 we took considerable pains to adopt Java Generics for processing sequences: in particular the Sequence and SequenceIterator classes, and all their subclasses, became Sequence<? extends Item> and SequenceIterator<? extends Item>.

I'm now coming to the conclusion that this was a mistake; or at any rate, that we went too far.

What exactly are the benefits of using Generics? They're supposed to improve type safety and reduce the need for casts which, if applied incorrectly, can trigger run-time exceptions. So it's all about detecting more of your errors at compile time.

Well, I don't think we've been seeing those benefits. And the main reason for that is that in most cases, when we're processing sequences, we don't have any static knowledge of the kind of items we are dealing with.

Sure, when we process a particular XPath path expression, we know whether it's going to deliver nodes or atomic values. But when we write the Java code in Saxon to handle path expressions, all we know is that the result will always be a sequence of items.

There are some cases where particular kinds of expression only handle nodes, or only handle atomic values. For example, the input sequences for a union operator will always be sequences of nodes. It would be nice if we didn't have to handle a completely general sequence and cast every item to class NodeInfo. But it's an illusion to think we can get extra type safety that way. The operands of a union are arbitrary expressions, and the iterators returned by the subexpressions are going to be arbitrary iterators; there's no way we can translate the type-safety we are implementing at the XPath level into type-safe evaluators at the Java level.
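
To make that concrete, here is a minimal sketch of a union evaluator. The types NodeStub, AtomicStub and Expression, and the helper constant(), are hypothetical stand-ins invented for illustration; they are not Saxon's real classes, and the real union operator is considerably more involved. The sketch assumes each node carries a unique document-order number.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;

public class UnionSketch {
    // Hypothetical stand-ins for the item hierarchy; not Saxon's real classes.
    public interface Item {}

    public static final class NodeStub implements Item {
        final int docOrder; // position in document order, assumed unique per node
        public NodeStub(int docOrder) { this.docOrder = docOrder; }
    }

    public static final class AtomicStub implements Item {
        final String value;
        public AtomicStub(String value) { this.value = value; }
    }

    // Every expression is evaluated through the same general-purpose interface.
    public interface Expression {
        Iterator<? extends Item> iterate();
    }

    // Hypothetical helper: an expression that replays a fixed list of items.
    public static Expression constant(Item... items) {
        return () -> Arrays.asList(items).iterator();
    }

    // Merges both operands, drops duplicates, and returns the document-order
    // numbers of the resulting nodes in ascending order.
    public static List<Integer> union(Expression left, Expression right) {
        TreeMap<Integer, NodeStub> merged = new TreeMap<>();
        for (Expression operand : Arrays.asList(left, right)) {
            Iterator<? extends Item> iter = operand.iterate();
            while (iter.hasNext()) {
                // The operand is an arbitrary expression, so this cast is a
                // run-time check; no type parameter on the iterator removes it.
                NodeStub node = (NodeStub) iter.next();
                merged.put(node.docOrder, node);
            }
        }
        return new ArrayList<>(merged.keySet());
    }
}
```

If an operand delivers an AtomicStub, the cast fails with a ClassCastException at run time. Declaring the operand as an iterator over nodes would only move that check into an unchecked conversion somewhere else.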

It's particularly obvious that generics give us no type-safety at the API level. In s9api, XPathSelector.evaluate() returns an XdmValue. That's a lot better than the JAXP equivalent which just returns Object, but the programmer still has to do casting to convert the items in the returned XdmValue to nodes, strings, integers, or whatever. And there's no way we can change that; the XPath expression is supplied as a string at run-time, so it's only at run-time that we know what type of items it returns. If that's true at the API level, it's equally true internally. Any kind of expression can invoke any other kind of expression (that's what orthogonality in language design is about), which means that the interfaces between an expression and its subexpressions are always going to be general-purpose sequences whose item type is known only at execution time.

There are a couple of aspects of Java generics that cause us real pain.

The first is the XDM rule that every item is itself a Sequence. So if Sequence is a generic type, parameterized by Item type, and Item is a subclass of Sequence, then Item has to be itself a generic type parameterized by its own type. Rather than Item, it has to be Item<? extends Item>; or perhaps it should be Item<? extends Item<? extends Item>>, and so on ad infinitum. And then StringValue extends Item<StringValue> and so on. We found ways around that conundrum, but the complexity is horrendous; it certainly doesn't achieve the goal of making it easier to write correct code.
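
One way to stop the regress is a self-referential type bound, sketched below. This is only an illustration of the shape of the problem, assuming a hypothetical head() method and simplified Sequence, Item and StringValue types; it is not a description of how Saxon 9.9 actually declares them.

```java
public class SelfTypedItemSketch {
    // Hypothetical, simplified types; Saxon's real declarations differ.

    // A sequence parameterized by its item type.
    public interface Sequence<T extends Item<T>> {
        T head(); // first item of the sequence (hypothetical method)
    }

    // Every item is itself a singleton sequence, so Item must extend Sequence
    // and therefore has to name its own type as the parameter.
    public interface Item<T extends Item<T>> extends Sequence<T> {}

    // Each concrete item class repeats its own name as the type argument.
    public static final class StringValue implements Item<StringValue> {
        final String value;
        public StringValue(String value) { this.value = value; }
        @Override public StringValue head() { return this; }
    }

    // General-purpose code knows nothing about the item type, so it falls
    // back on wildcards and gains no static information from the parameter.
    public static String describe(Sequence<?> seq) {
        Item<?> first = seq.head();
        return first.getClass().getSimpleName();
    }
}
```

The bound compiles, but every declaration now carries a type parameter that most callers can only fill in with a wildcard.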

The second is arrays. Arrays don't play at all well with generics; you can't create an array of a generic type, for example. And yet there are lots of places where it's useful to use arrays, and some where arrays are the only option. VarArgs functions, for example, present their arguments as an array. In some cases we wanted to carry on using arrays (rather than lists) for compatibility, in other cases we wanted to use them for convenience or for performance. The natural signature for a function call, for example, is public Sequence call(Context context, Sequence[] args). There's no way we can refine this in a way that passes static information about the argument types from the caller to the callee, because we're using the same Java signature for all XPath functions.
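
The mismatch between arrays and generics can be shown in a few lines. The sketch below uses a hypothetical Sequence<T> interface with a single items() method, invented purely for the example; makeArgs() and pollutionDetectedLate() are likewise illustrative names.

```java
import java.util.Collections;
import java.util.List;

public class GenericArraySketch {
    // Hypothetical generic sequence type, for illustration only.
    public interface Sequence<T> {
        List<T> items();
    }

    // Writing "new Sequence<String>[n]" is rejected by the compiler as generic
    // array creation. The usual workaround is a raw array plus an unchecked cast.
    @SuppressWarnings("unchecked")
    public static Sequence<String>[] makeArgs(int n) {
        return (Sequence<String>[]) new Sequence[n];
    }

    // Shows why the compiler objects: the array only knows it holds Sequence,
    // so a Sequence<Integer> can be stored in it without any complaint.
    public static boolean pollutionDetectedLate() {
        Sequence<String>[] args = makeArgs(1);
        Object[] alias = args;                        // arrays are covariant
        Sequence<Integer> ints = () -> Collections.singletonList(42);
        alias[0] = ints;                              // no ArrayStoreException
        try {
            String first = args[0].items().get(0);    // hidden cast to String fails here
            return false;                             // not reached
        } catch (ClassCastException e) {
            return true;                              // error surfaces far from its cause
        }
    }
}
```

The wrong element is accepted when it is stored and only rejected when it is read back as a String, which is exactly the kind of late failure generics were supposed to rule out.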

But having got Generics working, at great effort, in 9.9, should we retain them or drop them?

One reason I'm motivated to drop them is .NET. We have a significant user base on .NET, but we have something of a potential crisis looming in terms of ongoing support for this platform. Microsoft appear to be basing their future strategy around .NET Core, allowing .NET Framework to fade away into the sunset. But the technology we use for bridging to .NET, namely IKVM, only supports .NET Framework and not .NET Core; and Jeroen Frijters, who single-handedly developed IKVM and supported it for umpteen years (with no revenue stream to support it), has thrown in the towel and is no longer taking it forward. So we're looking at a number of options for a way forward on .NET. One of these is source code conversion; and to make source code conversion viable without forking the code, we need to minimise our dependencies on Java features that don't translate easily to C#. Notable among those features is generics.

In the short term, I think I'm going to roll back the use of generics in selected areas where they are clearly more trouble than they are worth. That's particularly true of Sequence and its subclasses, including Item. For SequenceIterator it's probably worth keeping generics for the time being, but we'll keep that under review.