For the first time ever, I hit a deadlock in Saxon. It was with the current development version, fortunately, rather than in the field, though it's had me thinking about whether the code in 9.2 is entirely safe in this area.
In Saxon 9.2 there's only one place where Saxon uses multiple threads: namely in the "streaming copy" facility where one thread is splitting the incoming document into small chunks and the other thread is processing those chunks one by one. It's here that the deadlock occurred, though it's a bigger risk in future because Saxon 9.3 will allow evaluation of a for-each loop in parallel if users request it.
The deadlock occurred with evaluation of global variables. These are evaluated lazily, and a change I made when introducing multi-threaded for-each loops is that when Saxon decides it has to evaluate a global variable, it synchronizes on the Bindery which is where all the global variables are held. This avoids two threads evaluating the same global variable at the same time, which would not actually be fatal in most cases, though it could be expensive when evaluating a global variable is 90% of the entire transformation, as sometimes happens. The reason it synchronizes is that there's a flag for each global variable indicating whether it is currently being evaluated; this flag is used for detection of circularities where evaluating A causes evaluation of B which in turn causes evaluation of A. Such circularities cannot (always) be statically detected in XSLT because the dependency might be via an xsl:apply-templates call. Without the synchronisation (my reasoning probably was), a circularity could be wrongly reported because a two concurrent evaluations of the same variable could now happen not because of a circularity, but because of multithreaded evaluation.
But the problem is that this lock on the Bindery can be held for quite a long time, and in particular, it can be held by a thread that is waiting for input from another thread, thus leading to the deadlock that I found in testing.
So, what's the answer? Firstly, we can avoid using the "is-being-evaluated" flag in the Bindery for detection of circularities, and replace it with a stack of global variables currently-being-evaluated held in the XPathContext object; the circularity error then triggers if you try to evaluate a variable that is already on the stack. This eliminates the possibility of interference with/by the evaluation of global variables going on in other threads.
How do we then prevent two threads evaluating the same global variable? This is something we still want to avoid, for various reasons: as already discussed, evaluating some globals might take a seriously long time; some globals (such as select="doc('a.xml')") are complicated because they create entries in the shared document pool; and some invoke extension functions that might have side-effects. First thought is to have a Lock object per global variable, and a thread that decides to evaluate the global variable synchronizes on this Lock object. However this is not deadlock free in the event of circular dependencies. Consider the situation where A depends on B and B depends on A: one thread might attempt to evaluate A and grab A's lock, while the other starts with B and grabs B's lock; the two threads would deadlock against each other before either detected the circularity. If Java synchronization allowed one to detect and recover from deadlocks this would be an elegant way of detecting the circularity; but sadly, it doesn't.
Of course, there's a lot of support in Java for more sophisticated concurrency control than simple synchronization: various implementations of Lock, Semaphore, and the like. However, none of them seem to do quite what's needed. My current thought is that as well as each thread holding a stack of global-variables-under-evaluation, as discussed above, there should be a shared object holding the dependency graph among all global variables (accessed, of course, under synchronisation); before attempting to evaluate a global variable, the thread should not just check whether that variable is already on its stack, but rather whether evaluating it would create a cycle in the global dependency graph. Of course, that's equivalent to doing the deadlock detection "by hand".
I haven't made final decisions yet, so all ideas are welcome.
Is there a problem in Saxon 9.2? There won't be any deadlocks, because evaluation of variables isn't synchronized. However, I think there is a possible risk of false detection of circularities. The consequences of this aren't life-threatening, so I think we can probably wait and see whether it happens.
POSTSCRIPT: I've found the "stack of global variables currently being evaluated" already exists: when the context changes, a new context object is created, chained to the previous one, with a link to the instruction being evaluated when the context object is created; searching down this chain provides the information required to detect circular dependencies among variables.