Continuing the investigation of Tatu Saloranta's XSLTMark measurements, I was puzzled by the fact that on certiain tests, he was seeing a very different ratio between Saxon and XSLTC performance from the figures I was seeing. I got him to run my test driver just to confirm the numbers were real, and they were. So there had to be some difference caused by the hardware, the JVM, the operating system, or some other external factor.
I thought one theory worth testing was that XSLTC might be making better use of multiple processors than Saxon (which is essentially single-threaded). So I rigged up a test to measure the performance of one of the transformations (identity.xsl transforming db10000.xml) in a number of threads in parallel. Here are the initial measurements: A is the number of threads, B is the "effective elapsed time" for Saxon in milliseconds (actually the time taken to run the transformation 100 times, divided by 100), C is the same for XSLTC, and we'll come back to D later:
A B C D
1 502 382 492
2 280 200 280
5 466 175 247
10 485 164 242
20 501 162 241
So we see from columns B and C that both Saxon and XSLTC throughput increases by about 1.8 to 1.9 times when we move from a single thread to two threads, reflecting the ability to use both cores of the dual-core processor. But as the number of threads increases, XSLTC performance gets better while Saxon gets worse.
The obvious culprit is the shared Namepool. I haven't found a good way of measuring contention on the NamePool in Java, and I spent a few hours trying (again) to see if I could find any decent tools, and in the end decided to debug it "by hand". Sure enough, the xsl:copy instruction in the stylesheet is generating a call on the synchronized NamePool method allocateNamespaceCode. Once discovered, it wasn't difficult to elimate this call, and the improved results after doing so are shown in column D.
This still leaves two questions unanswered: why is XSLTC performing 35% faster than Saxon on this test, and why is Tatu seeing a different ratio on his Linux machine? I suspect the answer to the first question is that an identity transform is dominated by the performance of the XML parser, and XSLTC has some slick integration with Xerces in this area. (One of the reasons for working with Tatu is to explore whether there are opportunities to do something similar with his new Aalto parser.) The second one is unanswered for now - although one might speculate that most products will tend to work best on the platform where they are developed, because of unintentional bias by the developers.
Meanwhile we've fixed a nasty little performance glitch, and we've learnt something about the impact that contention on the NamePool can have. I think there are probably a lot more opportunities to reduce contention here while still gaining the benefits that the design brings.