<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet href="/xslt/atom.xsl" type="text/xsl"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xml:lang="EN-us"><title>O’Neil Delpratt’s Blog</title><subtitle>Saxon, XSLT, XQuery and XML related</subtitle><link href="https://blog.saxonica.com/oneil/" rel="alternate" type="text/html"/><link href="https://blog.saxonica.com/atom/oneil.xml" rel="self"/><id>https://blog.saxonica.com/oneil/atom.xml</id><updated>2026-02-04T17:02:41.240361847Z</updated><author><name>O’Neil Delpratt</name></author><entry><title>Saxonica takes SaxonC to the GraalVM community summit</title><link href="https://blog.saxonica.com/oneil/2025/11/14-graalvm-community-summit.html" rel="alternate" type="text/html"/><id>https://blog.saxonica.com/oneil/2025/11/14-graalvm-community-summit.html</id><published>2025-11-14T17:30:00Z</published><content type="xhtml" xml:base="https://blog.saxonica.com/oneil/2025/11/14-graalvm-community-summit.html"><div xmlns="http://www.w3.org/1999/xhtml">

<br/>
<img src="graalvm_summit25.jpg" type="image/jpg"/>

<p>Saxonica is a small company, but we’re doing innovative work all the same. In October,
            Matt Patterson and I represented Saxonica at the <a href="https://www.graalvm.org/community/summit/">GraalVM Community Summit</a> at Oracle’s
            offices in Zurich, because of the work we’re doing using <a href="https://www.graalvm.org/">GraalVM’s Native Image</a>
            technology in <a href="https://www.saxonica.com/saxon-c/index.xml">SaxonC</a>.</p>
        <p>It was an excellent chance to put faces to names, show what we’re doing, and put
            detailed questions to the engineers behind the tools. The GraalVM team were welcoming
            and more than willing to dive in.</p>
        <p>There were all sorts of people with an interest in contributing to and using GraalVM,
            from the biggest players in the industry to small companies like ours, and it was good
            to see the sheer enthusiasm and investment being put into GraalVM and into their own
            projects.</p>
        <p>This year’s summit came at an especially interesting time, following the <a href="https://blogs.oracle.com/java/detaching-graalvm-from-the-java-ecosystem-train">announcement</a>
            that GraalVM will be decoupled from the Oracle Java release schedule. That change has
            sparked fresh discussions about the future direction of the JDK. It was reassuring to
            hear from the core team that they remain deeply committed, not only to GraalVM’s
            continued development but also to its non-Java languages, including GraalPy and GraalJS,
            which are rapidly maturing.</p>
        <p>During the unconference sessions, we had the opportunity to share how Saxonica is using
            GraalVM to build SaxonC as a shared library. We discussed our experiences with the
            performance cost of crossing the boundary between C/C++ and Java, and the limits we
            apply to improve it. We also described our efforts to enable calls to native C, C++,
            PHP and Python functions from Java as the host language. This is a long-requested
            feature, which we aim to deliver as user-defined extension functions for XSLT, XQuery
            and XPath. We are in the home stretch now for releasing this feature in SaxonC 13.</p>
        <p>One of the most valuable parts of the summit was being in the same room as the GraalVM
            developers themselves. We compared development approaches for the C Native API vs. JNI,
            and learned that JNI has evolved considerably in recent years. Even so, we’ve decided to
            stick with the C Native API for now, both to keep things low level and to make use of
            function pointer indirection, as it best fits our architecture and goals.</p>
        <p>The unconference sessions also featured a range of fascinating talks, from BellSoft’s
            work on parallel processing in Liberica, to developments in GraalJS/Wasm and Spring. The
            Crema project, which tackles dynamic class loading in native-image, was impressive and
            may benefit SaxonC in the future. And the Shenandoah garbage collector from Amazon
            promises to deliver exciting improvements for high-performance Java applications.</p>
        <p>For a “small” company like Saxonica, being part of these conversations and contributing
            to the future of the JVM and polyglot ecosystem feels immensely rewarding. We left
            Zurich energised, inspired, and ready to keep pushing the boundaries of what’s possible
            with GraalVM.</p>
</div></content></entry><entry><title>Saxon/C - Saxon for the C/C++ and PHP platforms</title><link href="https://blog.saxonica.com/oneil/2013/12/saxonc---saxon-for-the-cc-and-php-platforms.html" rel="alternate" type="text/html"/><id>https://blog.saxonica.com/oneil/2013/12/saxonc---saxon-for-the-cc-and-php-platforms.html</id><published>2013-12-02T18:11:25Z</published><content type="xhtml" xml:base="https://blog.saxonica.com/oneil/2013/12/saxonc---saxon-for-the-cc-and-php-platforms.html"><div xmlns="http://www.w3.org/1999/xhtml">
      
      <p>At the XML Summer School 2013, Tony Graham presented a lightning talk about life after
         libxslt 1.0. I was not present at the summer school, but it was clear from the
         feedback I received on the discussions that there is a major gap in XSLT 2.0 support
         for the large community of C/Perl/PHP/Python/Ruby developers and the associated tools
         that rely on libxslt.<br/>
         It is a known problem which, to my knowledge, has never been addressed. At Saxonica,
         we wanted to try to plug this gap by porting the Saxon processor from Java to C/C++,
         which would enable us to communicate with the languages listed above. One of our
         goals, if possible, was to interface with libxml and libxslt. Providing such a bridge,
         or a cross-compiled version of a full-fledged Java application, to C/C++ is always a
         daunting task. In this blog post I discuss the technical steps in our quest to achieve
         our goals and give some details of the experience gained along the way. I will begin by
         detailing the various technologies that we tried, and how we have ended up using a
         commercial Java native compiler after several failed attempts with tools that either
         did not work, were cumbersome, or were just too error prone.
         </p>
      <p>
         <b>LLVM</b>
         </p>
      <p>
         At the summer school there were discussions that the tool <a href="http://llvm.org/">LLVM</a> could do the job of compiling Java to native code. As claimed on the project website,
         LLVM is a collection of modular and reusable compiler and toolchain technologies.
         The LLVM project seems very active, with many projects using it to do various tasks,
         but I found it difficult to get anything working. In particular, I tried using
         VMKit, which relies on LLVM, to compile some simple 'Hello World' examples to machine
         code, but even that seemed cumbersome.
         </p>
      <p>
         <b>GCJ</b>
         </p>
      <p>
         Secondly, I looked at the <a href="http://gcc.gnu.org/java/">GCJ</a> technology. GCJ is a tool that I have used before, so I was confident that it would
         work. However, my past experience is that it can be error prone and contains
         long-standing bugs. The project has been dormant for several years, so it seems
         unlikely that these bugs will be fixed. The other worrying fact
         is that GCJ only supports up to JDK 1.5. Nevertheless, for lack of other options, I persevered
         with GCJ and had much better success: I managed to compile Saxon-HE to
         native machine code and actually got it to execute my example stylesheets. I had some
         problems because of classes that were not present in the GCJ implementation of JDK 1.5,
         such as those in the packages java.math and javax.xml. Therefore, I had to include my own
         version of these packages.
         </p>
      <p>
         The next step was to create a shared library of Saxon-HE, so that I could interface
         it with C/C++. This proved to be a real battle, but in the end I succeeded. I decided
         to use Compiled Native Interface (CNI), which presents a convenient way to write Java
         native methods using C++. The alternative was JNI (Java Native Interface), which may
         be viewed as more portable. Both interfaces have similar principles, though: you need
         a Java/CNI-aware C++ compiler (any recent version of G++ is capable), and you
         must include the header file for each Java class used. These header files, if not
         automatically generated, can be created using gcjh. I soon gave up on using GCJ: I stumbled
         upon a few known bugs, and if I was having major issues with the setup and
         prerequisites required, then surely users would have the same problems.
         </p>
      <p>
         <b>Excelsior JET</b>
         </p>
      <p>
         The <a href="http://www.excelsior-usa.com/">Excelsior JET tool</a> is the final technology we looked at and thankfully it is what we have ended up using
         in the alpha release. JET is a commercial product that provides a Java native compiler
         for both Linux and Windows platforms. What is good about this software tool is that
         it provides an easy-to-use graphical interface to build native executables and shared
         libraries from jar file(s). It also has the feature to package up the software into
         an installer ready to be deployed onto its intended host machine. This was great for
         us! 
         </p>
      <p>
         There is a lot I could write about JET, but it would be a repeat of the plethora of
         information currently available on their website and forum. It is worth mentioning,
         however, that we started with their evaluation version, which offers 90 days of free
         usage of the software before purchasing the professional edition. Another point of
         interest is that Excelsior offer a free-of-charge license for use in conjunction with
         open-source software.
         </p>
      <p>
         We know that there will be some sections of the open-source community that dislike
         the dependency upon using a commercial tool, but it is not that dissimilar from the
         early years of Java when the Sun compiler was freely available but not open-sourced.
          
         </p>
      <p>
         <b>Implementation notes using JET</b>
         </p>
      <p>
         After creating the shared library, to interface it with C/C++ I used JNI. It is possible
         to use JET's own Java interface to external functions called xFunction, which is recommended
         if starting from scratch, but having used JNI with GCJ I continued with that approach.
         To get started there are a few examples of invoking a library with C/C++. In essence,
         you need to load the library and initialize the JET run-time before you can use it;
         see the code below (from the file xsltProcessor.cc):
         </p>
      <pre><code>/* Load dll. */
HANDLE loadDll(char* name)
{
  HANDLE hDll = LoadLibrary (name);

  if (!hDll) {
    printf ("Unable to load %s\n", name);
    exit(1);
  }

  printf ("%s loaded\n", name);
  return hDll;
}

extern "C" {jint (JNICALL * JNI_GetDefaultJavaVMInitArgs_func) (void *args);
            jint (JNICALL * JNI_CreateJavaVM_func) (JavaVM **pvm, void **penv, void *args);
}

/*Initialize JET run-time.*/
extern "C" void initJavaRT(HANDLE myDllHandle, JavaVM** pjvm, JNIEnv** penv)
{
  int result;
  JavaVMInitArgs args;

  JNI_GetDefaultJavaVMInitArgs_func =
  (jint (JNICALL *) (void *args))
  GetProcAddress (myDllHandle, "JNI_GetDefaultJavaVMInitArgs");
  JNI_CreateJavaVM_func =
  (jint (JNICALL *) (JavaVM **pvm, void **penv, void *args))
  GetProcAddress (myDllHandle, "JNI_CreateJavaVM");

  if(!JNI_GetDefaultJavaVMInitArgs_func) {
    printf ("%s doesn't contain public JNI_GetDefaultJavaVMInitArgs\n", dllname);
    exit (1);
  }

  if(!JNI_CreateJavaVM_func) {
    printf ("%s doesn't contain public JNI_CreateJavaVM\n", dllname);
    exit (1);
  }

  memset (&amp;args, 0, sizeof(args));
  args.version = JNI_VERSION_1_2;
  result = JNI_GetDefaultJavaVMInitArgs_func(&amp;args);
  if (result != JNI_OK) {
    printf ("JNI_GetDefaultJavaVMInitArgs() failed with result %d\n", result);
    exit(1);
  }

  /* NOTE: no JVM is actually created
  * this call to JNI_CreateJavaVM is intended for JET RT initialization
  */
  result = JNI_CreateJavaVM_func (pjvm, (void **)penv, &amp;args);
  if (result != JNI_OK) {
    printf ("JNI_CreateJavaVM() failed with result %d\n", result);
    exit(1);
  }
  printf ("JET RT initialized\n");
  fflush (stdout);
}

XsltProcessor::XsltProcessor(bool license) {
  /*
  * First of all, load required component.
  * By the time of JET initialization, all components should be loaded.
  */
  myDllHandle = loadDll (dllname);

  /*
  * Initialize JET run-time.
  * The handle of loaded component is used to retrieve Invocation API.
  */
  initJavaRT (myDllHandle, &amp;jvm, &amp;env);

  /* Look for class.*/
  cppClass = lookForClass(env, "net/sf/saxon/option/cpp/XsltProcessorForCpp");
  versionClass = lookForClass(env, "net/sf/saxon/Version");

  cpp = createObject (env, cppClass, "(Z)V", license);
  jmethodID debugMID = env-&gt;GetStaticMethodID(cppClass, "setDebugMode", "(Z)V");
  if(debugMID){
    env-&gt;CallStaticVoidMethod(cppClass, debugMID, (jboolean)false);
  }
  ....
}
...</code></pre>
      <p>In the constructor of XsltProcessor we see that, once we have loaded the library
         and initialized the JET run-time, we can make calls on the JNI environment that has
         been created, to look up class definitions and create instances of those classes in
         the Java world. Only after that do we make method calls on the objects.
         </p>
      <p>
         <b>PHP Extension in C/C++</b>
         </p>
      <p>
         After successfully getting XSLT transformations to work within C/C++, the next step
         was to try and develop a PHP extension, which would operate like libxslt. There is
         a lot of material on the web and in books about PHP extensions, and I found the
         following guide very useful: <a href="http://devzone.zend.com/1435/wrapping-c-classes-in-a-php-extension/">http://devzone.zend.com/1435/wrapping-c-classes-in-a-php-extension/</a>. I literally followed it step-by-step, adding a few steps of my own when I worked
         out what I was doing.
         </p>
      <p>
         <b>Testing</b>
         </p>
      <p>
         As a proof of concept I wrote a test harness in PHP which makes use of the PHP extension
         (see: xslt30TestSuite.php in the download library). This is a test driver designed
         to run the public W3C XSLT test suite at <a href="https://dvcs.w3.org/hg/xslt30-test/">https://dvcs.w3.org/hg/xslt30-test/</a>. The test driver in its current form requires Saxon-EE, which is not yet available
         in this alpha release; nevertheless, the program may serve as a useful example of
         how the API can be used. Note that it is written to use libXML to read the test catalog,
         but to use Saxon for running the tests and assessing the results.
         </p>
      <p>
         <b>Performance Testing</b>
         </p>
      <p>
         I now draw comparisons between running Saxon-HE (on Java) vs running Saxon-HE/C on
         C++ and on PHP on some preliminary tests. I also compare these times to libxslt (C/C++).
         An important aim is to get a good measure of the costs of crossing the Java/C++ boundary
         using JNI and also to see what the effect is with the PHP extension. 
         </p>
      <p>
         I used Saxon-HE 9.5.1.3 as the baseline. The test machine was a laptop with an Intel Core i5
         430M processor, 4GB memory, a 2.26GHz CPU and 3MB L3 cache, running Ubuntu 13.10 Linux.
         The servers were Apache2 and PHP version 5.5.3-1ubuntu2. The compiler was Sun/Oracle Java 1.6.0.43.
         </p>
      <p>
         The experiments were based on the XMark benchmark. I used query q8, which was converted
         into the stylesheet below. I chose q8.xsl because the equijoin in the query should
         expose performance bottlenecks across the implementations:
         </p>
      <pre>&lt;result xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xsl:version="2.0"&gt;
&lt;!-- Q8.  List the names of persons and the number of items they bought.
          (joins person, closed_auction) --&gt;

  &lt;xsl:for-each select="/site/people/person"&gt;
    &lt;xsl:variable name="a"
       select="/site/closed_auctions/closed_auction[buyer/@person = current()/@id]"/&gt;
    &lt;item person="{name}"&gt;&lt;xsl:value-of select="count($a)"/&gt;&lt;/item&gt;
  &lt;/xsl:for-each&gt;

&lt;/result&gt;</pre>
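      <p>To see why the join in q8 is costly, the following Python sketch mirrors the query
         on a tiny made-up data set. It is an illustration only, not a description of how Saxon
         or libxslt evaluate the stylesheet. A naive evaluation rescans every closed auction for
         each person, whereas building an index on the buyer first needs only one pass over each
         list. The sample data and the function names are assumptions made for this example.</p>
      <pre><code>from collections import Counter

# Made-up sample data standing in for /site/people/person
# and /site/closed_auctions/closed_auction/buyer/@person.
persons = [("person0", "Ann"), ("person1", "Bob"), ("person2", "Cy")]
buyers = ["person1", "person0", "person1", "person1"]


def count_naive(persons, buyers):
    """Nested loop: scan every auction once per person (quadratic)."""
    return [(name, sum(1 for b in buyers if b == pid))
            for pid, name in persons]


def count_indexed(persons, buyers):
    """Hash join: count buyers once, then look up each person."""
    bought = Counter(buyers)
    return [(name, bought[pid]) for pid, name in persons]


print(count_naive(persons, buyers))
print(count_indexed(persons, buyers))</code></pre>
      <p>Both functions return the same counts; only the amount of work differs, and that
         difference grows quickly with the size of the document.</p>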
      <p>
         The running times for executing q8.xsl on the document xmark64.xml, a 64MB
         document, are as follows:
         </p>
      <p>
         Saxon-HE (Java):    60.5 seconds
         </p>
      <p>
         Saxon-HE/C (C++): 132 seconds 
         </p>
      <p>
         Saxon-HE/C (PHP): 137 seconds
         </p>
      <p>
         libxslt (C/C++):        213 seconds
         </p>
      <ul>
         <li>The times reported for Saxon-HE/C have been updated as a result of optimizations in the
            JET compiler.</li>
         <li>Code used to get libxslt time taken from: <a href="http://xmlsoft.org/XSLT/tutorial/libxslttutorial.html">http://xmlsoft.org/XSLT/tutorial/libxslttutorial.html</a></li>
      </ul>
      <p>The times for Saxon-HE/C exclude the cost of JET initialisation and loading the
         library, which accounted for only 4 seconds. We observe that there is not a big
         overhead between C++ and the PHP
         extension. The biggest cost, as expected, is between Java and C++, where we see on the
         C++/PHP platform a slowdown of about 2.2 times. We also observe that Saxon-HE/C outperforms
         libxslt on C/C++ by ~40% on q8.
         </p>
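      <p>The ratios quoted above can be reproduced from the measured times with a few lines
         of Python. This is a throwaway calculation, and the variable names are chosen just for
         this example:</p>
      <pre><code># Measured times in seconds for q8.xsl on xmark64.xml, as reported above.
times = {
    "Saxon-HE (Java)": 60.5,
    "Saxon-HE/C (C++)": 132.0,
    "Saxon-HE/C (PHP)": 137.0,
    "libxslt (C/C++)": 213.0,
}

java = times["Saxon-HE (Java)"]
cpp = times["Saxon-HE/C (C++)"]
php = times["Saxon-HE/C (PHP)"]
libxslt = times["libxslt (C/C++)"]

# Slowdown of the native builds relative to Saxon-HE on Java.
slowdown_cpp = cpp / java
slowdown_php = php / java

# Fraction of the libxslt running time saved by Saxon-HE/C on C++.
saving = (libxslt - cpp) / libxslt

print(f"C++ slowdown: x{slowdown_cpp:.2f}")
print(f"PHP slowdown: x{slowdown_php:.2f}")
print(f"Time saved vs libxslt: {saving:.0%}")</code></pre>
      <p>This gives slowdowns of 2.18 and 2.26 and a saving of 38% against libxslt, which is
         consistent with the rounded figures given above.</p>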
      <p>
         See the project page for <a href="http://www.saxonica.com/saxon-c/index.xml">Saxon/C</a>.
         </p>
   </div></content></entry><entry><title>Experiences with XSLTForms and Servlex</title><link href="https://blog.saxonica.com/oneil/2013/03/experiences-with-client-side-xsltforms-and-server-side-servlex.html" rel="alternate" type="text/html"/><id>https://blog.saxonica.com/oneil/2013/03/experiences-with-client-side-xsltforms-and-server-side-servlex.html</id><published>2013-03-08T15:04:28Z</published><content type="xhtml" xml:base="https://blog.saxonica.com/oneil/2013/03/experiences-with-client-side-xsltforms-and-server-side-servlex.html"><div xmlns="http://www.w3.org/1999/xhtml">
      
      <p>At Saxonica, we have for a long time now used a tailor-made Java application to create
         and issue licenses for all the commercial products we develop. There is no real database
         at the back-end, just a local XML file with customer details and copies of the
         licenses created and issued. For a one-man company this poses no real problem, but
         inevitably, as the company has expanded over the last two years, it has become a major
         concern.</p>
      <p>Early last year Mike Kay gave me the task of creating a new saxon-license
         application with the following requirements:</p>
      <ul>
         <li>Accessible to all employees, preferably web-based</li>
         <li>Core Java tool should remain intact </li>
         <li>Centralised store, must be XML-based</li>
         <li>Secure application</li>
      </ul>
      <p>From the outset we thought that, for a tool which relies so heavily on XML and XSLT
         at its core, the requirements would be best met by using XSLTForms and Servlex to develop
         it.</p>
      <p>In this blog post I would like to share my own experiences in the development of the
         saxon-license webapp using Servlex and XSLTForms. In the discussion I include how
         we stitched in our existing back-end core Java tool and the challenges we faced with
         character encoding. Specific details of the features and functions are not that
         important here; only the engineering process is of interest.</p>
      <p>On the client-side we write XForms [1] documents, which are manipulated by XSLTForms
         [2] (created by Alain Couthures)  to render in the browsers. XSLTForms is an open
         source client-side implementation, not a plug-in or install, that works with all major
         browsers.</p>
      <p>On the server-side we integrate the core Saxon-license tool in a Servlex webapp [3]
         as Saxon extension functions called from within XSLT. Servlex is an open-source
         implementation of the EXPath webapp framework [4] based on Saxon and Calabash as its
         XSLT, XQuery and XProc processors.  Servlex provides a way to write web applications
         directly in XSLT. It is developed as a Java EE application requiring Servlet technology,
         and sits on Tomcat, which provides the server-side HTTP binding.</p>
      <h3>Saxon License-tool functionality</h3>
      <p>On the server-side, Servlex works as a dispatching and routing
         mechanism to components (implemented as XSLT stylesheets), applying
         a configuration-based mapping between the request URI and the
         component used to process that URI. The container communicates with
         the components by means of an XML representation of the HTTP request,
         and receives in turn XML data with HTML in the response body, carrying
         XForms content and XSLTForms references to render the page.  The
         representation of the HTTP response is sent back to the client. There
         are buttons on the forms which, if pressed, trigger an HTTP PUT
         request made through the client-side XSLTForms. These requests are
         handled by Servlex.</p>
      <p>There are 5 main XSLT functions described below, which are mapped to the
         URIs, generate the various XForms, and tunnel the instance data
         between the XForms. These functions all make calls to the core
         Saxon-license tool written in Java, made available as Saxon
         extension function calls from the XSLT:</p>
      <ol>
         <li>
            
            <p>fnRunMainForm: A request to serve the main form is made with the following URI pattern:
               </p>
            
            <pre>http://192.168.0.2:8080/app/license-tool/main</pre>
            
            
            <p>License
               requests are usually made through the main Saxonica website, either for
               evaluation or as a paid order (see
               <a href="http://www.saxonica.com/download/download.xml">http://www.saxonica.com/download/download.xml</a> and
               <a href="http://www.saxonica.com/purchase/purchase.xml">http://www.saxonica.com/purchase/purchase.xml</a>, respectively). These
               orders are received as an email, which is then copied and pasted into the
               main form. This data is sent in the form of an XForms
               instance in a web request, picked up by Servlex.</p>
            </li>
         <li>
            
            <p>fnManualEntry: Manual Entry form for manual creation of the customer
               details to create a license. A request is made to Servlex with the
               following URI pattern:</p>
            
            <pre>http://192.168.0.2:8080/app/license-tool/manualEntry</pre>
            </li>
         <li>
            
            <p>fnFetchRecord: Retrieves an existing license so that it can be re-issued. A
               request is made to Servlex with the following URI pattern. The
               parameter after the ? is the license number to fetch:</p>
            
            
            <pre>http://192.168.0.2:8080/app/license-tool/fetchRecord?Select=X002110</pre>
            </li>
         <li>
            
            <p>fnReport: This function generates an HTML page listing all licenses created, or a
               subset such as the last 20.</p>
            
            <pre>http://192.168.0.2:8080/app/license-tool/report</pre>
            </li>
         <li>
            
            <p>fnEditParseRequest: Serves the manual entry form
               with the client data populated. The order request from the main form is
               parsed and returned as XForms instance data, which is used to generate
               the form on the server. A request is made to Servlex with the following
               URI pattern:</p>
            
            
            <pre>http://192.168.0.2:8080/app/license-tool/editParseRequest</pre>
            </li>
      </ol>
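      <p>The mapping from URIs to the functions listed above can be pictured as a simple
         routing table. The Python sketch below is only an illustration of the idea: Servlex
         takes its mapping from the webapp configuration and dispatches to XSLT components, so
         the table and the dispatch function here are hypothetical stand-ins.</p>
      <pre><code>from urllib.parse import urlsplit, parse_qs

# Hypothetical routing table: last path segment to the function named above.
ROUTES = {
    "main": "fnRunMainForm",
    "manualEntry": "fnManualEntry",
    "fetchRecord": "fnFetchRecord",
    "report": "fnReport",
    "editParseRequest": "fnEditParseRequest",
}


def dispatch(uri):
    """Return (function name, query parameters) for a license-tool URI."""
    parts = urlsplit(uri)
    segment = parts.path.rsplit("/", 1)[-1]
    if segment not in ROUTES:
        raise LookupError("no component mapped to " + parts.path)
    return ROUTES[segment], parse_qs(parts.query)


print(dispatch("http://192.168.0.2:8080/app/license-tool/fetchRecord?Select=X002110"))</code></pre>
      <p>For the fetchRecord URI this yields the function name fnFetchRecord together with
         the Select parameter holding the license number.</p>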
      <p>Securing access to the saxon-license webapp is achieved through
         Apache2 configuration.</p>
      <h3>Encoding problem</h3>
      <p>A long-standing problem we faced in this application was the handling of non-ASCII
         characters. We raised this issue with Alain and Florent, the creators of XSLTForms
         and Servlex respectively, to get to the bottom of this problem.</p>
      <p>Basically, if the user enters data on a form, we're sending it back to the server
         in a url-encoded POST message, and it's emerging from Servlex in the form of XML presented
         as a string, and if there are non-ASCII characters then they are miscoded. In the
         form we set the submission method attribute to 'xml-urlencoded-post' to guarantee
         that the next page will fully replace the current one: XMLHttpRequest is not used
         in this case. </p>
      <p>We were seeing the typical pattern that you get when the input characters are encoded
         as a UTF-8 byte sequence and the byte sequence is then decoded to characters by someone
         who believes it to be 8859-1. We were not able to work out where the incorrect decoding
         was happening. We originally circumvented the problem by reversing the error: we converted
         the string back to bytes treating each char as a byte, and then decoded the bytes
         as UTF-8.</p>
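      <p>The pattern, and the workaround, can be reproduced in a few lines. The sketch below
         uses Python purely for illustration (it is not the code we used in the webapp), with a
         name ending in a euro sign as an assumed sample input:</p>
      <pre><code>original = "O'Neil\u20ac"  # a first name with a trailing euro sign

# What goes wrong: UTF-8 bytes are decoded as if they were ISO 8859-1.
utf8_bytes = original.encode("utf-8")
miscoded = utf8_bytes.decode("iso-8859-1")

# The workaround: treat each char as a byte again, then decode as UTF-8.
repaired = miscoded.encode("iso-8859-1").decode("utf-8")

print(repr(miscoded))
print(repaired == original)</code></pre>
      <p>The round trip only works because ISO 8859-1 maps every byte value to a character;
         with a charset that leaves some byte values undefined, the reversal could lose data.</p>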
      <p>A feature of XSLTForms is the profiler (enabled by pressing F1 or setting debug='yes'
         in the xsltforms-options processing instruction). The profiler allows the inspection
         of the instance data. Another mechanism is to inspect the requests sent by the browser
         with the network profiler of a debugger.</p>
      <p>We established that on the client side, there is an HTML Form Element that gets built,
         and just before the submit() method gets called on this object, the data appears to
         be OK. But when we look at the Tomcat log of the POST request, it's wrong. Somewhere
         between the form.submit() on the client and the logging of the message on the server,
         it's getting corrupted. We can't actually see where the encoding and decoding is happening
         between these two points. </p>
      <p>To tackle this problem Florent provided a development version of Servlex, which added
         logging of the octets as they are read from the binary stream (the logger org.expath.servlex
         must be set to trace, which should be the default in that version). It also logs
         the raw headers as they are read by Tomcat.</p>
      <p>With this new version of Servlex in place I entered the following data on the main
         form. Note that the euro symbol at the end of my first name 'O'Neil' is a non-ASCII
         character which needs to be preserved:</p>
      <pre>First Name: O'Neil€
Last Name: Delpratt
Company: Saxonica
Country: United Kingdom
Email Address: oneil@saxonica.com
Phone:
Agree to Terms: checked </pre>
      <p>After submitting this data to the URI pattern: .../app/license-tool/editParseRequest
         we see below the log data reported by Tomcat. What is interesting is the line
         'DEBUG [2013-03-04 18:06:34,281]: Request - header   : content-type / application/x-www-form-urlencoded'. 
         Also at this stage the input to the receiving form has been corrupted to 'O'Neilâ‚¬'
         which should be 'O'Neil€' :</p>
      <pre>DEBUG [2013-03-04 18:06:34,279]: Request - servlet  : parseRequest
DEBUG [2013-03-04 18:06:34,280]: Request - path     : /parseRequest
DEBUG [2013-03-04 18:06:34,280]: Request - method   : POST
DEBUG [2013-03-04 18:06:34,280]: Request - uri      : http://localhost:8080/app/license-tool/parseRequest
DEBUG [2013-03-04 18:06:34,280]: Request - authority: http://localhost:8080
DEBUG [2013-03-04 18:06:34,280]: Request - ctxt_root: /app/license-tool
DEBUG [2013-03-04 18:06:34,280]: Request - param    : postdata / &lt;Document&gt;&lt;Data&gt;First Name: O'Neilâ‚¬
Last Name: Delpratt
Company: Saxonica
Country: United Kingdom
Email Address: oneil@saxonica.com
Phone:
Agree to Terms: checked &lt;/Data&gt;&lt;Options&gt;&lt;Confirmed&gt;false&lt;/Confirmed&gt;&lt;Create&gt;false&lt;/Create&gt;&lt;Send&gt;false&lt;/Send&gt;&lt;Generate&gt;false&lt;/Generate&gt;&lt;Existing/&gt;&lt;/Options&gt;&lt;/Document&gt;
DEBUG [2013-03-04 18:06:34,281]: Request - header   : host / localhost:8080
DEBUG [2013-03-04 18:06:34,281]: Request - header   : user-agent / Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0
DEBUG [2013-03-04 18:06:34,281]: Request - header   : accept / text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
DEBUG [2013-03-04 18:06:34,281]: Request - header   : accept-language / en-gb,en;q=0.5
DEBUG [2013-03-04 18:06:34,281]: Request - header   : accept-encoding / gzip, deflate
DEBUG [2013-03-04 18:06:34,281]: Request - header   : referer / http://localhost:8080/app/license-tool/main
DEBUG [2013-03-04 18:06:34,281]: Request - header   : connection / keep-alive
DEBUG [2013-03-04 18:06:34,281]: Request - header   : content-type / application/x-www-form-urlencoded
DEBUG [2013-03-04 18:06:34,281]: Request - header   : content-length / 482
DEBUG [2013-03-04 18:06:34,281]: Raw body content type: application/x-www-form-urlencoded
TRACE [2013-03-04 18:06:34,281]: TraceInputStream(org.apache.catalina.connector.CoyoteInputStream@771eeb)
TRACE [2013-03-04 18:06:34,282]: read([B@1a70476): -1</pre>
      <p>Florent made the following observations:</p>
      <ol>
         <li>The content-type is application/x-www-form-urlencoded, which should conform to http://www.w3.org/TR/xforms/#serialize-urlencode,
            but seems not to: the XML seems to be passed as is, instead of being split into individual
            elements and their string values. But I am not an expert on XForms so I might be wrong.</li>
         <li>Still about application/x-www-form-urlencoded and the same section, it says that the
            non-ASCII characters are replaced based on the octets of their UTF-8 representation,
            so the encoding should not be used here.  This content-type does not carry any charset
            parameter anyway, if I am right.</li>
         <li>Again about application/x-www-form-urlencoded, it is actually handled by Java EE as
            parameters, instead of simply giving the raw POST entity content.  I am not sure exactly
            how it works WRT the encoding.</li>
      </ol>
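      <p>The second observation can be illustrated with a small Python sketch. It uses the
         standard urllib.parse module rather than the XSLTForms, Tomcat or Servlex code paths,
         so it only shows the encoding rule itself. The choice of Windows-1252 as the wrong
         charset is an assumption, made because it reproduces the corrupted 'O'Neilâ‚¬' exactly
         as it appears in the log:</p>
      <pre><code>from urllib.parse import quote, unquote

name = "O'Neil\u20ac"

# Percent-encode using the octets of the UTF-8 representation.
encoded = quote(name, safe="")
print(encoded)  # O%27Neil%E2%82%AC

# Decoding with the right and the wrong charset.
print(unquote(encoded, encoding="utf-8") == name)  # True
wrong = unquote(encoded, encoding="cp1252")
print(wrong == name)  # False</code></pre>
      <p>The euro sign travels as the three octets E2 82 AC, so whichever component decodes
         them has to assume UTF-8 for the original character to survive.</p>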
      <p>Alain provided the following example to test the assumptions made by Florent.</p>
      <p>Encoding.xhtml:</p>
      <pre>&lt;html xmlns="http://www.w3.org/1999/xhtml" xmlns:xf="http://www.w3.org/2002/xforms"&gt;
    &lt;head&gt;
        &lt;title&gt;Encoding Test&lt;/title&gt;
        &lt;xf:model&gt;
            &lt;xf:instance&gt;
                &lt;data/&gt;
            &lt;/xf:instance&gt;
            &lt;xf:submission id="s01" method="xml-urlencoded-post" replace="all" action="http://www.agencexml.com/xsltforms/dump.php"&gt;
                &lt;xf:message level="modeless" ev:event="xforms-submit-error"&gt;Submit error.&lt;/xf:message&gt;
            &lt;/xf:submission&gt;
        &lt;/xf:model&gt;
    &lt;/head&gt;
    &lt;body&gt;
        &lt;xf:input ref="."&gt;
            &lt;xf:label&gt;Input:&lt;/xf:label&gt;
        &lt;/xf:input&gt;
        &lt;xf:submit&gt;
            &lt;xf:label&gt;Save&lt;/xf:label&gt;
        &lt;/xf:submit&gt;
    &lt;/body&gt;
&lt;/html&gt;</pre>
      <p>dump.php:</p>
      <pre>&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
    &lt;head&gt;
        &lt;title&gt;HTTP XML POST Dump&lt;/title&gt;
    &lt;/head&gt;
    &lt;body&gt;
        &lt;h1&gt;HTTP XML POST Dump&lt;/h1&gt;
        &lt;h2&gt;Raw Data :&lt;/h2&gt;
        &lt;?php
        $body = file_get_contents("php://input");
        echo strlen($body);
        echo " bytes: &lt;br/&gt;";
        echo "&lt;pre&gt;$body&lt;/pre&gt;";
        if(substr($body,0,9) == "postdata=") {
            $body = urldecode(substr($body,strpos($body,"=")+1));
        }
        $xml = new DOMDocument();
        $xml-&gt;loadXML($body);
        $xslt = new XSLTProcessor();
        $xsl = new DOMDocument();
        $indent = "&lt;xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\"&gt;&lt;xsl:output method=\"xml\" indent=\"yes\" encoding=\"UTF-8\"/&gt;&lt;xsl:template match=\"@*|node()\"&gt;&lt;xsl:copy-of select=\".\"/&gt;&lt;/xsl:template&gt;&lt;/xsl:stylesheet&gt;";
        $xsl-&gt;loadXML($indent);
        $xslt-&gt;importStylesheet($xsl);
        $result = $xslt-&gt;transformToXml($xml);
        $result = substr($result, strpos($result,"?&gt;")+3);
        echo "&lt;h2&gt;Indented XML :&lt;/h2&gt;&lt;pre&gt;".htmlspecialchars($result, ENT_QUOTES)."&lt;/pre&gt;";
        ?&gt;
    &lt;/body&gt;
&lt;/html&gt;</pre>
      <p>When submitting '€', I get this:</p>
      <pre>HTTP XML POST Dump
Raw Data :
41 bytes:
postdata=%3Cdata%3E%E2%82%AC%3C%2Fdata%3E
Indented XML :
&lt;data&gt;€&lt;/data&gt;</pre>
      <p>and with Firebug, I can see the following, which is correct:</p>
      <p>
         <img alt="saxon-license img3" src="http://dev.saxonica.com/img/license-tool-dump.png"/></p>
      <p>Florent states:</p>
      <p>What should be in the content of the HTTP request is %E2%82%AC to represent the Euro
         symbol as URL-encoded (because that represents the 3 octets of € in UTF-8).<br/>Because of the "automatic" handling of that Content-Type by Java EE, I am afraid the
         only way to know for sure what is on the wire is to actually look into it (using a
         packet sniffer, like Wireshark for instance).</p>
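      <p>To make Florent's point concrete, here is a minimal sketch in Python (not part of the
         original test case) showing where %E2%82%AC comes from. It uses only the standard
         <code>urllib.parse</code> module; the variable names are my own.</p>
      <pre>
```python
from urllib.parse import quote, quote_plus

euro = "\u20ac"  # the Euro sign

# UTF-8 represents the Euro sign as three octets.
octets = euro.encode("utf-8")
print(octets.hex(" "))               # e2 82 ac

# Percent-encoding writes each of those octets as %XX.
print(quote(euro))                   # %E2%82%AC

# A form value is encoded the same way; quote_plus also escapes the apostrophe.
print(quote_plus("O'Neil" + euro))   # O%27Neil%E2%82%AC
```
</pre>
      <p>The last line matches the shape of the form data that XSLTForms puts on the wire.</p>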
      <p>At this stage it was important to check what packets were actually being sent. The following
         is a snippet of the Wireshark report, which shows that the data is correctly encoded at this point.</p>
      <pre>HTTP    1207    POST /app/license-tool/parseRequest HTTP/1.1  (application/x-www-form-urlencoded)
[truncated] postdata=%3CDocument+xmlns%3D%22%22%3E%3CData%3EFirst+Name%3A+O%27Neil
%E2%82%AC%0D%0A%0D%0ALast+Name%3A+Delpratt%0D%0A%0D%0ACompany%3A+Saxonica
%0D%0A%0D%0ACountry%3A+United+Kingdom%0D%0A%0D%0AEmail+Address%3A+oneil
%40saxonica.c
...</pre>
      <p>Using Alain's test case, Florent discovered that it was actually Tomcat itself interpreting
         the %xx encoding as Latin-1! More information at:</p>
      <p><a href="http://wiki.apache.org/tomcat/FAQ/CharacterEncoding">http://wiki.apache.org/tomcat/FAQ/CharacterEncoding</a></p>
      <p>In summary, the decoding was done using ISO-8859-1, not UTF-8 as one would
         expect.</p>
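      <p>The effect is easy to reproduce outside Tomcat. The following Python sketch (an
         illustration only, using the standard <code>urllib.parse</code> module) decodes the same
         percent-encoded octets once as UTF-8 and once as ISO-8859-1.</p>
      <pre>
```python
from urllib.parse import unquote

encoded = "%E2%82%AC"  # the Euro sign as sent on the wire

as_utf8 = unquote(encoded, encoding="utf-8")
as_latin1 = unquote(encoded, encoding="iso-8859-1")

print(len(as_utf8))      # 1: a single character, the Euro sign
print(len(as_latin1))    # 3: each octet became a character of its own

# The damage is reversible as long as no octet was dropped on the way.
repaired = as_latin1.encode("iso-8859-1").decode("utf-8")
print(repaired == as_utf8)   # True
```
</pre>
      <p>Three characters where one was expected is the typical signature of UTF-8 data decoded
         as Latin-1.</p>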
      <p>To overcome the problem, Florent created a new config property for Servlex,
         named org.expath.servlex.default.charset, the value of which can be set to "UTF-8"
         in Tomcat's conf/catalina.properties. If set, its value is used as the charset for
         requests without an explicit charset in Content-Type.</p>
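      <p>Since catalina.properties is an ordinary Java properties file, the setting would
         presumably be a line such as <code>org.expath.servlex.default.charset=UTF-8</code>. The fallback
         rule itself can be sketched as follows. This is a Python illustration of the behaviour described
         above, not Servlex's actual code; the function names <code>effective_charset</code> and
         <code>decode_field</code> are hypothetical.</p>
      <pre>
```python
from email.message import Message
from urllib.parse import unquote_plus


def effective_charset(content_type, default_charset=None):
    """Return the charset parameter of a Content-Type value, else the default."""
    header = Message()
    header["Content-Type"] = content_type
    return header.get_content_charset(failobj=default_charset)


def decode_field(raw, content_type, default_charset="iso-8859-1"):
    """Decode one url-encoded value with the charset chosen by the rule above."""
    charset = effective_charset(content_type, default_charset)
    return unquote_plus(raw, encoding=charset)


form = "application/x-www-form-urlencoded"
print(effective_charset(form, "UTF-8"))                            # UTF-8
print(effective_charset("text/xml; charset=ISO-8859-1", "UTF-8"))  # iso-8859-1
print(len(decode_field("%E2%82%AC", form)))                        # 3
print(len(decode_field("%E2%82%AC", form, "UTF-8")))               # 1
```
</pre>
      <p>An explicit charset in the request always wins; the configured default only applies when the
         request does not declare one.</p>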
      <p>Thanks to Florent, Alain and Mike, the encoding problem has now been resolved. The
         overall lesson learnt is that tracking down encoding problems can still be very hard
         work.</p>
      <p>References<br/>[1] XForms. W3C. <a href="http://www.w3.org/MarkUp/Forms/">http://www.w3.org/MarkUp/Forms/</a><br/>[2] XSLTForms. Alain Couthures. <a href="http://www.agencexml.com/xsltforms">http://www.agencexml.com/xsltforms</a><br/>[3] Servlex. Florent Georges. GitHub: <a href="https://github.com/fgeorges/servlex">https://github.com/fgeorges/servlex</a>, Google Project: <a href="http://code.google.com/p/servlex/">http://code.google.com/p/servlex/</a><br/>[4] EXPath Webapp. <a href="http://expath.org/wiki/Webapp">http://expath.org/wiki/Webapp</a></p>
   </div></content></entry><entry><title>Saxon performance measures of the Word Ladders problem in XSLT</title><link href="https://blog.saxonica.com/oneil/2012/12/performance-measures-of-the-word-ladders-problem-in-xslt.html" rel="alternate" type="text/html"/><id>https://blog.saxonica.com/oneil/2012/12/performance-measures-of-the-word-ladders-problem-in-xslt.html</id><published>2012-12-06T16:23:33Z</published><content type="xhtml" xml:base="https://blog.saxonica.com/oneil/2012/12/performance-measures-of-the-word-ladders-problem-in-xslt.html"><div xmlns="http://www.w3.org/1999/xhtml">
      
      <p>I would like to report on some Saxon performance measurements for Word ladder solutions
         implemented in XSLT.</p>
      <p>Firstly, some background information on the Word ladder problem. From Wikipedia, the
         free encyclopedia:</p>
      <p>A <b>word ladder</b> (also known as a <b>doublets</b>, <b>word-links</b>, or <b>Word golf</b>) is a <a href="http://en.wikipedia.org/wiki/Word_game">word game</a> invented by <a title="Lewis Carroll" href="http://en.wikipedia.org/wiki/Lewis_Carroll">Lewis Carroll</a>.
         A word ladder puzzle begins with two given words, and to solve the 
         puzzle one must find the shortest chain of other words to link the two 
         given words, in which chain every two adjacent words (that is, words in 
         successive steps) differ by exactly one letter.</p>
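      <p>For readers who prefer code to prose, the definition can be sketched in a few lines of
         Python. This is only an illustration of the puzzle, not a transcription of either stylesheet
         discussed below: the breadth-first search and the toy word list are my own assumptions.</p>
      <pre>
```python
from collections import deque


def hamming_distance(a, b):
    """Count the positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def is_ladder(words):
    """True if every two adjacent words differ by exactly one letter."""
    return all(hamming_distance(a, b) == 1 for a, b in zip(words, words[1:]))


def shortest_ladder(start, goal, dictionary):
    """Breadth-first search: the first path that reaches goal is a shortest one."""
    same_length = [w for w in dictionary if len(w) == len(start)]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for word in same_length:
            if word not in seen and hamming_distance(path[-1], word) == 1:
                seen.add(word)
                queue.append(path + [word])
    return None  # no ladder exists in this dictionary


# Hypothetical toy dictionary, for illustration only.
words = ["cold", "cord", "card", "ward", "warm", "word", "worm", "corm"]
ladder = shortest_ladder("cold", "warm", words)
print(ladder)   # ['cold', 'cord', 'card', 'ward', 'warm']
```
</pre>
      <p>Because "cold" and "warm" differ in all four letters, no ladder between them can be shorter
         than four steps, so the five-word chain found here is optimal.</p>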
      <p>To the best of my knowledge, XSLT interest in this problem was started by Dimitre
         Novatchev on the <a href="http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/201211/msg00187.html">Mulberry mailing list</a>; his blog provides a 20-step guide to creating a stylesheet that solves the Word
         ladder problem (<a href="http://dev.saxonica.com/oneil/FindChainOfWordsHamming.xsl">FindChainOfWordsHamming.xsl</a>). The post attracted some interest on the list, and another solution
         to the problem was given by Wolfgang Laun (please see <a href="http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/201211/msg00210.html">thread</a>, file: <a href="http://dev.saxonica.com/oneil/FindChainOfWordsHamming2.xsl">FindChainOfWordsHamming2.xsl</a>).</p>
      <p><b>Experimental Evaluation</b></p>
      <p>Our interest here is only in Saxon's performance. I was curious about, and surprised by,
         the results reported by Dimitre. The question I had was why Dimitre's stylesheet was
         much slower than Wolfgang's stylesheet in Saxon, yet faster in another XSLT processor:
         there must be some optimization step we were not making. I was motivated to understand
         where the bottlenecks were and how we could improve the performance in Saxon.</p>
      <p>Wolfgang wrote: "The XSLT program is three times faster on one XSLT implementation
         than on another one is strange, 'very' strange". </p>
      <p>Mike Kay addressed Wolfgang's comment by writing in the thread: "No, it's extremely
         common. In fact, very much larger factors than this 
         are possible. Sometimes Saxon-EE runs 1000 times faster than Saxon-HE. 
         This effect is normal with declarative languages where powerful 
         optimizations are deployed - SQL users will be very familiar with the 
         effect."</p>
      <p>The table below shows the execution times of the stylesheets in Saxon 9.XX (for some
         recent X). Times were reported by Dimitre.</p>
      <table border="1">
         
         
         <tbody>
            
            
            <tr>
               
               
               <th>Transformation</th>
               
               
               <th>Times (secs)</th>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>Dimitre</td>
               
               
               <td>39</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>Wolfgang</td>
               
               
               <td>25</td>
               </tr>
            </tbody>
         </table>
      <p>We observe that Wolfgang's transformation is 1.56 times faster. Please note that
         Wolfgang's stylesheet lists all solutions (i.e. ladders), whereas Dimitre's
         only finds one.</p>
      <p>Saxon represents a stylesheet as a compiled abstract syntax tree (AST), which is processed
         in an interpreted manner. Since the release of Saxon 9.4 we have included a bytecode
         generation feature, which allows us, at the compilation phase, to directly generate
         a bytecode representation of the entire AST, or of those sub-trees where performance
         benefits can be achieved. We make use of properties we know at compile time (see the <a href="http://www.balisage.net/Proceedings/vol7/html/Delpratt01/BalisageVol7-Delpratt01.html">full paper</a>).</p>
      <p><b>Analysis of Dimitre's Stylesheet</b></p>
      <p>Step one was to see how well Saxon does with the bytecode feature switched on. This
         initially proved inconclusive because we discovered a bug in the generated bytecode. That was a useful
         exercise in itself, and we managed to fix the bug (see bug issue: <a href="https://saxonica.plan.io/issues/1653">#1653</a>). The problem was that the tail-recursive call in the function processQueue was not being
         properly compiled into bytecode.</p>
      <p>The table below shows the running times of the stylesheets under Saxon 9.4.0.6. We observe
         that Wolfgang's stylesheet was about 2.08 and 3.23 times faster in interpreted and bytecode mode,
         respectively.</p>
      <table border="1">
         
         
         
         <tbody>
            
            
            <tr>
               
               
               <th>Transformation</th>
               
               
               <th>Interpreted - Times (secs)</th>
               
               
               <th>With bytecode generation - Times (secs)</th>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>Dimitre</td>
               
               
               <td>7.95</td>
               
               
               <td>7.78</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>Wolfgang</td>
               
               
               <td>3.83</td>
               
               
               <td>2.41</td>
               </tr>
            </tbody>
         </table>
      <p>Analyzing Dimitre's stylesheet with the Saxon tracing profile (i.e. option -TP) proved
         useful. See the HTML output produced by Saxon below. We observe that there is a big
         hit on the processNode function, where most of the time is spent.</p>
      <h3>Analysis of Stylesheet Execution Time</h3>
      <p>Total time: 9498.871 milliseconds</p>
      <p>
         <b>Time spent in each template or function:</b>
         
         </p>
      <p>The table below is ordered by the total net time spent in the template or   function.
         Gross time means the time including called templates and functions;  net time means
         time excluding time spent in called templates and functions.
         </p>
      <table border="1">
         
         
         
         <thead>
            
            
            
            <tr>
               
               
               
               <th>file</th>
               
               
               
               <th>line</th>
               
               
               
               <th>instruction</th>
               
               
               
               <th>count</th>
               
               
               
               <th>avg time (gross)</th>
               
               
               
               <th>total time (gross)</th>
               
               
               
               <th>avg time (net)</th>
               
               
               
               <th>total time (net)</th>
               </tr>
            </thead>
         
         
         
         <tbody>
            
            
            
            <tr>
               
               
               
               <td>"*rdsHamming.xsl"</td>
               
               
               
               <td>79</td>
               
               
               
               <td>function my:processNode</td>
               
               
               
               <td align="right">2053</td>
               
               
               
               <td align="right">4.12</td>
               
               
               
               <td align="right">8470.67</td>
               
               
               
               <td align="right">3.729</td>
               
               
               
               <td align="right">7655.792</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*rdsHamming.xsl"</td>
               
               
               
               <td>21</td>
               
               
               
               <td>function my:chainOfWords</td>
               
               
               
               <td align="right">1</td>
               
               
               
               <td align="right">9491.1</td>
               
               
               
               <td align="right">9491.12</td>
               
               
               
               <td align="right">993.34</td>
               
               
               
               <td align="right">993.34</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*rdsHamming.xsl"</td>
               
               
               
               <td>131</td>
               
               
               
               <td>function f:eq</td>
               
               
               
               <td align="right">3993</td>
               
               
               
               <td align="right">0.06</td>
               
               
               
               <td align="right">230.02</td>
               
               
               
               <td align="right">0.058</td>
               
               
               
               <td align="right">230.26</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*rdsHamming.xsl"</td>
               
               
               
               <td>131</td>
               
               
               
               <td>function my:HammingDistance</td>
               
               
               
               <td align="right">3993</td>
               
               
               
               <td align="right">0.20</td>
               
               
               
               <td align="right">807.38</td>
               
               
               
               <td align="right">0.049</td>
               
               
               
               <td align="right">194.77</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*func-apply.xsl"</td>
               
               
               
               <td>21</td>
               
               
               
               <td>function f:apply</td>
               
               
               
               <td align="right">15972</td>
               
               
               
               <td align="right">0.01</td>
               
               
               
               <td align="right">290.01</td>
               
               
               
               <td align="right">0.011</td>
               
               
               
               <td align="right">175.00</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*-Operators.xsl"</td>
               
               
               
               <td>244</td>
               
               
               
               <td>template f:eq</td>
               
               
               
               <td align="right">15972</td>
               
               
               
               <td align="right">0.01</td>
               
               
               
               <td align="right">115.01</td>
               
               
               
               <td align="right">0.004</td>
               
               
               
               <td align="right">68.23</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*-Operators.xsl"</td>
               
               
               
               <td>248</td>
               
               
               
               <td>function f:eq</td>
               
               
               
               <td align="right">15972</td>
               
               
               
               <td align="right">0.003</td>
               
               
               
               <td align="right">46.77</td>
               
               
               
               <td align="right">0.003</td>
               
               
               
               <td align="right">46.77</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*nc-zipWith.xsl"</td>
               
               
               
               <td>21</td>
               
               
               
               <td>function f:zipWith</td>
               
               
               
               <td align="right">19965</td>
               
               
               
               <td align="right">0.002</td>
               
               
               
               <td align="right">33.11</td>
               
               
               
               <td align="right">0.002</td>
               
               
               
               <td align="right">33.11</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*nc-zipWith.xsl"</td>
               
               
               
               <td>9</td>
               
               
               
               <td>function f:zipWith</td>
               
               
               
               <td align="right">19965</td>
               
               
               
               <td align="right">0.003</td>
               
               
               
               <td align="right">57.67</td>
               
               
               
               <td align="right">0.001</td>
               
               
               
               <td align="right">24.56</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*func-apply.xsl"</td>
               
               
               
               <td>16</td>
               
               
               
               <td>function f:apply</td>
               
               
               
               <td align="right">15972</td>
               
               
               
               <td align="right">0.019</td>
               
               
               
               <td align="right">309.52</td>
               
               
               
               <td align="right">0.001</td>
               
               
               
               <td align="right">19.52</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*rdsHamming.xsl"</td>
               
               
               
               <td>70</td>
               
               
               
               <td>function my:processQueue</td>
               
               
               
               <td align="right">2053</td>
               
               
               
               <td align="right">0.009</td>
               
               
               
               <td align="right">18.35</td>
               
               
               
               <td align="right">0.009</td>
               
               
               
               <td align="right">18.35</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*hFunctions.xsl"</td>
               
               
               
               <td>498</td>
               
               
               
               <td>function f:string-to-codepoints</td>
               
               
               
               <td align="right">3993</td>
               
               
               
               <td align="right">0.003</td>
               
               
               
               <td align="right">10.52</td>
               
               
               
               <td align="right">0.003</td>
               
               
               
               <td align="right">10.52</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*rdsHamming.xsl"</td>
               
               
               
               <td>120</td>
               
               
               
               <td>function my:HammingDistance</td>
               
               
               
               <td align="right">3993</td>
               
               
               
               <td align="right">0.204</td>
               
               
               
               <td align="right">814.48</td>
               
               
               
               <td align="right">0.002</td>
               
               
               
               <td align="right">7.09</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*hFunctions.xsl"</td>
               
               
               
               <td>498</td>
               
               
               
               <td>function f:string-to-codepoints</td>
               
               
               
               <td align="right">3993</td>
               
               
               
               <td align="right">0.001</td>
               
               
               
               <td align="right">4.88</td>
               
               
               
               <td align="right">0.001</td>
               
               
               
               <td align="right">4.88</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*rdsHamming.xsl"</td>
               
               
               
               <td>73</td>
               
               
               
               <td>function my:processNode</td>
               
               
               
               <td align="right">2053</td>
               
               
               
               <td align="right">4.128</td>
               
               
               
               <td align="right">8475.2</td>
               
               
               
               <td align="right">0.002</td>
               
               
               
               <td align="right">4.57</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*rdsHamming.xsl"</td>
               
               
               
               <td>54</td>
               
               
               
               <td>function my:processQueue</td>
               
               
               
               <td align="right">2053</td>
               
               
               
               <td align="right">0.011</td>
               
               
               
               <td align="right">22.20</td>
               
               
               
               <td align="right">0.002</td>
               
               
               
               <td align="right">3.85</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*rdsHamming.xsl"</td>
               
               
               
               <td>17</td>
               
               
               
               <td>template /*</td>
               
               
               
               <td align="right">1</td>
               
               
               
               <td align="right">9491.87</td>
               
               
               
               <td align="right">9491.9</td>
               
               
               
               <td align="right">0.756</td>
               
               
               
               <td align="right">0.76</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*rdsHamming.xsl"</td>
               
               
               
               <td>40</td>
               
               
               
               <td>function my:chainOfWords</td>
               
               
               
               <td align="right">1</td>
               
               
               
               <td align="right">0.344</td>
               
               
               
               <td align="right">0.34</td>
               
               
               
               <td align="right">0.344</td>
               
               
               
               <td align="right">0.34</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*rdsHamming.xsl"</td>
               
               
               
               <td>117</td>
               
               
               
               <td>function my:enumerate</td>
               
               
               
               <td align="right">10</td>
               
               
               
               <td align="right">0.166</td>
               
               
               
               <td align="right">1.65</td>
               
               
               
               <td align="right">0.029</td>
               
               
               
               <td align="right">0.29</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>"*rdsHamming.xsl"</td>
               
               
               
               <td>111</td>
               
               
               
               <td>function my:enumerate</td>
               
               
               
               <td align="right">10</td>
               
               
               
               <td align="right">0.176</td>
               
               
               
               <td align="right">1.76</td>
               
               
               
               <td align="right">0.010</td>
               
               
               
               <td align="right">0.10</td>
               </tr>
            </tbody>
         </table>
      <p>In addition to the Saxon tracing profile I ran the Java hprof profiling tool, which
         showed that most time was spent comparing strings. See the Java profile results
         below. It was now obvious that the GeneralComparison expression was the culprit. Specifically,
         we narrowed it down to the instruction: <code>&lt;xsl:for-each select="$vNeighbors[not(. = $pExcluded)]"&gt;</code>. In the interpreted code we were doing some unnecessary runtime type checking, even though
         we know statically at compile time that we are comparing string values. More specifically,
         we know at compile time that $vNeighbors is a sequence of untyped atomic values and
         $pExcluded is a sequence of strings. We were unnecessarily checking at runtime that
         an untyped atomic value and a string were comparable, and we were doing an unnecessary
         conversion from untyped atomic to string.</p>
      <pre>CPU SAMPLES BEGIN (total = 1213) Thu Nov 29 14:42:47 2012
rank   self  accum   count trace method
   1 24.24% 24.24%     294 300547 java.lang.Integer.hashCode
   2 19.13% 43.36%     232 300581 net.sf.saxon.expr.GeneralComparison.compare
   3  7.75% 51.11%      94 300613 java.util.HashMap.getEntry
   4  2.14% 53.26%      26 300570 java.util.LinkedHashMap$Entry.recordAccess
   5  2.06% 55.32%      25 300234 java.lang.ClassLoader.defineClass1
   6  2.06% 57.38%      25 300616 com.saxonica.expr.ee.GeneralComparisonEE.effectiveBooleanValue
   7  1.98% 59.36%      24 300603 java.util.LinkedHashMap$Entry.recordAccess
   8  1.98% 61.34%      24 300609 net.sf.saxon.type.Converter.convert
....</pre>
      <p>See full hprof results: <a href="http://dev.saxonica.com/oneil/java.hprof-DN.txt">java.hprof-DN.txt</a></p>
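      <p>To see why the filter <code>$vNeighbors[not(. = $pExcluded)]</code> is comparison-heavy, recall
         that a general comparison is existential: it is true if the item on the left equals any item
         in the sequence on the right. The Python sketch below illustrates only these semantics, with
         hypothetical function names and sample data. It says nothing about how Saxon evaluates the
         expression internally.</p>
      <pre>
```python
def general_equals(left, right):
    """XPath-style '=' on sequences: true if any pair of items is equal."""
    return any(a == b for a in left for b in right)


def filter_naive(neighbors, excluded):
    # Mirrors $vNeighbors[not(. = $pExcluded)]: one scan of excluded per neighbor.
    return [n for n in neighbors if not general_equals([n], excluded)]


def filter_hashed(neighbors, excluded):
    # Same result, but excluded is hashed once so each test is a set lookup.
    excluded_set = set(excluded)
    return [n for n in neighbors if n not in excluded_set]


neighbors = ["cord", "card", "bold"]   # sample data, for illustration only
excluded = ["cold", "cord"]
print(filter_naive(neighbors, excluded))    # ['card', 'bold']
print(filter_hashed(neighbors, excluded))   # ['card', 'bold']
```
</pre>
      <p>In the naive form, every neighbour is compared against every excluded word, so the cost of each
         individual comparison (type checks and conversions included) is multiplied accordingly.</p>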
      <p><b>Improvements in Bytecode generation</b></p>
      <p>In the bytecode we discovered we were missing opportunities to capitalise on
         static properties we know at compile time. For example, during atomization we were
         doing an instanceof test to see whether each item was a node when we already knew
         from static analysis that this was the case. We were also able to avoid unnecessary
         string conversions and instanceof checks, and we found we could avoid repeated
         conversions by saving string values for reuse when appropriate.</p>
      <p>We applied the code improvements discussed above in Saxon-EE
         9.5 (pre-release). The table below shows the resulting running times for the stylesheets written
         by Dimitre and Wolfgang. We observe that in the interpreted code Wolfgang's stylesheet
         is about 2.14 times faster than Dimitre's (this is similar to the results above). With
         the bytecode generation feature switched on, Dimitre's stylesheet has dramatically
         improved in performance and is now about 1.12 times faster than Wolfgang's.</p>
      <table border="1">
         
         
         
         <tbody>
            
            
            <tr>
               
               
               <th>Transformation</th>
               
               
               <th>Interpreted - Times (secs)</th>
               
               
               <th>With bytecode generation - Times (secs)</th>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>Dimitre</td>
               
               
               <td>7.373</td>
               
               
               <td>1.938</td>
               </tr>
            
            
            
            <tr>
               
               
               
               <td>Wolfgang</td>
               
               
               <td>3.450</td>
               
               
               <td>2.17</td>
               </tr>
            </tbody>
         </table>
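      <p>The ratios quoted in this post can be recomputed from the three timing tables. The short Python
         sketch below does so, rounding to two decimal places; the <code>faster</code> helper and the
         labels are my own, and the timings are copied from the tables.</p>
      <pre>
```python
def faster(dimitre_secs, wolfgang_secs):
    """Return which stylesheet ran faster, and by what factor."""
    quickest = min(dimitre_secs, wolfgang_secs)
    slowest = max(dimitre_secs, wolfgang_secs)
    winner = "Wolfgang" if wolfgang_secs == quickest else "Dimitre"
    return winner, round(slowest / quickest, 2)


# (Dimitre, Wolfgang) times in seconds, copied from the tables in this post.
timings = {
    "as reported by Dimitre": (39, 25),
    "9.4.0.6 interpreted": (7.95, 3.83),
    "9.4.0.6 bytecode": (7.78, 2.41),
    "9.5 pre-release interpreted": (7.373, 3.450),
    "9.5 pre-release bytecode": (1.938, 2.17),
}

for label, (dimitre, wolfgang) in timings.items():
    winner, factor = faster(dimitre, wolfgang)
    print(f"{label}: {winner} is {factor:.2f} times faster")
```
</pre>
      <p>Only the last row has Dimitre's stylesheet ahead, which is the effect of the bytecode
         improvements described above.</p>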
      <p>We have not yet done a similar analysis of Wolfgang's stylesheet; we will now attempt
         to do this.</p>
      <p>To be continued....</p>
   </div></content></entry></feed>