Serving and styling XML on the web

For more than twenty years, it’s been possible to serve XML documents on the web, making them available to applications as well-formed XML and to humans as nicely formatted web pages. This is a win-win scenario for many users, but the WHATWG is moving aggressively to take away this long-standing web capability.

The WHATWG is moving to deprecate, and then remove, native support for XSLT in the browser. It’s XSLT that provides the formatted web page from an XML document. Faced with the prospect of losing that support, our users naturally ask, “can we use SaxonJS instead?” The answer is yes, but it’s a little bit complicated.

Let’s step back for a moment and look (broadly) at how the web works and what’s actually going on under the hood.

For the application, this is simple: it requests the web page; it gets back a document with an application/xml media type; it parses it; and it does what it does with the XML data. Job done.

For the human user reading the page with a web browser, there’s a little bit more going on.

1. The browser requests the page.
2. It gets back an application/xml page and parses it.
3. If the page has an xml-stylesheet processing instruction:
   1. The browser loads the stylesheet that the processing instruction points to;
   2. if it’s an XSLT stylesheet, it transforms the XML document with the stylesheet;
   3. and it displays the transformed result to the reader.

(If the document doesn’t have an xml-stylesheet processing instruction, the browser just displays the raw XML in some way or another. How useful this is depends on the browser and the XML markup.)
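
To make that decision concrete, here is a minimal sketch of the check in JavaScript. It is only an illustration: a real browser uses its XML parser rather than regular expressions. The function names findStylesheetPI and isXsltStylesheet are hypothetical, and the list of media types treated as XSLT is an assumption, not a statement of what any particular browser accepts.

```javascript
// Sketch: find the first xml-stylesheet processing instruction in an XML
// string and return its pseudo-attributes as a plain object.
// Returns null when the document has no such processing instruction.
function findStylesheetPI(xmlText) {
  const pi = /<\?xml-stylesheet\s+([\s\S]*?)\?>/.exec(xmlText);
  if (pi === null) {
    return null; // no PI: the browser would just show the raw XML
  }
  const attrs = {};
  // Pseudo-attributes look like name="value" or name='value'.
  const pseudoAttr = /([A-Za-z_][\w.-]*)\s*=\s*("([^"]*)"|'([^']*)')/g;
  let m;
  while ((m = pseudoAttr.exec(pi[1])) !== null) {
    attrs[m[1]] = m[3] !== undefined ? m[3] : m[4];
  }
  return attrs;
}

// Hypothetical helper: does the PI point at an XSLT stylesheet?
// ASSUMPTION: only these two media types are checked, for illustration.
function isXsltStylesheet(attrs) {
  return attrs !== null &&
    ["text/xsl", "application/xslt+xml"].includes(attrs.type);
}

// Example:
//   findStylesheetPI('<?xml-stylesheet href="/xslt/atom.xsl" type="text/xsl"?><feed/>')
//   returns { href: "/xslt/atom.xsl", type: "text/xsl" }
```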

It’s the fact that the browser recognizes, and responds to, the xml-stylesheet processing instruction that starts the process off. This document:

<?xml-stylesheet href="/xslt/atom.xsl" type="text/xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:dcterms="http://purl.org/dc/terms/"
      xml:lang="EN-us">
   <title>Saxonica Weblogs</title>
   <link href="https://blog.saxonica.com/" rel="alternate" type="text/html"/>
   <link href="https://blog.saxonica.com/atom.xml" rel="self"/>
   <id>https://blog.saxonica.com/atom.xml</id>
   …

will be styled with the /xslt/atom.xsl stylesheet before being displayed, because the xml-stylesheet processing instruction on the first line tells the browser to.

If WHATWG takes away that capability, the XML document becomes inert.

The challenge that we, as users, face is to find some place to stand from which we can begin the styling process. The more common SaxonJS use case begins with an HTML browser page, and it’s obvious how to bootstrap a JavaScript application from there.

Luckily, for the moment at least, the browser will run scripts that are embedded in the XML using the script element in the HTML namespace. So that’s where we can stand. If we remove the processing instruction and add script lines, we can run SaxonJS:

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:dcterms="http://purl.org/dc/terms/"
      xml:lang="EN-us">

   <script xmlns="http://www.w3.org/1999/xhtml"
           type="text/javascript" src="/js/saxon-js/SaxonJS2.rt.js"></script>
   <script xmlns="http://www.w3.org/1999/xhtml">
      SaxonJS.transform({
        stylesheetLocation: "/xslt/atom.sef.json",
        initialTemplate: "main"
      }, "async");
   </script>

   <title>Saxonica Weblogs</title>
   <link href="https://blog.saxonica.com/" rel="alternate" type="text/html"/>
   <link href="https://blog.saxonica.com/atom.xml" rel="self"/>
   <id>https://blog.saxonica.com/atom.xml</id>
   …

I’m not going to try to give a SaxonJS tutorial here, but in brief: the first script loads SaxonJS; the second script loads the compiled version of the stylesheet, /xslt/atom.sef.json, and runs the template named “main”.
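
In "async" mode, SaxonJS.transform returns a Promise, so the second script can also report failures instead of failing silently. The following is a sketch under assumptions rather than a recommended pattern: runFeedTransform is a hypothetical wrapper, and the SaxonJS object is passed in as a parameter only so that the function can be tried outside the browser with a stub.

```javascript
// Sketch: run a named template from a compiled stylesheet and log any failure.
// `saxon` is expected to be the global SaxonJS object that SaxonJS2.rt.js
// provides; `stylesheetLocation` and `initialTemplate` are the same options
// used in the inline script above.
function runFeedTransform(saxon, stylesheetLocation, initialTemplate) {
  return saxon
    .transform({ stylesheetLocation, initialTemplate }, "async")
    .catch((err) => {
      // Without this, a missing or stale .sef.json file leaves the reader
      // looking at unstyled XML with no explanation.
      console.error("SaxonJS transformation failed:", err);
      throw err;
    });
}

// In the XML document this would be called as:
//   runFeedTransform(SaxonJS, "/xslt/atom.sef.json", "main");
```

One thing to watch when embedding longer scripts in XML: the script content is parsed as XML, so any < or & characters in the JavaScript have to be escaped or wrapped in a CDATA section.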

The /xslt/atom.xsl stylesheet can remain mostly the same as the one you’re using currently in the browser except, of course, that you can now use XSLT 3.0 features. One complication is that you have to add two templates to your XSLT stylesheet. The easiest way to do that is by placing these lines in a file called saxonjs-templates.xsl:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ixsl="http://saxonica.com/ns/interactiveXSLT"
                version="3.0">

<xsl:template name="main">
  <xsl:apply-templates select="ixsl:page()" mode="format-feed"/>
</xsl:template>

<xsl:template match="/" mode="format-feed">
  <xsl:result-document href="?." method="ixsl:replace-content">
    <xsl:apply-templates select="."/>
  </xsl:result-document>
</xsl:template>

</xsl:stylesheet>

The first template is the one that is run from the script in the XML document. It doesn’t have to be named “main” but the names used in both places have to be the same. This template takes the current page, the XML document, and processes it in the format-feed mode.

The second template just uses xsl:apply-templates on the document node it matched, processing the XML document in the default mode; this will run the rest of your stylesheet, /xslt/atom.xsl. Surrounding that call to apply-templates is an xsl:result-document instruction that will have the effect of replacing the current document with the styled result. That’s what “?.” and ixsl:replace-content achieve.

Use xsl:include at the top of /xslt/atom.xsl to include the contents of saxonjs-templates.xsl:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:atom="http://www.w3.org/2005/Atom"
                xmlns="http://www.w3.org/1999/xhtml" 
                exclude-result-prefixes="#all"
                version="3.0">
<xsl:include href="saxonjs-templates.xsl"/>
<xsl:output method="html" html-version="5"
            encoding="utf-8" indent="no"/>
  ...

I’ve taken the opportunity to update the version and the xsl:output method as well, since we’re now using XSLT 3.0 not 1.0.

To compile /xslt/atom.xsl to /xslt/atom.sef.json, you can use Saxon-EE or you can install SaxonJS on Node.js and use node, something like this:

node xslt3.js -xsl:xslt/atom.xsl -export:xslt/atom.sef.json -nogo -ns:##html5

(Depending on your platform, you might have to quote or escape the hashes in -ns:##html5.)

That all works. This is an unexplored use case; it’s clear that we could make it a little simpler by adding some options to the SaxonJS.transform method.

Not everyone will find this situation wholly satisfying. It’s a little bit complicated and it introduces an edit-compile-debug cycle into your workflow. You can’t just use the plain old XSL stylesheet; you have to use the compiled version. The compiler comes with Saxon-EE and is also available for free on Node.js, but neither version is open source.

Other approaches that could be used include content negotiation or simply using redirects. Content negotiation is also a little bit complicated and requires the ability to configure the web server. Redirects are simpler, but introduce two different URIs which might be confusing.
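
To give a feel for the content negotiation option, here is a minimal sketch of the decision a server would make from the request’s Accept header. It is deliberately simplified: q-values and wildcards are ignored, and the first listed media type that the server recognizes wins. The function name chooseRepresentation and the lists of media types are assumptions for illustration, not part of any server’s API.

```javascript
// Sketch: choose which representation of the feed to serve.
// Returns "html" for clients that ask for a web page first and "xml" for
// everything else, so applications keep getting the raw XML.
function chooseRepresentation(acceptHeader) {
  const types = (acceptHeader || "")
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase());
  for (const type of types) {
    if (type === "text/html" || type === "application/xhtml+xml") {
      return "html"; // serve a pre-transformed HTML page
    }
    if (type === "application/atom+xml" ||
        type === "application/xml" ||
        type === "text/xml") {
      return "xml"; // serve the XML document unchanged
    }
  }
  return "xml"; // default when nothing recognizable is requested
}
```

A server that varies its response this way should also send a Vary: Accept header so that caches keep the two representations apart.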

With luck, the WHATWG will be persuaded not to break the web and we won’t have to do any of these things. But there are options.