How Safe is your Schema?

By Michael Kay on July 22, 2023 at 12:00 p.m.

When you validate a document, you expect to set the rules for what it can contain. If you specify that your nesting box can only contain wrens and robins, you don't want any cuckoos in there.

So you write a schema, nesting-box.xsd, like this:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified"
    version="1.1">
    
    <xs:element name="nesting-box" type="nesting-box-type"/>
    
    <xs:complexType name="nesting-box-type">
        <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element ref="wren"/>
            <xs:element ref="robin"/>
        </xs:choice>
    </xs:complexType>
    
    <xs:element name="wren" type="xs:string"/>
    <xs:element name="robin" type="xs:string"/>
</xs:schema>

And you're now comfortable that any <nesting-box> that passes validation will look something like this:

<nesting-box>
   <wren>Nice!</wren>
   <robin>Nicer!</robin>
   <robin>Nicest!</robin>
</nesting-box>

You're wrong!

The following document also passes validation:

<nesting-box xsi:schemaLocation="cuckoo.ns cuckoo.xsd"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <wren>Nice!</wren>
   <robin>Nicer!</robin>
   <robin>Nicest!</robin>
   <cuckoo xmlns="cuckoo.ns">Horrid!</cuckoo>
</nesting-box>

Here cuckoo.xsd is another schema document, like this:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified"
    targetNamespace="cuckoo.ns"
    version="1.1">
    
    <xs:import schemaLocation="nesting-box.xsd"/>
    
    <xs:element name="cuckoo" type="xs:string"
       substitutionGroup="robin"/>
    
</xs:schema>

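So you thought you were constraining what could appear in the document, but the user found a way past your defenses. The xsi:schemaLocation attribute is a hint asking the validator to load cuckoo.xsd. A processor that honors the hint then sees cuckoo declared as a member of the substitution group headed by robin, so a cuckoo is accepted anywhere your schema allows a robin. The result is a document that passes validation but that your code probably can't handle.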
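
One schema-side defense, which does not depend on how the processor is configured, is to forbid substitution for the elements you declare. The fragment below is a sketch rather than part of the original schema: it assumes you are free to edit nesting-box.xsd, and it adds block="substitution" to the two global element declarations. With that in place, a conforming validator should reject a substitution-group member such as cuckoo appearing where a robin or wren is expected, even if cuckoo.xsd has been loaded.

<!-- Sketch: replacement declarations for nesting-box.xsd (assumed edit) -->
<!-- block="substitution" disallows substitution-group members in instances -->
<xs:element name="wren" type="xs:string" block="substitution"/>
<xs:element name="robin" type="xs:string" block="substitution"/>

If you would rather not annotate every declaration, setting blockDefault="substitution" on the xs:schema element should have the same effect for every element declared in that schema document. Either way, this only closes the substitution-group route shown here. It does not stop the processor from fetching schema documents named in the instance.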

Saxon does allow you to disable use of xsi:schemaLocation, but it's enabled by default. I'm inclined to think the default needs to be changed.
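
As a sketch of what switching it off might look like in a Saxon configuration file: the fragment below assumes the xsd element and its useXsiSchemaLocation attribute are available in your Saxon release, and that you are running an edition that supports schema validation. Check both against the documentation for the version you use.

<!-- Sketch: Saxon configuration file fragment (names to be verified for your release) -->
<configuration xmlns="http://saxon.sf.net/ns/configuration" edition="EE">
    <!-- Ignore xsi:schemaLocation and xsi:noNamespaceSchemaLocation in instance documents -->
    <xsd useXsiSchemaLocation="false"/>
</configuration>

With a setting like this, the validator should use only the schema documents that you supply, so the cuckoo document above would be rejected because no declaration for cuckoo is ever loaded.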