Sablotron: Future Directions

Introduction

This document describes possible directions for the future development of the Sablotron XML processor. Different users have different needs and solve different problems. Our aim is to outline improvements and new features that could be taken into consideration, and to start a discussion about the issues involved. We would also like to get as much feedback from users as possible.

The issues that arise fall naturally into several groups:

XPath 2.0 and XSLT 2.0

What we are speaking of here is Sablotron 2.0, a new processor rewritten from scratch. There is no way to extend the existing engine to cover XPath 2.0 and XSLT 2.0.

This is, of course, the biggest issue of all. In general, the XSLT 2.0 Working Draft (WD) brings many promising ideas, such as the elegant sequence model and a richer XPath syntax (with conditional expressions and loops). On the other hand, the 2.0 versions also bring many problematic changes.

From our point of view, built-in support for W3C XML Schema (or dependency on it, if you wish) is the most critical new requirement. We would need either to develop our own XML Schema validator or to use a third-party XSD validator, ideally one implemented on top of the Expat XML parser. The first option would be hard. The second could be easier, but we are not aware of a suitable validator.

The overall architecture would likely satisfy the new requirements quite well, including features such as multiple outputs. What must be redesigned from scratch is the XPath evaluator.

What should definitely be kept is the ability to evaluate XPath expressions and run XSLT transformations on externally provided data, as the SXP interface currently allows.

Architecture

This area covers a number of more or less technical issues, which are described in the separate document FUTURE-TECH.html. Most of that document concerns extension functions: whether to support a general, dynamic plug-in mechanism for embedding external functions, or just a static implementation of particular EXSLT extensions. See that document if you want to learn more.

Sablotron Embedding

Only a few users employ Sablotron directly as a standalone dynamic library. Most use it from popular scripting languages such as PHP, Perl, Python, Ruby, etc. The current API design seems sufficient for all of these wrappers and extensions. Another question is whether the wrappers are designed properly and benefit from all Sablotron features. Since all the wrappers (with the sole exception of the Perl XML::Sablotron module) are independent projects, the Sablotron authors can't directly ensure their correctness and completeness. Despite this, we are interested in the wrappers and extensions, and we do our best to communicate actively with their authors.

PHP xslt extension

Most users evidently employ Sablotron through PHP. Unfortunately, the PHP xslt extension is a bit old-fashioned and is developed rather slowly. It's a kind of bitter irony: most people see Sablotron through the PHP extension, which is out of our control and hides Sablotron's advanced features (pre-parsed trees, DOM, SXP). This is a pity for both PHP users and Sablotron.

As PHP happens to be the scripting language most frequently used with Sablotron, we are getting tired of repeating "Sorry, this is not our fault," and we are considering becoming more active in this area. We are open to any collaboration with the PHP xslt extension maintainers and users, and we still believe it makes sense to improve the existing module and expose more Sablotron features to PHP users.

This is what we suggest:

Processor Interface

Currently, the XSLT interface functions are what most users are interested in. The PHP extension uses an old-fashioned API that does not make it possible to exploit advanced features such as caching prepared trees. A better, object-oriented interface seems essential.

DOM

Sablotron provides a DOM-like C API to its parsed trees, allowing wrappers to implement a DOM Level 2 read/write interface to the trees. The xslt extension could use this interface to provide PHP users with another DOM implementation. Users would thus be able to build DOM trees and process them with XPath/XSLT. There are indications (in PHP's CVS) that work on the DOM implementation has started, but progress is very slow, if there is any at all.

SXP

The SXP interface makes it possible to run the XSLT and XPath processors on data that is not stored in Sablotron's internal structures but is provided by a set of callback functions. This allows processing documents stored in databases, computed documents, etc., without the need to serialize and re-parse them before processing.
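To illustrate the idea of callback-provided data, here is a minimal C sketch. The `NodeCallbacks` table and every name in it (`count_named`, `Table`, `Row`, the `tbl_*` functions) are hypothetical and deliberately simplified; they are not Sablotron's actual SXP declarations. The processor side (`count_named`) sees only opaque node handles, while the application side exposes records held in plain C arrays as `<table><row/>...</table>` without ever producing XML text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Opaque node handle: the processor never looks inside it. */
typedef const void *NodeHandle;

/* Hypothetical callback table (NOT Sablotron's real SXP structure). */
typedef struct {
    const char *(*get_name)(NodeHandle node, void *user_data);
    NodeHandle  (*first_child)(NodeHandle node, void *user_data);
    NodeHandle  (*next_sibling)(NodeHandle node, void *user_data);
} NodeCallbacks;

/* Processor side: counts descendant-or-self elements called `name`,
   reaching the data exclusively through the callbacks. */
static size_t count_named(const NodeCallbacks *cb, void *user_data,
                          NodeHandle node, const char *name)
{
    assert(cb != NULL && name != NULL);
    size_t count = 0;
    if (strcmp(cb->get_name(node, user_data), name) == 0)
        count++;
    for (NodeHandle child = cb->first_child(node, user_data); child != NULL;
         child = cb->next_sibling(child, user_data))
        count += count_named(cb, user_data, child, name);
    return count;
}

/* Application side: records kept in a C array, never serialized. */
typedef struct { int id; } Row;
typedef struct { const Row *rows; size_t nrows; } Table;

/* The Table itself is the root node; each Row is a child node. */
static const char *tbl_name(NodeHandle node, void *ud)
{
    return node == ud ? "table" : "row";
}

static NodeHandle tbl_first_child(NodeHandle node, void *ud)
{
    Table *t = ud;
    /* Only the root has children; rows are leaves. */
    return (node == ud && t->nrows > 0) ? (NodeHandle)&t->rows[0] : NULL;
}

static NodeHandle tbl_next_sibling(NodeHandle node, void *ud)
{
    Table *t = ud;
    if (node == ud)
        return NULL;                        /* the root has no siblings */
    const Row *r = node;
    size_t i = (size_t)(r - t->rows);       /* position of this row */
    return i + 1 < t->nrows ? (NodeHandle)&t->rows[i + 1] : NULL;
}

static const NodeCallbacks table_callbacks = {
    tbl_name, tbl_first_child, tbl_next_sibling
};
```

Because the processor reaches the data only through the callbacks, the same `count_named` code would work unchanged over, say, a database cursor or a computed document; only the three callbacks would change.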

C++ wrapper

Some users would perhaps welcome a C++ wrapper around the API. There is none at the moment.

Design and Optimization

So far, much more effort has been focused on missing features than on performance and design optimization. There are two major issues:

Memory management

Memory management is optimized for typical XSLT processing, in which trees are parsed, processed and dropped. With the introduction of DOM, this mechanism became somewhat insufficient, though not yet critically so. Problems might occur if huge trees are processed through the DOM interface with massive removals of data.

Speed

The XPath processor has several hidden weak spots that could be identified and optimized.

Template matching could be optimized too, which would improve performance for stylesheets with many apply-templates instructions and many defined templates.
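One common way to speed up template selection (our assumption here, not a committed design) is to index templates by the element name in their match pattern, so that choosing a template for a node examines only the templates that could match it plus the wildcard ones, instead of testing every template. The minimal C sketch below handles only bare element names and `*`, uses the XSLT 1.0 default priorities (0 for a name, -0.5 for `*`), and ignores import precedence and modes; all names in it (`Template`, `TemplateIndex`, `index_build`, `index_lookup`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 64
#define MAX_TEMPLATES 128

/* Hypothetical, simplified template: matches one element name or "*". */
typedef struct {
    const char *match;   /* element name, or "*" for any element */
    double priority;
} Template;

typedef struct {
    const Template *templates;
    size_t count;
    int bucket_head[NBUCKETS];   /* first template index per bucket, -1 = empty */
    int next[MAX_TEMPLATES];     /* next template index in the same chain */
    int wildcard_head;           /* chain of "*" templates */
} TemplateIndex;

/* djb2-style string hash reduced to a bucket number. */
static unsigned hash_name(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Build the index once per stylesheet. */
static void index_build(TemplateIndex *ix, const Template *t, size_t count)
{
    assert(count <= MAX_TEMPLATES);
    ix->templates = t;
    ix->count = count;
    ix->wildcard_head = -1;
    for (int b = 0; b < NBUCKETS; b++)
        ix->bucket_head[b] = -1;
    for (int i = 0; i < (int)count; i++) {
        int *head = strcmp(t[i].match, "*") == 0
                        ? &ix->wildcard_head
                        : &ix->bucket_head[hash_name(t[i].match)];
        ix->next[i] = *head;     /* push onto the front of the chain */
        *head = i;
    }
}

/* Higher priority wins; on a tie, the template occurring later in the
   stylesheet wins (the XSLT 1.0 recovery rule for conflicting rules). */
static int better(const TemplateIndex *ix, int best, int i)
{
    if (best < 0)
        return i;
    double pb = ix->templates[best].priority;
    double pi = ix->templates[i].priority;
    return (pi > pb || (pi == pb && i > best)) ? i : best;
}

/* Index of the template to apply to an element called `name`, or -1. */
static int index_lookup(const TemplateIndex *ix, const char *name)
{
    int best = -1;
    for (int i = ix->bucket_head[hash_name(name)]; i >= 0; i = ix->next[i])
        if (strcmp(ix->templates[i].match, name) == 0)  /* skip collisions */
            best = better(ix, best, i);
    for (int i = ix->wildcard_head; i >= 0; i = ix->next[i])
        best = better(ix, best, i);
    return best;
}
```

With many templates defined, each lookup then costs one hash plus a short chain walk rather than a pass over the whole template list, which is what matters when apply-templates runs many times.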

Using reference counting for some internal classes instead of making copies should help as well.
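A minimal C sketch of the reference-counting idea follows. `RcString` and the `rc_*` functions are hypothetical stand-ins for an internal class that is currently copied; they are not taken from Sablotron's sources. Sharing a value costs one counter increment instead of an allocation plus a copy, and the storage is freed when the last owner releases it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted immutable string. */
typedef struct {
    size_t refs;     /* number of current owners */
    size_t len;      /* length without the terminating NUL */
    char data[];     /* C99 flexible array member holding the text */
} RcString;

/* Creates a string with one owner; returns NULL if allocation fails. */
static RcString *rc_new(const char *s)
{
    size_t len = strlen(s);
    RcString *r = malloc(sizeof *r + len + 1);
    if (r == NULL)
        return NULL;
    r->refs = 1;
    r->len = len;
    memcpy(r->data, s, len + 1);
    return r;
}

/* Shares the existing storage instead of copying it. */
static RcString *rc_retain(RcString *r)
{
    r->refs++;
    return r;
}

/* Drops one owner and frees the storage when none are left. */
static void rc_release(RcString *r)
{
    if (r == NULL)
        return;
    assert(r->refs > 0);
    if (--r->refs == 0)
        free(r);
}
```

This sketch is not thread-safe; sharing across threads would need atomic counters. It also pays off only for data that stays immutable once shared, since all owners see the same storage.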

Portability and QA

We still run into some portability problems. The basic rules we follow are sound, but minor problems keep appearing on particular platforms. This is more a QA issue than anything else.

What we need is a couple of volunteers to help us with QA on the most common platforms. A similar attempt has already been made, but it was hard to keep people working and synchronized on a volunteer basis. Perhaps a better test environment (build manager) could help.

Linking problems are a perennial issue when Sablotron is linked with pure C libraries using a non-GNU linker. This is actually an Automake/Libtool problem, but we should collect solutions for as many platforms and compilers as we can. Outside help is also necessary here.