FXPath - Functional XPath
Version:  0.3   Date:  2001-03-05   Author:  David Rosenborg

Abstract

This document specifies a language, FXPath, for creating user-defined extension functions in XSLT stylesheets.

This document started off as a comment to another specification, [User-Defined Extension Functions in XSLT] (EXSL for short) written by Jeni Tennison.

The EXSL document is, by and large, an excellent compilation and presentation of the ideas and issues recently discussed in a thread on the XSL list (xsl-list@lists.mulberrytech.com). However, the EXSL document presents only one of two rather different approaches to implementing the extension functions. This document tries to present the other.

The EXSL approach is to retrofit some XSLT instructions so that they can deal with all types in XPath, notably node-sets. This document aims to show that there is a more natural way to accomplish the same result: write extension functions in XPath to deal with XPath types. Since XPath 1.0 lacks some vital constructs to do this, this document presents a superset, called Functional XPath (FXPath), that makes this possible in a convenient way.

This document has a much narrower scope than the EXSL specification. It concentrates on how to actually define the extension functions. The issues of calling functions, defining sets of common extension functions, etc. are well covered in EXSL and are not handled here. However, the set of example functions is reimplemented here, in FXPath, to enable a side-by-side comparison.

Contents

1 Introduction
2 Variables
3 Conditional Expression
4 Reduction Expression
5 Function Call Steps
6 Function Definition
7 Supersetting XPath 1.0
7.1 Lexical Structure
8 Defining FXPath Functions in XSLT
8.1 Namespace
8.2 Function Definition by fx:define
8.3 Function Definition by fx:template-function
9 Examples
9.1 Common Extension Functions
9.2 Message Functions
9.3 Set Functions
9.4 Numerical Functions
9.5 Generative Functions
9.6 Sorting Functions
9.7 Other Document Functions
10 Changes
11 Acknowledgements

1 Introduction

The purpose of this specification is to define a language, FXPath - Functional XPath, suitable for implementing extension functions in [XSLT 1.0]. The approach is to define a small set of augmentations to [XPath 1.0] and a way of specifying FXPath functions inside an XSLT stylesheet.

Even though most of the constructs defined in this document have counterparts in [XQuery] or [XPath 2.0], FXPath is not fully replaceable by either one of them. And FXPath doesn't try to replace them.

FXPath isn't expressed in terms of XQuery mainly for two reasons:

FXPath isn't expressed in terms of XPath 2.0 mainly because XPath 2.0 is too large for the scope of FXPath, and because it is still just a set of requirements, that is, no concrete syntax exists to build on.

FXPath is described in terms of XPath 1.0. The goal is to identify a small set of extensions that will enable users to write extension functions in a convenient way. Convenient is an important word here: it means that the set of extensions is not necessarily a minimal set from a functional point of view. Instead, the set is chosen to make the common cases easy. For example, the ReductionExpr could have been replaced by recursive functions, but proved useful in many of the example functions.

The extensions have also been selected with implementors in mind. It should be easy to extend current implementations of XPath 1.0 with the constructs defined in this document.

FXPath extends XPath 1.0 much in the same way as [XPointer] does: New grammar productions are defined and then connected to XPath 1.0 by changing some of its original productions.

FXPath contains four extensions to XPath 1.0:

The extensions are defined in the chapters that follow.

2 Variables

VariableExpr  ::=  VariableBinding* OrExpr
VariableBinding  ::=  VarName ':=' Expr ';'

A variable binding defines a new variable. The variable is visible in subsequent variable bindings and in the following OrExpr. The variable is not visible in the expression specifying its value.

The value of a VariableExpr is the value of its OrExpr evaluated in the scope of the preceding variable bindings.

The name of a variable binding is specified by VarName.

The value of a variable binding is the object that results from evaluating the expression between the := and ; tokens.

A binding shadows another binding if the binding occurs at a point where the other binding is visible, and the bindings have the same name.

It is an error if a variable binding shadows another variable binding established in the same FXPath expression.

It is also an error if a variable binding shadows a variable bound outside of the FXPath expression, if binding a variable with the same name at the location where the FXPath expression appears would itself have been an error. For an FXPath expression appearing in XSLT, this means that only global variables can be shadowed.

Example:

$doc := document ('mydoc.xml');
$title := $doc/title;
translate ($title, '-', '/')
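The scoping rules above can be modelled outside FXPath. The following Python sketch is illustrative only: the function name evaluate_bindings and the representation of expressions as callables over an environment are assumptions of the sketch, not part of FXPath. It also simplifies the outer-scope rule by letting any outer variable be shadowed.

```python
def evaluate_bindings(bindings, body, outer=None):
    """Evaluate (name, expr) bindings in order, then the body.

    Each expr and the body are callables that receive the environment
    (a dict of the variables visible at that point).
    """
    env = dict(outer or {})
    local = set()
    for name, expr in bindings:
        if name in local:
            # Shadowing a binding from the same expression is an error.
            raise ValueError(f"variable ${name} shadows an earlier binding")
        value = expr(env)  # the new variable is not yet visible to its own expr
        env[name] = value
        local.add(name)
    return body(env)


# Mirrors the shape of the example: bind $doc, then $title, then use $title.
result = evaluate_bindings(
    [("doc", lambda env: {"title": "2001-03-05"}),
     ("title", lambda env: env["doc"]["title"])],
    lambda env: env["title"].replace("-", "/"),
)
print(result)  # 2001/03/05
```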

3 Conditional Expression

IfExpr  ::=  'if' Condition 'then' ThenExpr 'else' ElseExpr
Condition  ::=  '(' Expr ')'
ThenExpr  ::=  '(' Expr ')'
ElseExpr  ::=  '(' Expr ')'
| IfExpr

An if expression is evaluated by evaluating the condition and converting its value to a boolean as if by a call to the boolean function. If the condition is true, then the ThenExpr is evaluated and used as the result, otherwise the ElseExpr is evaluated and used as the result. Exactly one of the ThenExpr and ElseExpr expressions is evaluated; the other must not be evaluated.
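The requirement that only the selected branch is evaluated matters for recursive functions and for branches with side effects. A minimal Python sketch of that rule follows; the names if_expr and branch are hypothetical, branches are modelled as zero-argument callables, and Python's bool only approximates the XPath boolean function (for instance, it treats NaN as true).

```python
def if_expr(condition, then_expr, else_expr):
    """Evaluate the condition, then exactly one of the two branches."""
    return then_expr() if bool(condition()) else else_expr()


calls = []  # records which branches were actually evaluated


def branch(label, value):
    def run():
        calls.append(label)
        return value
    return run


value = if_expr(lambda: "non-empty", branch("then", 1), branch("else", 2))
print(value, calls)  # 1 ['then']
```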

Example:

$title := document ('mydoc.xml')//meta/title;
if (function-available ('fx:to-upper'))
  then (fx:to-upper ($title))
  else (translate ($title, 'abcd...', 'ABCD...'))

Issue: if-syntax - Should the keywords in the conditional expression have a different syntactic form? For example:
#if $test
#then $a
#else $b

It makes the expression contain fewer characters, but it may also clutter the language.

4 Reduction Expression

ReductionExpr  ::=  'reduce' SourceNodeSetExpr 'into' Collector 'by' BodyExpr
SourceNodeSetExpr  ::=  '(' Expr ')'
Collector  ::=  '(' VarName ':=' InitExpr ')'
InitExpr  ::=  Expr
BodyExpr  ::=  '(' Expr ')'

The reduce expression provides a convenient way of writing many recursive functions. It also simplifies optimization.

A reduce expression is evaluated by:

The collector variable is visible in BodyExpr and contains the value of the previous evaluation of the body. If there was no previous evaluation, i.e., the context node is the first node in the source node-set, then the collector variable contains the initial value.

The value of the reduce expression is the value of the last body evaluation, or the initial value if the source node-set was empty.
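The collector semantics can be sketched in Python as a left fold. This sketch assumes that the body is evaluated once per node of the source node-set, in order, with that node as the context node; the function name reduce_expr and the body signature (node, position, collector) are assumptions made for illustration.

```python
def reduce_expr(source, init, body):
    """Fold the source nodes into a single value.

    body(node, position, collector) returns the next collector value;
    position is 1-based, like the XPath position() function.
    """
    collector = init
    for position, node in enumerate(source, start=1):
        collector = body(node, position, collector)
    return collector


people = ["Ann", "Bob", "Cy"]
names = reduce_expr(
    people, "",
    lambda name, position, acc: name if position == 1 else f"{acc}, {name}",
)
print(names)                                        # Ann, Bob, Cy
print(reduce_expr([], "none", lambda n, p, a: n))   # none
```

An empty source never runs the body, so the initial value is returned unchanged.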

Example:

reduce (//person)
  into ($comma-separated-names := '')
    by (if (position () = 1)
          then (name)
          else (concat ($comma-separated-names, ', ', name)))

Issue: reduce-sort - Should sort primitives be added to the reduce construct? For example:
$largest-file :=
  reduce (//file)
    sort (size : data-type ('number') : direction ('descending'))
    into ($largest := /..)
      by (if (position () = 1)
            then (.)
            else ($largest))
;

Are there common enough use cases to motivate the added complexity?

5 Function Call Steps

FunctionCallStep  ::=  FunctionCall Predicate*

The node-set selected by a function call step is the node-set that results from generating an initial node-set by calling the function, and then filtering that node-set by each of the predicates in turn.

The predicates filter the node-set with respect to the child axis.

It is an error if the function call does not result in a node-set.

Example:

document ('people.xml')/key ('first-name', 'David') [1]/Address

6 Function Definition

FunctionDefinition  ::=  'function' '(' ( Parameter ( ',' Parameter )* )? ')' '->' FunctionBody
Parameter  ::=  VariableReference
| VarName ':=' DefaultExpr
DefaultExpr  ::=  Expr
FunctionBody  ::=  Expr

The result of evaluating a FunctionDefinition is a function object. Although a function definition is regarded as an expression that can be evaluated, it cannot occur inside an FXPath expression. Furthermore, there are no functions or primitives in FXPath that can operate on function objects directly. This could change in future versions of FXPath. However, allowing function definitions to occur freely in FXPath expressions would add a great deal of undesired complexity to FXPath as compared to XPath 1.0.

A function object is a new value type. It has the following characteristics: A function object has

The max-args property is equal to the number of Parameter entries in the parameter list.

When the invoke method is called, the expression specified in FunctionBody is evaluated to yield the result value. The variables visible in the function body are the variables visible at the point where the FunctionDefinition occurs, and the variables bound by the parameter list of the function.

When the body is evaluated, the value of a variable bound by a Parameter is:

The default value of a parameter is either an empty string, if the Parameter was specified through a VariableReference, or the value that results from evaluating the DefaultExpr, if it was specified.

NOTE: The use of the token VariableReference in this context is purely syntactic. In fact, it defines rather than references a variable.

The variable bound by a Parameter is visible in the subsequent parameter definitions in the same parameter list, and in the function body. The variable is not visible in a possible DefaultExpr of the same Parameter.

It is an error if a variable bound by a Parameter shadows another variable bound by a preceding parameter.

It is also an error if a parameter shadows a variable bound outside of the function definition, if it had been an error binding a variable, with the same name, at the same point where the function definition appears.

Example (a to-upper function):

function ($string) ->
  translate ($string, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')

7 Supersetting XPath 1.0

The FXPath grammar is constructed as a superset of the XPath 1.0 grammar in the following way:

Change the Expr production to be:

Expr  ::=  VariableExpr

Append ReductionExpr and IfExpr to the PrimaryExpr production:

PrimaryExpr  ::=  VariableReference
| '(' Expr ')'
| Literal
| Number
| FunctionCall
| ReductionExpr
| IfExpr

Insert FunctionCallStep by changing the Step production to be:

Step  ::=  SimpleStep
| FunctionCallStep
SimpleStep  ::=  AxisSpecifier NodeTest Predicate*
| AbbreviatedStep

Change the RelativeLocationPath production to be:

RelativeLocationPath  ::=  SimpleStep
| RelativeLocationPath '/' Step
| AbbreviatedRelativeLocationPath

The extra SimpleStep production is needed to resolve the ambiguity that would otherwise occur between FilterExpr and FunctionCallStep if only Step was used.

7.1 Lexical Structure

Add the ';', '->', ':=' and VarName tokens to the ExprToken production:

ExprToken  ::=  '(' | ')' | '[' | ']' | '.' | '..' | '@' | ',' | '::' | ':=' | ';' | '->'
| NameTest
| NodeType
| Operator
| FunctionName
| AxisName
| Literal
| Number
| VariableReference
| VarName
| SpecialForm
VarName  ::=  '$' QName
SpecialForm  ::=  'if'
| 'then'
| 'else'
| 'reduce'
| 'into'
| 'by'
| 'function'

Prepend the following rules to the [special tokenization rules]:

8 Defining FXPath Functions in XSLT

This document defines two elements for implementing extension functions in XSLT: fx:template-function and fx:define. The fx:template-function element defines extension functions by XSLT 1.0 instructions. The fx:define element uses FXPath syntax for the function.

8.1 Namespace

The namespace used for the function definition elements is:

http://www.pantor.com/fxpath

The prefix fx: is used in this document to specify this namespace.

The prefix xsl: is used for elements belonging to the XSLT 1.0 namespace: http://www.w3.org/1999/XSL/Transform.

8.2 Function Definition by fx:define

<fx:define name = qname>
  <!-- Content: #PCDATA -->
</fx:define>

The element fx:define defines an extension function implemented with FXPath. The fx:define element can only occur at the top level of an XSLT stylesheet.

An fx:define element must have a name attribute, indicating the name of the function. The value of the name attribute is a QName, which is expanded as described in [2.4 Qualified Names] in the XSLT 1.0 Recommendation. It is an error if the namespace URI of the expanded name of the function is null - extension functions must not be in a null namespace.

The content of an fx:define element must be character data only, and must be an FXPath function definition, i.e., it must match the FunctionDefinition production in the grammar.

NOTE: The content of an fx:define element is not an XSLT template. It is an error if the content of an fx:define contains element nodes.

The fx:define element associates the function name with the function object that results from evaluating the FunctionDefinition specified in the content.

The result of calling an extension function defined by fx:define, is the result of calling the invoke method on its associated function object with the same arguments.

It is an error if the number of arguments is greater than the max-args property of the function object. An implementation may signal the error; if it doesn't, then it must recover by ignoring the extra arguments.
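The interplay between max-args, default values and the recovery rule for extra arguments can be sketched in Python. This is a model under stated assumptions, not a definition: the class name FunctionObject and the representation of defaults as callables are hypothetical, and the sketch assumes that a supplied argument always takes precedence over the parameter's default.

```python
class FunctionObject:
    def __init__(self, params, body):
        # params: list of (name, default). default is a callable taking the
        # environment built so far, or None when there is no DefaultExpr.
        self.params = params
        self.body = body
        self.max_args = len(params)

    def invoke(self, *args):
        args = args[: self.max_args]  # recovery: ignore extra arguments
        env = {}
        for index, (name, default) in enumerate(self.params):
            if index < len(args):
                env[name] = args[index]       # argument passed by position
            elif default is None:
                env[name] = ""                # no DefaultExpr: empty string
            else:
                env[name] = default(env)      # may use preceding parameters
        return self.body(env)


padding = FunctionObject(
    [("repeat", lambda env: 1), ("string", lambda env: " ")],
    lambda env: env["string"] * env["repeat"],
)
print(repr(padding.invoke(3, "ab")))         # 'ababab'
print(repr(padding.invoke(2)))               # '  '
print(repr(padding.invoke(2, "x", "junk")))  # 'xx'
```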

Example:

<fx:define name="fx:replace">
  function ($s, $old, $new) ->
    $head := substring-before ($s, $old);
    $tail := substring-after ($s, $old);
    if ($head)
      then (concat ($head, $new, fx:replace ($tail, $old, $new)))
    else if ($tail)
      then (concat ($new, fx:replace ($tail, $old, $new)))
    else if ($s = $old)
      then ($new)
    else
      ($s)
</fx:define>
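For readers who want to trace the recursion, the same logic can be written in Python. The helpers substring_before and substring_after are written here to follow the XPath 1.0 functions of the same name, which return the empty string when the separator does not occur. The sketch assumes that old is non-empty.

```python
def substring_before(s, sep):
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s, sep):
    _, found, tail = s.partition(sep)
    return tail if found else ""


def replace(s, old, new):
    """Replace every occurrence of old in s with new (old must be non-empty)."""
    head = substring_before(s, old)
    tail = substring_after(s, old)
    if head:
        return head + new + replace(tail, old, new)
    elif tail:
        return new + replace(tail, old, new)
    elif s == old:
        return new
    else:
        return s


print(replace("a-b-c", "-", "/"))  # a/b/c
```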

8.3 Function Definition by fx:template-function

<fx:template-function
     name = qname 
     result-type = "number" | "string" | "node-set">
  <!-- Content: (xsl:param*, template) -->
</fx:template-function>

The element fx:template-function defines an extension function implemented with XSLT 1.0 instructions.

An fx:template-function element must have a name attribute, indicating the name of the function. The value of the name attribute is a QName, which is expanded as described in [2.4 Qualified Names] in the XSLT 1.0 Recommendation. It is an error if the namespace URI of the expanded name of the function is null - extension functions must not be in a null namespace.

The content and semantics of an fx:template-function are the same as those of a named xsl:template, with the following exceptions:

  • The function is called through function calls in XPath or FXPath rather than through the xsl:call-template element.
  • Arguments are passed by position. The first argument is assigned to the first parameter, the second to the second and so on.
  • The result of instantiating the template is a result tree fragment as defined in [11.1 Result Tree Fragments].
  • The result tree fragment may optionally be converted into one of three basic XPath types.

A call to an fx:template-function may contain fewer arguments than defined. In this case, the default values are used for the trailing parameters. However, it is an error if too many arguments are passed. An implementation may signal the error; if it doesn't, then it must recover by ignoring the extra arguments.

When calling a function defined by an fx:template-function element, its template is instantiated to give a result tree fragment. The result tree fragment is equivalent to a node-set containing just a single root node having as children the sequence of nodes produced by instantiating the template. The base URI of the nodes in the result tree fragment is the base URI of the fx:template-function element.

It is an error if a member of the sequence of nodes created by instantiating the template is an attribute node or a namespace node, since a root node cannot have an attribute node or a namespace node as a child. An XSLT processor may signal the error; if it does not signal the error, it must recover by not adding the attribute node or namespace node.

The result tree fragment is used as the result value of the function unless the result-type attribute is specified, in which case the result tree fragment is converted to the specified type as follows:

  • string - the value is converted to a string as if by a call to the string function
  • number - the value is converted to a number as if by a call to the number function
  • node-set - the value is converted to its equivalent node-set (containing a single root node).

NOTE: There is no result value conversion to boolean since it would always yield true.

9 Examples

The example functions (apart from the message functions) come from [B. Sample Extension Functions] in the EXSL spec.

9.1 Common Extension Functions

Function: object com:if (boolean, object, object)

<fx:define name="com:if">
  function ($test, $true, $false) ->
    if ($test)
      then ($true)
      else ($false)
</fx:define>

Function: RTF com:eval (node-set, string?)

<fx:template-function name="com:eval">
  <xsl:param name="node-set" select="/.."/>
  <xsl:param name="expr" select="'.'"/>
  <xsl:apply-templates select="$node-set [1]" mode="com:eval">
    <xsl:with-param name="expr" select="$expr"/>
  </xsl:apply-templates>
</fx:template-function>

9.2 Message Functions

The message functions allow the functionality of xsl:message to be used in XPath expressions.

Example:

fx:message (concat ('Variable foo is: ', $foo), $foo)

Function: object fx:message (string, object, boolean?)

<fx:define name="fx:message">
  function ($message := fx:required ('fx:message/message'),
            $value := fx:required ('fx:message/value'),
            $terminate := false ()) ->
    if (string (fx:display-message ($message, $terminate)))
      then ($value)
      else ($value)
</fx:define>

Function: RTF fx:display-message (string, boolean?)

<fx:template-function name="fx:display-message">
  <xsl:param name="message"/>
  <xsl:param name="terminate" select="false ()"/>
  <xsl:choose>
    <xsl:when test="$terminate">
      <xsl:message terminate="yes">
        <xsl:value-of select="$message"/>
      </xsl:message>
    </xsl:when>
    <xsl:otherwise>
      <xsl:message>
        <xsl:value-of select="$message"/>
      </xsl:message>
    </xsl:otherwise>
  </xsl:choose>
</fx:template-function>

Function: object fx:required (string)

<fx:define name="fx:required">
  function ($param := fx:required ('fx:required/param')) ->
    fx:exception (concat ('Required parameter: ', $param))
</fx:define>

Function: object fx:exception (string?)

<fx:define name="fx:exception">
  function ($message) ->
    fx:message (concat ('ERROR: ', $message), '', true ())
</fx:define>

9.3 Set Functions

Function: node-set set:difference (node-set, node-set)

<fx:define name="set:difference">
  function ($node-set1 := /.., $node-set2 := /..) ->
    $node-set1 [count (.|$node-set2) != count ($node-set2)]
</fx:define>

Function: boolean set:has-same-node (node-set, node-set)

<fx:define name="set:has-same-node">
  function ($node-set1 := /.., $node-set2 := /..) ->
    boolean ($node-set1 [count (.|$node-set2) = count ($node-set2)])
</fx:define>

Function: node-set set:intersection (node-set, node-set)

<fx:define name="set:intersection">
  function ($node-set1 := /.., $node-set2 := /..) ->
    $node-set1 [count (.|$node-set2) = count ($node-set2)]
</fx:define>
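The three set functions above share one idiom: a node is a member of $node-set2 exactly when adding it to $node-set2 does not change the count. The Python sketch below models this with node identity. The Node class and the helper union are assumptions of the sketch, and each input list is assumed to contain no duplicate nodes, as is the case for XPath node-sets.

```python
class Node:
    def __init__(self, name):
        self.name = name


def union(a, b):
    """Union by node identity, like the XPath | operator."""
    seen, out = set(), []
    for node in list(a) + list(b):
        if id(node) not in seen:
            seen.add(id(node))
            out.append(node)
    return out


def difference(ns1, ns2):
    return [n for n in ns1 if len(union([n], ns2)) != len(ns2)]


def intersection(ns1, ns2):
    return [n for n in ns1 if len(union([n], ns2)) == len(ns2)]


def has_same_node(ns1, ns2):
    return bool(intersection(ns1, ns2))


a, b, c = Node("a"), Node("b"), Node("c")
print([n.name for n in difference([a, b], [b, c])])    # ['a']
print([n.name for n in intersection([a, b], [b, c])])  # ['b']
print(has_same_node([a], [b, c]))                      # False
```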

Function: node-set set:distinct (node-set, string?)

<fx:define name="set:distinct">
  function ($node-set := /.., $expr := '.') ->
    reduce ($node-set)
      into ($distinct := /..)
        by ($node-value := string (com:eval (., $expr));
            if ($distinct [string (com:eval (., $expr)) = $node-value])
              then ($distinct)
              else ($distinct | .))
</fx:define>

Function: node-set set:leading (node-set, string, string?)

<fx:define name="set:leading">
  function ($node-set := /.., $value, $expr := '.') ->
    $first := $node-set [1];
    if (not ($first) or string (com:eval ($first, $expr)) = string ($value))
      then (/..)
      else ($first |
            set:leading ($node-set [position () != 1], $value, $expr))
</fx:define>

Function: node-set set:following (node-set, string, string?)

<fx:define name="set:following">
  function ($node-set := /.., $value, $expr := '.') ->
    $first := $node-set [1];
    if (not ($node-set))
      then (/..)
    else if (string (com:eval ($first, $expr)) = string ($value))
      then ($node-set)
    else
      (set:following ($node-set [position () != 1], $value, $expr))
</fx:define>

Function: boolean set:exists (node-set, string?)

<fx:define name="set:exists">
  function ($node-set := /.., $expr := '.') ->
    $node-set and
      ($value := string (com:eval ($node-set [1], $expr));
       $value and $value != 'false'
       or
       set:exists ($node-set [position () != 1], $expr))
</fx:define>

Function: boolean set:for-all (node-set, string?)

<fx:define name="set:for-all">
  function ($node-set := /.., $expr := '.') ->
    not ($node-set) or
      ($value := string (com:eval ($node-set [1], $expr));
       $value and $value != 'false'
       and
       set:for-all ($node-set [position () != 1], $expr))
</fx:define>

9.4 Numerical Functions

Function: number num:max (node-set, string?)

<fx:define name="num:max">
  function ($node-set := /.., $expr := '.') ->
    reduce ($node-set)
      into ($max := number (com:eval ($node-set [1], $expr)))
        by ($value := number (com:eval (., $expr));
            if ($value > $max)
              then ($value)
              else ($max))
</fx:define>

Function: number num:min (node-set, string?)

<fx:define name="num:min">
  function ($node-set := /.., $expr := '.') ->
    reduce ($node-set)
      into ($min := number (com:eval ($node-set [1], $expr)))
        by ($value := number (com:eval (., $expr));
            if ($min > $value)
              then ($value)
              else ($min))
</fx:define>

Function: node-set num:highest (node-set, string?)

<fx:define name="num:highest">
  function ($node-set := /.., $expr := '.') ->
    reduce ($node-set)
      into ($highest := $node-set [1])
        by (if (number (com:eval (., $expr)) > number (com:eval ($highest, $expr)))
              then (.)
              else ($highest))
</fx:define>

Function: node-set num:lowest (node-set, string?)

<fx:define name="num:lowest">
  function ($node-set := /.., $expr := '.') ->
    reduce ($node-set)
      into ($lowest := $node-set [1])
        by (if (number (com:eval ($lowest, $expr)) > number (com:eval (., $expr)))
              then (.)
              else ($lowest))
</fx:define>

Function: number num:sum (node-set, string?)

<fx:define name="num:sum">
  function ($node-set := /.., $expr := '.') ->
    reduce ($node-set)
      into ($sum := 0)
        by ($sum + com:eval (., $expr))
</fx:define>

9.5 Generative Functions

Function: node-set gen:make-node (object)

<fx:template-function name="gen:make-node" result-type="node-set">
  <xsl:param name="value"/>
  <node>
    <xsl:copy-of select="$value"/>
  </node>
</fx:template-function>

Function: node-set gen:append (node-set, node-set)

<fx:template-function name="gen:append" result-type="node-set">
  <xsl:param name="node-set1"/>
  <xsl:param name="node-set2"/>
  <xsl:copy-of select="$node-set1"/>
  <xsl:copy-of select="$node-set2"/>
</fx:template-function>

Function: node-set gen:range (number, number, node-set?)

<fx:define name="gen:range">
  function ($start := 0, $end := 0, $range := /..) ->
    if (number ($start) > number ($end))
      then ($range/*)
      else (gen:range ($start + 1, $end, gen:append ($range, gen:make-node ($start))))
</fx:define>

Function: string gen:padding (number, string?)

<fx:define name="gen:padding">
  function ($repeat := 1, $string := ' ') ->
    reduce (gen:range (1, $repeat))
      into ($result := '')
        by (concat ($result, $string))
</fx:define>

Function: node-set gen:tokens (string, string?)

<fx:define name="gen:tokens">
  function ($string, $delimiters := ' ') ->
    $del := substring ($delimiters, 1, 1);
    $del-length := string-length ($delimiters);
    $value :=
      if ($del-length > 1)
        then ($replacement := gen:padding ($del-length - 1, $del);
              translate ($string,
                         substring ($delimiters, 2),
                         $replacement))
        else ($string);
    gen-helper:tokens ($value, $del)
</fx:define>

Function: node-set gen-helper:tokens (string, string, node-set?)

<fx:define name="gen-helper:tokens">
  function ($string, $delimiter, $tokens := /..) ->
    if (contains ($string, $delimiter))
      then ($token := substring-before ($string, $delimiter);
            gen-helper:tokens (substring-after ($string, $delimiter),
                               $delimiter,
                               gen:append ($tokens, gen:make-node ($token))))
    else if ($string)
      then (gen:append ($tokens, gen:make-node ($string))/*)
    else
      ($tokens/*)
</fx:define>
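The tokenizing strategy is to map every delimiter character onto the first one and then split on that single character. A short Python sketch of the same idea follows; the function name tokens is hypothetical, delimiters is assumed to be non-empty, and the sketch keeps the text after the last delimiter as a final token.

```python
def tokens(string, delimiters=" "):
    first = delimiters[0]
    # Map every other delimiter character onto the first one, which is
    # what the translate() call in gen:tokens does.
    table = str.maketrans(delimiters[1:], first * (len(delimiters) - 1))
    return string.translate(table).split(first)


print(tokens("a,b;c", ",;"))  # ['a', 'b', 'c']
```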

9.6 Sorting Functions

Function: number sort:position (node-set, string?, string?, string?)

<fx:template-function name="sort:position" result-type="number">
  <xsl:param name="node-set" select="/.."/>
  <xsl:param name="order" select="'ascending'"/>
  <xsl:param name="data-type" select="'text'"/>
  <xsl:param name="expr" select="'.'"/>
  <xsl:variable name="current" select="."/>
  <xsl:for-each select="$node-set">
    <xsl:sort select="com:eval (., $expr)" order="{$order}" data-type="{$data-type}"/>
    <xsl:if test="count (.| $current) = 1">
      <xsl:value-of select="position ()"/>
    </xsl:if>
  </xsl:for-each>
</fx:template-function>

9.7 Other Document Functions

Function: node-set doc:key (string, object, object, node-set?)

<fx:define name="doc:key">
  function ($key-name, $key-value, $documents, $base-URI := document ('')) ->
    document ($documents, $base-URI)/key ($key-name, $key-value)
</fx:define>

Function: node-set doc:id (object, object, node-set?)

<fx:define name="doc:id">
  function ($id, $documents, $base-URI := document ('')) ->
    document ($documents, $base-URI)/id ($id)
</fx:define>

10 Changes

11 Acknowledgements

For inspiring ideas, challenges and helpful comments on the XSL List:

Also, ideas for this document have been taken from [XPath Requirements Version 2.0], [XQuery], and many other W3C documents.