FXPath - Functional XPath
This document specifies a language, FXPath, for creating user-defined extension functions in XSLT stylesheets.
This document started off as a comment to another specification, [User-Defined Extension Functions in XSLT] (EXSL for short) written by Jeni Tennison.
The EXSL document is largely an excellent compilation and presentation of the ideas and issues recently discussed in a thread on the XSL list (xsl-list@lists.mulberrytech.com). However, it presents only one of two rather different approaches to implementing extension functions. This document tries to present the other.
The EXSL approach is to retrofit some XSLT instructions so that they can deal with all types in XPath, notably node-sets. This document aims to show that there is a more natural way to accomplish the same result: write extension functions in XPath to deal with XPath types. Since XPath 1.0 lacks some vital constructs for this, this document presents a superset, called Functional XPath (FXPath), that makes it possible in a convenient way.
This document has a much narrower scope than the EXSL specification. It concentrates on how to actually define the extension functions. The issues of calling functions, defining sets of common extension functions, etc. are well covered in EXSL and are not handled here. However, the set of example functions is reimplemented here, in FXPath, to enable a side-by-side comparison.
The purpose of this specification is to define a language, FXPath - Functional XPath, suitable for implementing extension functions in [XSLT 1.0]. The approach is to define a small set of augmentations to [XPath 1.0] and a way of specifying FXPath functions inside an XSLT stylesheet.
Even though most of the constructs defined in this document have counterparts in [XQuery] or [XPath 2.0], FXPath cannot be fully replaced by either of them, nor does it try to replace them.
FXPath isn't expressed in terms of XQuery mainly for two reasons:
FXPath isn't expressed in terms of XPath 2.0 mainly because XPath 2.0 is too large for the scope of FXPath, and because it is still just a set of requirements, that is, no concrete syntax exists to build on.
FXPath is described in terms of XPath 1.0. The goal is to identify a small set of extensions that will enable users to write extension functions in a convenient way. Convenient is an important word here: it means that the set of extensions is not necessarily a minimal set from a functional point of view. Instead, the set is chosen to make the common cases easy. For example, the ReductionExpr could have been replaced by recursive functions, but proved useful in many of the example functions.
The extensions have also been selected with implementors in mind. It should be easy to extend current implementations of XPath 1.0 with the constructs defined in this document.
FXPath extends XPath 1.0 much in the same way as [XPointer] does: New grammar productions are defined and then connected to XPath 1.0 by changing some of its original productions.
FXPath contains four extensions to XPath 1.0:
The extensions are defined in the chapters that follow.
VariableExpr    ::= VariableBinding* OrExpr
VariableBinding ::= VarName ':=' Expr ';'
A variable binding defines a new variable. The variable is visible in subsequent variable bindings and in the following OrExpr. The variable is not visible for the expression specifying its value.
The value of a VariableExpr is the value of its OrExpr evaluated in the scope of the preceding variable bindings.
The name of a variable binding is specified by VarName.
The value of a variable binding is the object that results from evaluating the expression between the := and ; tokens.
A binding shadows another binding if the binding occurs at a point where the other binding is visible, and the bindings have the same name.
It is an error if a variable binding shadows another variable binding established in the same FXPath expression.
It is also an error if the binding shadows a variable bound outside the FXPath expression, if it would have been an error to bind a variable with the same name at the location where the FXPath expression appears. For an FXPath expression appearing in XSLT, this means that only global variables can be shadowed.
Example:
$doc := document ('mydoc.xml');
$title := $doc/title;
translate ($title, '-', '/')
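The scoping rules above can be modeled outside FXPath. The following Python sketch is illustrative only: `eval_variable_expr`, the dictionary-based scope, and the use of callables for expressions are assumptions of this sketch, not part of FXPath. It shows that each binding sees the bindings before it but not itself, and that rebinding a name within the same expression is an error. The `outer` scope stands in for variables that may legally be shadowed, such as XSLT global variables.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Scope = Dict[str, Any]
Binding = Tuple[str, Callable[[Scope], Any]]


def eval_variable_expr(
    bindings: List[Binding],
    body: Callable[[Scope], Any],
    outer: Optional[Scope] = None,
) -> Any:
    """Evaluate the bindings in order, then the body in the resulting scope."""
    scope: Scope = dict(outer or {})
    bound_here = set()
    for name, value_expr in bindings:
        if name in bound_here:
            raise ValueError(f"${name} shadows a binding in the same expression")
        # The new variable is not yet in scope while its value is computed.
        value = value_expr(scope)
        scope[name] = value
        bound_here.add(name)
    return body(scope)


# Mirrors the example above, with a dict standing in for the document.
result = eval_variable_expr(
    [
        ("doc", lambda s: {"title": "2001-04-01"}),
        ("title", lambda s: s["doc"]["title"]),
    ],
    lambda s: s["title"].replace("-", "/"),
)
```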
IfExpr    ::= 'if' Condition 'then' ThenExpr 'else' ElseExpr
Condition ::= '(' Expr ')'
ThenExpr  ::= '(' Expr ')'
ElseExpr  ::= '(' Expr ')'
            | IfExpr
An if expression is evaluated by evaluating the condition and converting its value to a boolean as if by a call to the boolean function. If the condition is true, then the ThenExpr is evaluated and used as the result; otherwise the ElseExpr is evaluated and used as the result.
Exactly one of the ThenExpr and ElseExpr expressions is evaluated; the other must not be evaluated.
Example:
$title := document ('mydoc.xml')//meta/title;
if (function-available ('fx:to-upper'))
then (fx:to-upper ($title))
else (translate ($title, 'abcd...', 'ABCD...'))
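The evaluation rule for if expressions can be sketched in Python as follows. This is a rough model under stated assumptions: `xpath_boolean` and `eval_if` are hypothetical names, node-sets are modeled as Python lists, and the branches are passed as zero-argument callables so that only the selected branch is ever evaluated.

```python
import math
from typing import Any, Callable


def xpath_boolean(value: Any) -> bool:
    """Approximate the XPath 1.0 boolean() conversion."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # A number is true unless it is zero or NaN.
        return value != 0 and not math.isnan(value)
    # Strings and node-sets (lists here) are true when non-empty.
    return len(value) > 0


def eval_if(
    condition: Any,
    then_expr: Callable[[], Any],
    else_expr: Callable[[], Any],
) -> Any:
    """Evaluate exactly one branch, chosen by the boolean value of condition."""
    if xpath_boolean(condition):
        return then_expr()
    return else_expr()


# Only the selected branch runs, so the failing branch is never touched.
safe = eval_if("", lambda: 1 / 0, lambda: "fallback")
```

Because the unselected branch is never called, a recursive function such as fx:replace can use an if expression as its termination test.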
#if $test #then $a #else $b
It makes the expression contain fewer characters, but it may also clutter the language.
ReductionExpr     ::= 'reduce' SourceNodeSetExpr 'into' Collector 'by' BodyExpr
SourceNodeSetExpr ::= '(' Expr ')'
Collector         ::= '(' VarName ':=' InitExpr ')'
InitExpr          ::= Expr
BodyExpr          ::= '(' Expr ')'
The reduce expression provides a convenient way of
writing many recursive functions. It also simplifies optimization.
A reduce expression is evaluated by:
The collector variable is visible in BodyExpr and contains the value of the previous evaluation of the body. If there was no previous evaluation, i.e., the context node is the first node in the source node-set, then the collector variable contains the initial value.
The value of the reduce expression is the value of
the last body evaluation, or the initial value if the
source node-set was empty.
Example:
reduce (//person)
into ($comma-separated-names := '')
by (if (position () = 1)
then (name)
else (concat ($comma-separated-names, ', ', name)))
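The collector semantics described above correspond to a left fold. The Python sketch below is a minimal model, assuming that the source node-set is visited in order and that the body receives the collector, the context node, and the context position; `eval_reduce` and its callback signature are hypothetical and chosen for illustration.

```python
from typing import Any, Callable, Sequence


def eval_reduce(
    source: Sequence[Any],
    init: Any,
    body: Callable[[Any, Any, int], Any],
) -> Any:
    """Fold the source into a single value, one body evaluation per node."""
    collector = init
    for position, node in enumerate(source, start=1):
        # The collector holds the previous body value (or init on the first node).
        collector = body(collector, node, position)
    # With an empty source the body never runs and init is returned.
    return collector


# Mirrors the comma-separated-names example, using strings as stand-in nodes.
names = eval_reduce(
    ["Ann", "Bob", "Cy"],
    "",
    lambda acc, node, pos: node if pos == 1 else f"{acc}, {node}",
)
```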
$largest-file :=
reduce (//file)
sort (size : data-type ('number') : direction ('descending'))
into ($largest := /..)
by (if (position () = 1)
then (.)
else ($largest))
;
Are there common enough use cases to motivate the added complexity?
FunctionCallStep ::= FunctionCall Predicate*
The node-set selected by a function call step is the node-set that results from generating an initial node-set by calling the function, and then filtering that node-set by each of the predicates in turn.
The predicates filter the node-set with respect to the child axis.
It is an error if the function call does not result in a node-set.
Example:
document ('people.xml')/key ('first-name', 'David') [1]/Address
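As a rough illustration of the evaluation order, the following Python sketch calls the function first and then applies each predicate in turn, renumbering positions after every filter. It assumes node-sets are modeled as lists and predicates as callables taking the node, its position, and the context size; `eval_function_call_step` is a hypothetical helper, not an FXPath construct.

```python
from typing import Any, Callable, List, Sequence

Predicate = Callable[[Any, int, int], bool]


def eval_function_call_step(
    call: Callable[[], Any],
    predicates: Sequence[Predicate],
) -> List[Any]:
    """Call the function, then filter its node-set by each predicate in turn."""
    nodes = call()
    if not isinstance(nodes, list):
        # The step is only defined for functions that return a node-set.
        raise TypeError("function call step did not produce a node-set")
    for predicate in predicates:
        size = len(nodes)
        # Positions are recomputed relative to the set left by the previous filter.
        nodes = [n for pos, n in enumerate(nodes, start=1) if predicate(n, pos, size)]
    return nodes


# Stand-in for key ('first-name', 'David') [1]: keep only the first match.
davids = eval_function_call_step(
    lambda: ["David A", "David B"],
    [lambda node, pos, size: pos == 1],
)
```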
FunctionDefinition ::= 'function' '(' ( Parameter ( ',' Parameter )* )? ')' '->' FunctionBody
Parameter          ::= VariableReference
                     | VarName ':=' DefaultExpr
DefaultExpr        ::= Expr
FunctionBody       ::= Expr
The result of evaluating a FunctionDefinition is a function object. Although a function definition is regarded as an expression that can be evaluated, it cannot occur inside an FXPath expression. Furthermore, there are no functions or primitives in FXPath that can operate on function objects directly. This could change in future versions of FXPath. However, allowing function definitions to occur freely in FXPath expressions would add a great deal of undesired complexity to FXPath as compared to XPath 1.0.
A function object is a new value type. It has the following characteristics: a function object has a max-args property, indicating the maximum number of arguments, and an invoke method, which can be called with up to max-args argument values.
The max-args property is the same as the number of Parameters in the parameter list.
When the invoke method is called, the expression
specified in FunctionBody is evaluated to yield the
result value. The variables visible for the function body are the
variables visible at the point where the FunctionDefinition
occurs, and the variables bound by the parameter list of the
function.
When the body is evaluated, the value of a variable bound by a Parameter is the corresponding argument value passed to the invoke method, or the default value of the parameter if no such argument was passed.
The default value of a parameter is either an empty string, if the Parameter was specified through a VariableReference, or the value that results from evaluating the DefaultExpr, if it was specified.
The variable bound by a Parameter is visible for the subsequent parameter definitions in the same parameter list, and for the function body. The variable is not visible in a possible DefaultExpr of the same Parameter.
It is an error if a variable bound by a Parameter shadows another variable bound by a preceding parameter.
It is also an error if a parameter shadows a variable bound outside the function definition, if it would have been an error to bind a variable with the same name at the point where the function definition appears.
Example (a to-upper function):
function ($string) -> translate ($string, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
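The function object described in this chapter can be approximated in Python. The sketch below is an assumption-laden model: the `FunctionObject` class, its constructor arguments, and the choice to signal an error on too many arguments are all illustrative. It shows how max-args follows from the parameter list and how omitted trailing arguments fall back to an empty string or to a default expression that can see the preceding parameters.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Scope = Dict[str, Any]
Param = Tuple[str, Optional[Callable[[Scope], Any]]]


class FunctionObject:
    """Model of a function object with a max_args property and an invoke method."""

    def __init__(self, params: List[Param], body: Callable[[Scope], Any],
                 outer: Optional[Scope] = None) -> None:
        names = [name for name, _ in params]
        if len(set(names)) != len(names):
            raise ValueError("a parameter shadows a preceding parameter")
        self._params = params
        self._body = body
        self._outer = dict(outer or {})

    @property
    def max_args(self) -> int:
        return len(self._params)

    def invoke(self, *args: Any) -> Any:
        if len(args) > self.max_args:
            # This sketch signals the error rather than recovering from it.
            raise ValueError("too many arguments")
        scope = dict(self._outer)
        for index, (name, default_expr) in enumerate(self._params):
            if index < len(args):
                value = args[index]
            elif default_expr is None:
                value = ""  # a parameter without DefaultExpr defaults to an empty string
            else:
                value = default_expr(scope)  # sees preceding parameters, not itself
            scope[name] = value
        return self._body(scope)


# Stand-in for the to-upper example above.
to_upper = FunctionObject([("string", None)], lambda s: s["string"].upper())
shouted = to_upper.invoke("fxpath")
```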
The FXPath grammar is constructed as a superset of the XPath 1.0 grammar in the following way:
Change the Expr production to be:
Expr ::= VariableExpr
Append ReductionExpr and IfExpr to the PrimaryExpr production:
PrimaryExpr ::= VariableReference
              | '(' Expr ')'
              | Literal
              | Number
              | FunctionCall
              | ReductionExpr
              | IfExpr
Insert FunctionCallStep by changing the Step production to be:
Step       ::= SimpleStep
             | FunctionCallStep
SimpleStep ::= AxisSpecifier NodeTest Predicate*
             | AbbreviatedStep
Change the RelativeLocationPath production to be:
RelativeLocationPath ::= SimpleStep
                       | RelativeLocationPath '/' Step
                       | AbbreviatedRelativeLocationPath
The extra SimpleStep production is needed to resolve the ambiguity that would otherwise occur between FilterExpr and FunctionCallStep if only Step was used.
Add the ';', '->', ':=' and VarName tokens to the ExprToken production:
ExprToken   ::= '(' | ')' | '[' | ']' | '.' | '..' | '@' | ',' | '::' | ':=' | ';' | '->'
              | NameTest
              | NodeType
              | Operator
              | FunctionName
              | AxisName
              | Literal
              | Number
              | VariableReference
              | VarName
              | SpecialForm
VarName     ::= '$' QName
SpecialForm ::= 'if'
              | 'then'
              | 'else'
              | 'reduce'
              | 'into'
              | 'by'
              | 'function'
Prepend the following rules to the [special tokenization rules]:
[...] $ QName (possibly after intervening ExprWhitespace) are :=, then the token must be recognized as a VarName. Otherwise, the token must not be recognized as a VarName.
[...] if, then the token must be recognized as a SpecialForm.
[...] (, then the token must be recognized as either a SpecialForm, a NodeType, or a FunctionCall.
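The lookahead needed to tell a VarName from a VariableReference can be sketched in Python. This is only an approximation: the QName pattern is a simplified ASCII subset, `classify_dollar_token` is a hypothetical helper, and the rule it applies (a $ QName followed by :=, ignoring intervening ExprWhitespace, is a VarName) is this sketch's reading of the first rule above.

```python
import re
from typing import Optional, Tuple

# Simplified ASCII-only QName; the real production allows more characters.
_DOLLAR_QNAME = re.compile(r"\$[A-Za-z_][\w.-]*(?::[A-Za-z_][\w.-]*)?", re.ASCII)
_EXPR_WHITESPACE = " \t\r\n"


def classify_dollar_token(text: str, pos: int = 0) -> Optional[Tuple[str, str]]:
    """Classify the $-token at pos as a VarName or a VariableReference."""
    match = _DOLLAR_QNAME.match(text, pos)
    if match is None:
        return None
    # Skip whitespace, then check whether the binding operator follows.
    lookahead = text[match.end():].lstrip(_EXPR_WHITESPACE)
    kind = "VarName" if lookahead.startswith(":=") else "VariableReference"
    return kind, match.group(0)


binding = classify_dollar_token("$doc := document ('mydoc.xml');")
reference = classify_dollar_token("$doc/title")
```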
This document defines two elements for implementing extension
functions in XSLT: fx:template-function
and fx:define. The fx:template-function
element defines extension functions by XSLT 1.0 instructions.
The fx:define element uses FXPath syntax for the function.
The namespace used for the function definition elements is:
http://www.pantor.com/fxpath
The prefix fx: is used in this document
to specify this namespace.
The prefix xsl: is used for elements belonging
to the XSLT 1.0 namespace: http://www.w3.org/1999/XSL/Transform.
<fx:define
name = qname>
<!-- Content: #PCDATA -->
</fx:define>
The element fx:define defines an extension function
implemented with FXPath. The fx:define element can
only occur at the top level of an XSLT stylesheet.
An fx:define element must have a name attribute, indicating the name of the function. The value of the name attribute is a QName, which is expanded as described in [2.4 Qualified Names] in the XSLT 1.0 Recommendation.
It is an error if the namespace URI of the expanded name of the
function is null - extension functions must not be in a
null namespace.
The content of an fx:define element must be
character data only, and must be an FXPath function
definition, i.e.,
it must match the FunctionDefinition production in the
grammar.
The content of an fx:define element is not an XSLT template. It is an error if the content of an fx:define contains element nodes.
The fx:define element associates the function
name with the function object that results from evaluating the
FunctionDefinition specified in the content.
The result of calling an extension function defined by
fx:define, is the result of calling the
invoke method on its associated function object
with the same arguments.
It is an error if the number of arguments
is greater than the max-args property of the function
object. An implementation may signal the error; if it doesn't, then
it must recover by ignoring the extra arguments.
Example:
<fx:define name="fx:replace">
  function ($s, $old, $new) ->
    $head := substring-before ($s, $old);
    $tail := substring-after ($s, $old);
    if ($head)
      then (concat ($head, $new, fx:replace ($tail, $old, $new)))
    else if ($tail)
      then (concat ($new, fx:replace ($tail, $old, $new)))
    else if ($s = $old)
      then ($new)
    else ($s)
</fx:define>
<fx:template-function
name = qname
result-type = "number" | "string" | "node-set">
<!-- Content: (xsl:param*, template) -->
</fx:template-function>
The element fx:template-function defines an
extension function implemented with XSLT 1.0 instructions.
An fx:template-function element must have a name attribute, indicating the name of the function. The value of the name attribute is a QName, which is expanded as described in [2.4 Qualified Names] in the XSLT 1.0 Recommendation.
It is an error if the namespace URI of the expanded name of the
function is null - extension functions must not be in a
null namespace.
The content and semantics of an fx:template-function are the same as those of a named xsl:template, with the following exceptions:
[...] xsl:call-template element.
A call to an fx:template-function may contain fewer arguments than defined. In this case, the default values are used for the trailing parameters. However, it is an error if too many arguments are passed. An implementation may signal the error; if it doesn't, then it must recover by ignoring the extra arguments.
When calling a function defined by an
fx:template-function element, its template is
instantiated to give a result tree fragment. The result tree
fragment is equivalent to a node-set
containing just a single root node having as children the
sequence of nodes produced by
instantiating the template. The base URI of the nodes in the result
tree fragment is the base URI of the fx:template-function
element.
It is an error if a member of the sequence of nodes created by instantiating the template is an attribute node or a namespace node, since a root node cannot have an attribute node or a namespace node as a child. An XSLT processor may signal the error; if it does not signal the error, it must recover by not adding the attribute node or namespace node.
The result tree fragment is used as the result value of the function unless the result-type attribute is specified, in which case the result tree fragment is converted to the specified type as follows:
[...] string function
[...] number function
The example functions (apart from the message functions) come from [B. Sample Extension Functions] in the EXSL spec.
<fx:define name="com:if">
  function ($test, $true, $false) ->
    if ($test) then ($true) else ($false)
</fx:define>
<fx:template-function name="com:eval">
  <xsl:param name="node-set" select="/.."/>
  <xsl:param name="expr" select="'.'"/>
  <xsl:apply-templates select="$node-set [1]" mode="com:eval">
    <xsl:with-param name="expr" select="$expr"/>
  </xsl:apply-templates>
</fx:template-function>
The message functions allow the functionality
of xsl:message to be used in XPath expressions.
Example:
fx:message (concat ('Variable foo is: ', $foo), $foo)
<fx:define name="fx:message">
  function ($message := fx:required ('fx:message/message'),
            $value := fx:required ('fx:message/value'),
            $terminate := false ()) ->
    if (string (fx:display-message ($message, $terminate)))
      then ($value)
      else ($value)
</fx:define>
<fx:template-function name="fx:display-message">
  <xsl:param name="message"/>
  <xsl:param name="terminate" select="false ()"/>
  <xsl:choose>
    <xsl:when test="$terminate">
      <xsl:message terminate="yes">
        <xsl:value-of select="$message"/>
      </xsl:message>
    </xsl:when>
    <xsl:otherwise>
      <xsl:message>
        <xsl:value-of select="$message"/>
      </xsl:message>
    </xsl:otherwise>
  </xsl:choose>
</fx:template-function>
<fx:define name="fx:required">
  function ($param := fx:required ('fx:required/param')) ->
    fx:exception (concat ('Required parameter: ', $param))
</fx:define>
<fx:define name="fx:exception">
  function ($message) ->
    fx:message (concat ('ERROR: ', $message), '', true ())
</fx:define>
<fx:define name="set:difference">
  function ($node-set1 := /.., $node-set2 := /..) ->
    $node-set1 [count (.|$node-set2) != count ($node-set2)]
</fx:define>
<fx:define name="set:has-same-node">
  function ($node-set1 := /.., $node-set2 := /..) ->
    boolean ($node-set1 [count (.|$node-set2) = count ($node-set2)])
</fx:define>
<fx:define name="set:intersection">
  function ($node-set1 := /.., $node-set2 := /..) ->
    $node-set1 [count (.|$node-set2) = count ($node-set2)]
</fx:define>
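The three set functions above rely on one idiom: a node belongs to a node-set exactly when taking the union with that node does not change the count. The following Python sketch models this under stated assumptions: nodes are plain objects compared by identity, node-sets are duplicate-free lists, and `Node`, `union`, and the other helper names are hypothetical.

```python
from typing import List


class Node:
    """Stand-in for an XPath node; equality is object identity."""

    def __init__(self, name: str) -> None:
        self.name = name


def union(a: List[Node], b: List[Node]) -> List[Node]:
    """Union by node identity, like the XPath | operator."""
    seen = set()
    result = []
    for node in list(a) + list(b):
        if id(node) not in seen:
            seen.add(id(node))
            result.append(node)
    return result


def in_node_set(node: Node, node_set: List[Node]) -> bool:
    # count (.|$node-set2) = count ($node-set2) holds only for members.
    return len(union([node], node_set)) == len(node_set)


def difference(ns1: List[Node], ns2: List[Node]) -> List[Node]:
    return [n for n in ns1 if not in_node_set(n, ns2)]


def intersection(ns1: List[Node], ns2: List[Node]) -> List[Node]:
    return [n for n in ns1 if in_node_set(n, ns2)]


def has_same_node(ns1: List[Node], ns2: List[Node]) -> bool:
    return bool(intersection(ns1, ns2))


a, b, c = Node("a"), Node("b"), Node("c")
only_in_first = difference([a, b], [b, c])
```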
<fx:define name="set:distinct">
  function ($node-set := /.., $expr := '.') ->
    reduce ($node-set)
    into ($distinct := /..)
    by ($node-value := string (com:eval (., $expr));
        if ($distinct [string (com:eval (., $expr)) = $node-value])
          then ($distinct)
          else ($distinct | .))
</fx:define>
<fx:define name="set:leading">
  function ($node-set := /.., $value, $expr := '.') ->
    $first := $node-set [1];
    if (not ($first) or string (com:eval ($first, $expr)) = string ($value))
      then (/..)
      else ($first | set:leading ($node-set [position () != 1], $value, $expr))
</fx:define>
<fx:define name="set:following">
  function ($node-set := /.., $value, $expr := '.') ->
    $first := $node-set [1];
    if (not ($node-set))
      then (/..)
    else if (string (com:eval ($first, $expr)) = string ($value))
      then ($node-set)
    else (set:following ($node-set [position () != 1], $value, $expr))
</fx:define>
<fx:define name="set:exists">
  function ($node-set := /.., $expr := '.') ->
    $node-set and
    ($value := string (com:eval ($node-set [1], $expr));
     $value and $value != 'false' or
     set:exists ($node-set [position () != 1], $expr))
</fx:define>
<fx:define name="set:for-all">
  function ($node-set := /.., $expr := '.') ->
    not ($node-set) or
    ($value := string (com:eval ($node-set [1], $expr));
     $value and $value != 'false' and
     set:for-all ($node-set [position () != 1], $expr))
</fx:define>
<fx:define name="num:max">
  function ($node-set := /.., $expr := '.') ->
    reduce ($node-set)
    into ($max := number (com:eval ($node-set [1], $expr)))
    by ($value := number (com:eval (., $expr));
        if ($value > $max) then ($value) else ($max))
</fx:define>
<fx:define name="num:min">
  function ($node-set := /.., $expr := '.') ->
    reduce ($node-set)
    into ($min := number (com:eval ($node-set [1], $expr)))
    by ($value := number (com:eval (., $expr));
        if ($min > $value) then ($value) else ($min))
</fx:define>
<fx:define name="num:highest">
  function ($node-set := /.., $expr := '.') ->
    reduce ($node-set)
    into ($highest := $node-set [1])
    by (if (number (com:eval (., $expr)) > number (com:eval ($highest, $expr)))
          then (.)
          else ($highest))
</fx:define>
<fx:define name="num:lowest">
  function ($node-set := /.., $expr := '.') ->
    reduce ($node-set)
    into ($lowest := $node-set [1])
    by (if (number (com:eval ($lowest, $expr)) > number (com:eval (., $expr)))
          then (.)
          else ($lowest))
</fx:define>
<fx:define name="num:sum">
  function ($node-set := /.., $expr := '.') ->
    reduce ($node-set)
    into ($sum := 0)
    by ($sum + com:eval (., $expr))
</fx:define>
<fx:template-function name="gen:make-node" result-type="node-set">
  <xsl:param name="value"/>
  <node>
    <xsl:copy-of select="$value"/>
  </node>
</fx:template-function>
<fx:template-function name="gen:append" result-type="node-set">
  <xsl:param name="node-set1"/>
  <xsl:param name="node-set2"/>
  <xsl:copy-of select="$node-set1"/>
  <xsl:copy-of select="$node-set2"/>
</fx:template-function>
<fx:define name="gen:range">
  function ($start := 0, $end := 0, $range := /..) ->
    if (number ($start) > number ($end))
      then ($range/*)
      else (gen:range ($start + 1, $end, gen:append ($range, gen:make-node ($start))))
</fx:define>
<fx:define name="gen:padding">
  function ($repeat := 1, $string := ' ') ->
    reduce (gen:range (1, $repeat))
    into ($result := '')
    by (concat ($result, $string))
</fx:define>
<fx:define name="gen:tokens">
  function ($string, $delimiters := ' ') ->
    $del := substring ($delimiters, 1, 1);
    $del-length := string-length ($delimiters);
    $value :=
      if ($del-length > 1)
        then ($replacement := gen:padding ($del-length - 1, $del);
              translate ($string, substring ($delimiters, 2), $replacement))
        else ($string);
    gen-helper:tokens ($value, $del)
</fx:define>
<fx:define name="gen-helper:tokens">
  function ($string, $delimiter, $tokens := /..) ->
    if (contains ($string, $delimiter))
      then ($token := substring-before ($string, $delimiter);
            gen-helper:tokens (substring-after ($string, $delimiter),
                               $delimiter,
                               gen:append ($tokens, gen:make-node ($token))))
      else ($tokens/*)
</fx:define>
<fx:template-function name="sort:position" result-type="number">
  <xsl:param name="node-set" select="/.."/>
  <xsl:param name="order" select="'ascending'"/>
  <xsl:param name="data-type" select="'text'"/>
  <xsl:param name="expr" select="'.'"/>
  <xsl:variable name="current" select="."/>
  <xsl:for-each select="$node-set">
    <xsl:sort select="com:eval (., $expr)"/>
    <xsl:if test="count (.| $current) = 1">
      <xsl:value-of select="position ()"/>
    </xsl:if>
  </xsl:for-each>
</fx:template-function>
<fx:define name="doc:key">
  function ($key-name, $key-value, $documents, $base-URI := document ('')) ->
    document ($documents, $base-URI)/key ($key-name, $key-value)
</fx:define>
<fx:define name="doc:id">
  function ($id, $documents, $base-URI := document ('')) ->
    document ($documents, $base-URI)/id ($id)
</fx:define>
Split fx:function into two elements: fx:define and fx:template-function. The purpose of the split is to clearly separate XSLT instructions and FXPath expressions. The positive effect is that the separation makes the definition of each element clearer, both from a syntactic and a semantic point of view. The slight drawback is that functions using both FXPath and XSLT instructions must now be split.
For inspiring ideas, challenges and helpful comments on the XSL List:
Also, ideas for this document have been taken from [XPath Requirements Version 2.0], [XQuery], and many other W3C documents.