Joins, which combine data from multiple sources into a single result, are a very important type of query. In this section we will illustrate how several types of joins can be expressed in XQuery. We will base our examples on the following three documents:
A document named parts.xml
that contains many
part
elements; each part
element in turn contains
partno
and description
subelements.
A document named suppliers.xml
that contains many
supplier
elements; each supplier
element in turn
contains suppno
and suppname
subelements.
A document named catalog.xml
that contains information about
the relationships between suppliers and parts. The catalog document contains
many item
elements, each of which in turn contains
partno
, suppno
, and price
subelements.
A conventional ("inner") join returns information from two or more related sources, as illustrated by the following example, which combines information from three documents. The example generates a "descriptive catalog" derived from the catalog document, but containing part descriptions instead of part numbers and supplier names instead of supplier numbers. The new catalog is ordered alphabetically by part description and secondarily by supplier name.
<descriptive-catalog> { for $i in fn:doc("catalog.xml")//item, $p in fn:doc("parts.xml")//part[partno = $i/partno], $s in fn:doc("suppliers.xml")//supplier[suppno = $i/suppno] order by $p/description, $s/suppname return <item> { $p/description, $s/suppname, $i/price } </item> } </descriptive-catalog>
The previous query returns information only about parts that have suppliers and suppliers that have parts. An outer join is a join that preserves information from one or more of the participating sources, including elements that have no matching element in the other source. For example, a left outer join between suppliers and parts might return information about suppliers that have no matching parts.
The following query demonstrates a left outer join. It returns names of all the suppliers in alphabetic order, including those that supply no parts. In the result, each supplier element contains the descriptions of all the parts it supplies, in alphabetic order.
for $s in fn:doc("suppliers.xml")//supplier order by $s/suppname return <supplier> { $s/suppname, for $i in fn:doc("catalog.xml")//item [suppno = $s/suppno], $p in fn:doc("parts.xml")//part [partno = $i/pno] order by $p/description return $p/description } </supplier>
The previous query preserves information about suppliers that supply no
parts. Another type of join, called a full outer join, might be used
to preserve information about both suppliers that supply no parts and parts
that have no supplier. The result of a full outer join can be structured in
any of several ways. The following query generates a list of
supplier
elements, each containing nested part
elements for the parts that it supplies (if any), followed by a list of
part
elements for the parts that have no supplier. This might be
thought of as a "supplier-centered" full outer join. Other forms of outer
join queries are also possible.
<master-list> { for $s in fn:doc("suppliers.xml")//supplier order by $s/suppname return <supplier> { $s/suppname, for $i in fn:doc("catalog.xml")//item [suppno = $s/suppno], $p in fn:doc("parts.xml")//part [partno = $i/partno] order by $p/description return <part> { $p/description, $i/price } </part> } </supplier> , (: parts that have no supplier :) <orphan-parts> { for $p in fn:doc("parts.xml")//part where fn:empty(fn:doc("catalog.xml")//item [partno = $p/partno] ) order by $p/description return $p/description } </orphan-parts> } </master-list>
The previous query uses an element constructor to enclose its output
inside a master-list
element. The concatenation operator (",")
is used to combine the two main parts of the query. The result is an ordered
sequence of supplier
elements followed by an
orphan-parts
element that contains descriptions of all the parts
that have no supplier.
Many queries involve forming data into groups and applying some
aggregation function such as fn:count
or fn:avg
to
each group. The following example shows how such a query might be expressed
in XQuery, using the catalog document defined in the previous section.
This query finds the part number and average price for parts that have at least 3 suppliers.
for $pn in fn:distinct-values(fn:doc("catalog.xml")//partno) let $i := fn:doc("catalog.xml")//item[partno = $pn] where fn:count($i) >= 3 order by $pn return <well-supplied-item> <partno> {$p} </partno> <avgprice> {fn:avg($i/price)} </avgprice> </well-supplied-item>
The fn:distinct-values
function in this query eliminates
duplicate part numbers from the set of all part numbers in the catalog
document. The result of fn:distinct-values
is a sequence in
which order is not significant.
Note that $pn
, bound by a for clause, represents an
individual part number, whereas $i
, bound by a let clause,
represents a set of items which serves as argument to the aggregate functions
fn:count($i)
and fn:avg($i/price)
. The query uses
an element constructor to enclose each part number and average price in a
containing element called well-supplied-item
.
The method illustrated above generalizes easily to grouping by more than
one data value. For example, consider a census document containing a sequence
of person
elements, each with subelements named
state
, job
, and income
. A census
analyst might need to prepare a report listing the average
income
for each combination of state
and
job
. This report might be produced using the following
query:
for $s in fn:distinct-values(fn:doc("census.xml")//state), $j in fn:distinct-values(fn:doc("census.xml")//job) let $p := fn:doc("census.xml")//person[state = $s and job = $j] order by $s, $j return if (fn:exists($p)) then <group> <state> {$s} </state> <job> {$j} </job> <avgincome> {fn:avg($p/income)} </avgincome> </group> else ()
The if-then-else
expression in the above example prevents
generation of groups that contain no data. For example, the census data may
contain some persons who live in Nebraska, and some persons whose job is Deep
Sea Fisherman, but no persons who live in Nebraska and have the job of Deep
Sea Fisherman. If output groups are desired for all possible combinations of
states and jobs, the if-then-else
expression can be omitted from
the query. In this case, the output may include "empty" groups such as the
following:
<group> <state>Nebraska</state> <job>Deep Sea Fisherman</state> <avgincome/> </group>
XQuery uses the <<
and >>
operators
to compare nodes based on document order. Although these operators are quite
simple, they can be used to express complex queries for XML documents in
which sequence is meaningful. The first two queries in this section involve a
surgical report that contains procedure
, incision
,
instrument
, action
, and anesthesia
elements.
The following query returns all the action
elements that
occur between the first and second incision
elements inside the
first procedure. The original document order among these nodes is preserved
in the result of the query.
let $proc := //procedure[1] for $i in $proc//action where $i >> ($proc//incision)[1] and $i << ($proc//incision)[2] return $i
It is worth noting here that document order is defined in such a way that
a node is considered to precede its descendants in document order. In the
surgical report, an action
is never part of an
incision
, but an instrument
is. Since the
>>
operator is based on document order, the predicate
$i >> ($proc//incision)[1]
is true for any
instrument
element that is a descendant of the first
incision
element in the first procedure.
For some queries, it may be helpful to define a function that can test
whether a node precedes another node without being its ancestor. The
following function returns true
if its first operand precedes
its second operand but is not an ancestor of its second operand; otherwise it
returns false
:
declare function local:precedes($a as node(), $b as node()) as boolean { $a << $b and fn:empty($a//node() intersect $b) };
Similarly, a local:follows
function could be written:
declare function local:follows($a as node(), $b as node()) as boolean { $a >> $b and fn:empty($b//node() intersect $a) };
Using the local:precedes
function, we can write a query that
finds instrument
elements between the first two incisions,
excluding from the query result any instrument
that is a
descendant of the first incision
:
let $proc := //procedure[1] for $i in $proc//instrument where local:precedes(($proc//incision)[1], $i) and local:precedes($i, ($proc//incision)[2]) return $i
The following query reports incisions for which no prior anesthesia was
recorded in the surgical report. Since an anesthesia
is never
part of an incision
, we can use <<
instead of
the less-efficient local:precedes
function:
for $proc in //procedure where some $i in $proc//incision satisfies fn:empty($proc//anesthesia[. << $i]) return $proc
In some documents, particular sequences of elements may indicate a logical
hierarchy. This is most commonly true of HTML. The following query returns
the introduction of an XHTML document, wrapping it in a div
element. In this example, we assume that an h2
element
containing the text "Introduction" marks the beginning of the introduction,
and the introduction continues until the next h2
or
h1
element, or the end of the document, whichever comes
first.
let $intro := //h2[text()="Introduction"], $next-h := //(h1|h2)[. >> $intro][1] return <div> { $intro, if (fn:empty($next-h)) then //node()[. >> $intro] else //node()[. >> $intro and . << $next-h] } </div>
Note that the above query makes explicit the hierarchy that was implicit
in the original document. In this example, we assume that the h2
element containing the text "Introduction" has no subelements.
Occasionally it is necessary to scan over a hierarchy of elements, applying some transformation at each level of the hierarchy. In XQuery this can be accomplished by defining a recursive function. In this section we will present two examples of such recursive functions.
Suppose that we need to compute a table of contents for a given document
by scanning over the document, retaining only elements named
section
or title
, and preserving the hierarchical
relationships among these elements. For each section
, we retain
subelements named section
or title
; but for each
title
, we retain the full content of the element. This might be
accomplished by the following recursive function:
declare function local:sections-and-titles($n as node()) as node()? { if (fn:local-name($n) = "section") then element { fn:local-name($n) } { for $c in $n/* return local:sections-and-titles($c) } else if (fn:local-name($n) = "title") then $n else ( ) };
The "skeleton" of a given document, containing only its sections and
titles, can then be obtained by invoking the
local:sections-and-titles
function on the root node of the
document, as follows:
local:sections-and-titles(fn:doc("cookbook.xml"))
As another example of a recursive transformation, suppose that we wish to
scan over a document, transforming every attribute named color
to an element named color
, and every element named
size
to an attribute named size
. This can be
accomplished by the following recursive function:
declare function local:swizzle($n as node()) as node() { typeswitch($n) case $a as attribute(color, *) return element color { fn:string($a) } case $es as element(size, *) return attribute size { fn:string($es) } case $e as element() return element { fn:local-name($e) } { for $c in $e/(* | @*) return local:swizzle($c) } case $d as document-node() return document { for $c in $d/* return local:swizzle($c) } default return $n };
The transformation can be applied to a whole document by invoking the
local:swizzle
function on the root node of the document, as
follows:
local:swizzle(fn:doc("plans.xml"))
It is sometimes necessary to search through a set of data to find all the
distinct combinations of a given list of properties. For example, an input
data set might consist of a large set of order
elements, each of
which has the same basic structure, as illustrated by the following
example:
<order> <date>2003-10-15</date> <product>Dress Shirt</product> <size>M</size> <color>Blue</color> <supplier>Fashion Trends</supplier> <quantity>50</quantity> </order>
From this data set, a user might wish to find all the distinct
combinations of product
, size
, and
color
that occur together in an order
. The
following query returns this list, enclosing each distinct combination in a
new element named option
:
for $p in fn:distinct-values(//product), $s in fn:distinct-values(//size), $c in fn:distinct-values(//color) order by $p, $s, $c return if (fn:exists(//order[product eq $p and size eq $s and color eq $c])) then <option> <product>{$p}</product> <size>{$s}</size> <color>{$c}</color> </option> else ()