BENCHMARKING AN XML MEDIATOR

Florin DRAGAN, Georges GARDARIN
PRiSM Laboratory University of Versailles
78035 Versailles Cedex, France
email: Florin.Dragan@prism.uvsq.fr, georges.gardarin@prism.uvsq.fr
Abstract: In recent years, XML has become the universal interchange format. Many investigations have
been made into storing, querying and integrating XML with existing applications, and many XML-based
commercial DBMSs have appeared lately. This paper reports on the analysis of an XML mediator
federating several existing XML DBMSs. We measure their storage and querying capabilities directly
through their Java API and indirectly through the XLive mediation tool. For this purpose we have
created a simple benchmark consisting of a set of queries and a variable test database. The main
goal is to reveal the weaknesses and the strengths of the implemented indexing and federating
techniques. We analyze two commercial native XML DBMSs and an open-source relational-to-XML mapping
middleware. We first pass the queries directly to the DBMSs and then go through the XLive XML
mediator. Results suggest that text XML is not the best format to exchange data between a mediator
and a wrapper, and also show some possible improvements of XQuery support in mediation architectures.

1. INTRODUCTION
As XML capabilities have become more and more popular, a lot of XML-based
products and interfaces have been proposed. The XML DBMSs that have been
developed try, on the one hand, to offer the well-known capabilities of a
standard DBMS and, on the other hand, to implement new functionalities and
reach new levels of performance. At the same time, more and more classical
DBMSs add extensions to store and retrieve XML documents.
To measure and compare their performance, a lot of XML benchmarks have been
proposed that "stress" different parts of the systems, most often the storage
engine and the query processor, by means of a generally complex set of
queries. Each benchmark is composed of a test database and a set of queries
that try to be as general and complete as possible. There are also a few
domain-specific benchmarks that propose a specific database format and a set
of queries specific to the simulated applications. The most used metric is
the response time for executing a query, but a few others (such as the size
on disk needed to store a certain document) are also proposed.
The purpose of this paper is to present a simple general mini-benchmark,
composed of a few queries and a variable data set, to evaluate some techniques
implemented in the core of the DBMSs under the pressure of an XQuery mediator.
We are mostly interested in the implemented indexing and mediation techniques
and in how they are influenced by the size of the data set. Using our
mini-benchmark, we test two native XML commercial DBMSs and one open source
XML to relational mapping middleware and analyze their response times. Next,
we apply our benchmark to an XML mediator to find the delays that are
introduced by the mediation operations. The conclusions show that XML
mediation is a time-consuming operation that has to be optimized in both
communication and processing time.
The rest of this paper is organized as follows. In the next section we give
an overview of XML mediation technology, focusing on the XLive full-XML
mediator. Section 3 presents our mini-benchmark (query set and data set).
Section 4 introduces the analyzed local systems and their characteristics,
and section 5 gives the results obtained by applying our benchmark to them,
together with their interpretation. In section 6, we present the results of
the benchmarking operations using the mediator. In the last section, we
summarize our results and suggest some improvements to the mediator
architecture.
2. XML MEDIATION
Mediation technology based on XML and XQuery is under development. Some
products are already available. In this section, we survey this new
technology and describe our XLive mediator (see www.xquark.org for an
industrial open source version).
2.1 Basics and Backgrounds
With the advent of XQuery as a standard for querying XML collections
[XQuery, 2003], several mediator systems have been developed using XQuery and
XML Schema as pivot language and model. Examples of full XML mediators are
the Enosys XML Integration Platform (EXIP) [Papakonstantinou, 2003], the
Software A.G. EntireX XML Mediator, the Liquid Data mediator of BEA derived
from EXIP, and the e-XMLMedia XML Mediator, a predecessor of our current
XLive project [Gardarin, 2002].
XML mediators focus on supporting the XQuery query language on XML views of
heterogeneous data sources. The data are integrated dynamically from multiple
information sources. Queries are used as view definitions. At run time, the
application issues XML queries against the views. Queries and views are
translated into some XML algebra and are combined into single algebra query
plans. Sub-queries are sent to local wrappers that process them locally and
return XML results. Finally, the global query processor evaluates the result,
using appropriate integration and reconstruction algorithms.
XQuery is a powerful language, which encompasses SQL and much more. Notably,
it is able to query rich and extensible data types; it is a functional
language, so that any valid expression applied to a valid expression is a
valid query; and it will soon incorporate XQuery Text for full-text queries.
XQuery Text shall provide functionalities such as single-word search, phrase
search, support for stop words, search on prefix, postfix and infix,
proximity searching, word normalization, diacritics, ranking and relevance.
All these features will make XQuery an ideal language for querying integrated
data sources.
2.2 Overview of XLive Mediator
In the XLive project, we use a mediation architecture to support enterprise
information integration, shown in Figure 1. It follows the classical
wrapper-mediator architecture as defined in [Wiederhold, 1992]. The
communication between wrappers and mediator follows a common interface, which
is defined by an applicative Java or Web service interface named XML/DBC.
With XML/DBC, requests are defined in XQuery and results are returned in text
XML format.
[Figure 1 components: Web interface, Java applications, mediators, wrappers,
and the sources RDB1 (Oracle), RDB2 (MySQL) and XML DB3 (Xyleme)]
Figure 1 - XLive Architecture
Our architecture is composed of mediators that deal with distributed XML
sources and wrappers that cope with the heterogeneity of the sources (DBMS,
Web pages, etc.). The XLive mediator is a data integration middleware
managing XML views of heterogeneous data sources. Using the XLive mediator,
one can integrate heterogeneous data sources without replicating their data,
while the sources remain autonomous.
The XLive mediator is entirely based on W3C standard technology: XML, XQuery,
XML-Schema, SAX, DOM and SOAP. All information exchanges rely on the XML
format. XML-Schema is used for metadata representation. Wrappers provide
schemas to export information about local data structures. XQuery is employed
for querying both the mediator and the wrappers. Connectivity of mediator and
wrappers relies on the XML/DBC programming interface, an extension of JDBC to
integrate XQuery. More information about the XLive mediator can be found in
[Dang-Ngoc, 2003].
To integrate a new source into the mediation architecture, a wrapper must be
built. It has to implement the XML/DBC programming interface. DBMSs are
data-oriented sources, and metadata are provided to describe the sources and
the mappings. DBMS wrappers translate data sources into XML and process a
possibly reduced subset of XQuery on the source data. In the case of a Web
source, the wrapper brings more intelligence: it aims at semantically
integrating Web information into a common model accessible to programs.
3. PROPOSED BENCHMARK
Several benchmarks have been developed for XML DBMSs, among them XMach-1
[XMach-1, 2001], XMark [XMark, 2001], X007 [X007, 2002] and XBench
[XBench, 2004]. They all have their interests, but are in general too complex
for current mediators, both in functionality and size. In this section, we
introduce our simpler benchmark.
3.1 Presentation
We propose a simple generic benchmark for testing the basic functionalities
of an XML mediator and evaluating the performance of the different join
algorithms and indexing schemes of the local sources. The existing benchmarks
generally propose a set of complex queries that evaluate many properties of
the query processor in the same query. By relying on simple operations, our
goal is to stress only certain functions: local indexing, XML transfer and
parsing, join algorithms, etc. Another reason for proposing only simple
queries is that we used our benchmark to test the XLive mediator, which
performs basic XQuery to integrate multiple sources. Generally, it takes a
long time for a mediator to perform complex join operations; this time
depends on the mediator join algorithms and on external parameters such as
the network delay, the capabilities of the distant DBMS, and the speed at
which the source transfers the results. Yet another reason to use a simple
XQuery benchmark is that most tested DBMSs only support the core of XQuery
with realistic performance on the computer we are using.
3.2 Data Set
The data set is composed of two document models: one data oriented and the
other text oriented. With a small depth (at most 3) and a small width (at
most 5), the two documents have a simple structure that facilitates the
evaluation of different structural selection queries. The two documents are
logically connected, which makes it possible to perform simple join
operations between documents that are located on different systems. A
graphical representation of the schemas of the two documents is given in
Figures 2 and 3. The schema is variable in the sense that neither the number
of "authors" of a book nor the number of paragraphs in the reviews is
constant. The textual content is generated from the most popular English
words extracted from Shakespeare's plays.
Figure 2: Catalog schema
Figure 3: Review schema
In order to evaluate the performance of the XML systems, we generated three
data sets of 300, 750 and 1500 documents, each document having a size of less
than 2K. We used the toXgene utility [toXgene], starting from a provided
example to generate our data set.
3.3 Queries
Our benchmark proposes a representative set of XML DBMS query
functionalities, which can be grouped as follows:
(i) Simple XPath expressions. Queries Q1 and Q2 are XQueries that require
selections on element and attribute names:
Q1: for $b in collection("catalog")/catalog/book
    return $b
Q2: for $currency in collection("catalog")/catalog/book/price/@currency
    return $currency
(ii) XPath with predicates. Q3, Q4 and Q5 introduce predicates to perform
simple selections. The Q3 predicate tests for exact equality:
Q3: for $b in collection("catalog")/catalog/book
    where $b/price/@currency = "CDN"
    return $b
Q4 contains a "range" predicate:
Q4: for $b in collection("catalog")/catalog/book
    where $b/price < 100
    return $b
Q5 contains the two previous predicates:
Q5: for $b in collection("catalog")/catalog/book
    where $b/price < 100 and $b/price/@currency = "CDN"
    return $b
(iii) Recursive path optimization. Q6 contains a recursive wildcard "//"
expression that tests the optimality of the path evaluation (sometimes called
the indexation of "//"):
Q6: for $col in collection("catalog")
    return $col//price
(iv) Result ordering. To test the performance of generating an ordered
result, we have introduced an order-by XQuery:
Q7: for $col_rev in collection("review"),
        $rev in $col_rev/review,
        $rate in $rev/review/@rating
    order by ($rate)
    return $rev
(v) Text search. Q8 contains the "contains" predicate to stress some text
indexing capabilities:
Q8: for $b in collection("catalog")/catalog/book
    where contains($b/author, "Fumio")
    return $b
(vi) Joins on values. Q9 and Q10 require joins between the two documents:
Q9 (join and text searching):
    for $col_cat in collection("catalog"),
        $col_rev in collection("review"),
        $b in $col_cat/catalog/book,
        $rev in $col_rev/review,
        $rev_rev in $rev/review
    where $b/@isbn = $rev/book/@isbn
      and contains($rev_rev, "dolphins")
    return $b/@genres
Q10 (equality join):
    for $col_cat in collection("catalog"),
        $rev_cat in collection("review"),
        $b in $col_cat/catalog/book,
        $r in $rev_cat/review
    where $b/@isbn = $r/book/@isbn
    return $r/review/@rating
(vii) Result generation. Q11 tests the performance of the "query processor"
when generating new results:
Q11: for $col_rev in collection("review"),
         $rev in $col_rev/review
     where $rev/review/@rating < 2
     return
       <lowRateBook>
         <title>{$rev/book/title/text()}</title>
         {for $col_cat in collection("catalog"),
              $b in $col_cat/catalog/book
          where $b/@isbn = $rev/book/@isbn
          return <price>{$b/price/text()}</price>}
       </lowRateBook>
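To make the semantics of the equality join Q10 concrete, the following Python sketch evaluates an equivalent of it over two XML texts with a hash join on the isbn attribute. The sample documents and the function name are assumptions made for illustration; only the element and attribute names come from the queries above. It is not the join algorithm of XLive or of any of the tested DBMSs.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the element/attribute names follow the queries.
CATALOG = """<catalog>
  <book isbn="1"><title>A</title></book>
  <book isbn="2"><title>B</title></book>
</catalog>"""

REVIEWS = [
    '<review><book isbn="1"/><review rating="4">fine</review></review>',
    '<review><book isbn="3"/><review rating="1">poor</review></review>',
    '<review><book isbn="1"/><review rating="2">so-so</review></review>',
]

def q10_hash_join(catalog_xml, review_xmls):
    """Return the ratings of reviews whose book isbn appears in the catalog.

    Assumes each isbn occurs at most once in the catalog.
    """
    catalog = ET.fromstring(catalog_xml)
    # Build phase: collect the isbn values of all catalog books.
    isbns = {book.get("isbn") for book in catalog.findall("book")}
    ratings = []
    # Probe phase: scan the reviews once and look each isbn up in the set.
    for text in review_xmls:
        rev = ET.fromstring(text)
        book = rev.find("book")
        if book is not None and book.get("isbn") in isbns:
            ratings.extend(r.get("rating") for r in rev.findall("review"))
    return ratings

print(q10_hash_join(CATALOG, REVIEWS))
```

The build and probe phases each scan their input once, which is why the choice of join algorithm matters as the data sets grow.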
3.4 Metrics
For evaluating a query processor, we measure
the query execution time and the size (in
bytes) of the result.
3.5 Benchmark Host
For running the benchmark and evaluating the different DBMSs, we used a PC
with the following configuration:
- vendor_id : GenuineIntel
- cpu family : 6
- model : 9
- model name : Intel® Pentium® M processor 1600MHz
- stepping : 5
- cpu MHz : 1598.674
- cache size L1 : 0 KB
- fpu : yes
- cpuid level : 2
- wp : yes
- flags : fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov pat clflush
  dts acpi mmx fxsr sse sse2 tm
- bogomips : 3191.60
- OS : RedHat Linux 9, kernel version 2.4.20-8
3.6 Benchmarking Method
All the systems were evaluated using the provided Java API. Each query was
run ten times and the average execution time is reported. Every measurement
was preceded by a warm-up step, which consists of executing the same query
several times before the evaluation. The set of results was not trimmed: the
minimum and maximum values were kept in the average.
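The measurement procedure above can be sketched as a small timing harness. This is a Python illustration rather than the Java code used for the paper; the function name `benchmark` and the default of three warm-up runs are assumptions (the text only says "several times"), while the ten measured runs and the untrimmed average follow the description.

```python
import time

def benchmark(run_query, warmup_runs=3, measured_runs=10):
    """Return (mean time in ms, list of samples in ms) for run_query."""
    # Warm-up step: execute the query a few times without recording anything.
    for _ in range(warmup_runs):
        run_query()
    timings_ms = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        run_query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    # No trimming: the minimum and maximum stay in the average.
    return sum(timings_ms) / len(timings_ms), timings_ms

if __name__ == "__main__":
    # Stand-in workload; a real run would execute a query through a DBMS API.
    mean_ms, samples = benchmark(lambda: sum(range(10_000)))
    print(f"mean over {len(samples)} runs: {mean_ms:.3f} ms")
```

For a system with lazy evaluation, `run_query` would also have to consume the whole result iterator, otherwise only the query set-up is timed.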
4. DATA SOURCE DBMSs
In this section, we present the local DBMSs
handling the data sources and integrated in the
mediation platform. For commercial reasons,
we call the native XML DBMSs XDBMS1
and XDBMS2.
4.1 XDBMS1
XDBMS1 is a native European XML DBMS that handles the storage, retrieval,
indexing, integration and distribution of semi-structured data. The basic
components are the repository, tailored to tree data, and the index manager,
which provides two kinds of indexes: a standard B-tree for indexing dates and
integers, and a full-text index that indexes keywords but also path labels.
The main features of XDBMS1 are its scalability and its rapid query
processing based on an index kept in memory. As query language it supports a
limited XQuery, but it does not support XPath exactly. For this reason, we
had to translate all our queries into XDBMS1 XQuery.
When a document is loaded into the XDBMS1 repository, it is automatically
indexed. XDBMS1 uses an XML index that takes into consideration the data of
an XML file and also the metadata, storing information about the meaning and
the context of the words. All the words resulting from stemming are indexed.
This means that, basically, neither the separators nor the very frequent
words (such as prepositions or articles) are indexed. Different forms of the
same word are indexed under the same root word, and only the position of each
occurrence is added to the index. For the "plural" of a word, for instance,
only the position in the document is added to the index; no new entry is
created (the entry being represented by the "singular"). The index managed by
XDBMS1 is generally small, the maximum expansion factor relative to the data
being 80.
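The indexing behaviour described above (stop words skipped, word forms grouped under one root, positions appended to an existing entry) can be illustrated with a toy inverted index. Everything in this Python sketch is an assumption made for illustration: the stop-word list, the one-rule stemmer and the function names are not taken from XDBMS1, whose actual index structure is not described in detail.

```python
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "of", "in", "on", "to"}  # illustrative list

def naive_stem(word):
    """Toy stemmer: lowercases and strips one trailing 's' (assumption)."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def build_index(text):
    """Map each stem to the list of word positions where a form of it occurs."""
    index = defaultdict(list)
    for position, token in enumerate(text.split()):
        word = token.strip(".,;:!?\"'()")
        if not word or word.lower() in STOP_WORDS:
            continue  # separators and very frequent words are not indexed
        # A plural only adds a position to the entry of its singular form.
        index[naive_stem(word)].append(position)
    return dict(index)

index = build_index("The dolphin follows the dolphins in the bay")
print(index)
```

With such an index, an exact-match predicate cannot be answered from the entry alone, since "dolphin" and "dolphins" share it; this is one way to read the later remark that a "contains" predicate suits XDBMS1 better than equality.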
4.2 XDBMS2
To introduce a different native DBMS into our benchmark, we selected XDBMS2,
another European XML DBMS. XDBMS2 is a native XML DBMS that provides advanced
XML data processing and storage functionality. Besides storage and querying,
it provides versioning, indexing, link management, publishing, schema
verification, etc. XDBMS2 offers the possibility to create multiple types of
indexes, among them value, attribute ID, element name and full text.
XDBMS2 provides a user-controlled indexing system that creates a new set of
indexes for each library or document loaded into the repository. Depending on
how they are updated, the indexes are divided into "live" and "non-live". The
"live" indexes are updated automatically when new data is loaded, as opposed
to the "non-live" indexes, which are updated only on request. Several types
of indexes are implemented:
- library indexes: A library is a logical structure that can contain a set of
  documents and other libraries. The presence of libraries can reduce the
  "range" of queries and, consequently, speed up their execution. Library
  indexes are live indexes.
- id attribute indexes: They store elements by their ID attributes specified
  in the DTD or XML-Schema. They are live indexes created at document or
  library level.
- element name indexes: The elements in a library or document are indexed by
  their name. XDBMS2 provides the facility of indexing all the elements or
  only the elements of a certain selection.
- value indexes: These are live indexes, created at library or document
  level, that store elements by their value or by the value of their
  attributes. It is possible to specify the type of the values that will be
  put in the index; it is the user's responsibility to convert the
  element/attribute values to the declared index types.
- full text indexes: They store elements by their textual values or by the
  values of their attributes. Unlike value indexes, this type of index makes
  it possible to select elements that have a certain word in their textual
  value.
- content conditioned indexes: Only certain nodes are indexed, according to a
  user-defined key. The user has to write a filter for selecting the nodes
  that will be indexed and to assign a key to each equivalence class.
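As an illustration of the value index idea in the list above, the following Python sketch stores elements under a typed key, with the conversion to the declared type supplied by the caller. The sample catalog, the function name `build_value_index` and the dictionary-based structure are assumptions for illustration; they do not describe how XDBMS2 implements its indexes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data using the element names of the benchmark queries.
CATALOG = ET.fromstring("""<catalog>
  <book isbn="1"><price currency="CDN">80</price></book>
  <book isbn="2"><price currency="USD">120</price></book>
  <book isbn="3"><price currency="CDN">150</price></book>
</catalog>""")

def build_value_index(root, path, key, convert=str):
    """Index the elements matched by path under convert(key(element))."""
    index = defaultdict(list)
    for elem in root.findall(path):
        # The caller chooses the index type through the convert function.
        index[convert(key(elem))].append(elem)
    return index

# One index on the currency attribute (string), one on the price (integer).
by_currency = build_value_index(
    CATALOG, "book", lambda b: b.find("price").get("currency"))
by_price = build_value_index(
    CATALOG, "book", lambda b: b.find("price").text, convert=int)

# Exact match as in Q3: a single dictionary lookup instead of a scan.
cdn_isbns = [b.get("isbn") for b in by_currency["CDN"]]
# Range predicate as in Q4: scans the keys here; an ordered index would not.
cheap_isbns = sorted(
    b.get("isbn") for p, books in by_price.items() if p < 100 for b in books)
print(cdn_isbns, cheap_isbns)
```

Declaring the price index as integer matters: with string keys, a comparison such as "80" < "100" would be evaluated lexicographically and give the wrong answer.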
4.3 XQuark Bridge
XQuark Bridge is an XQuery wrapper proposed in open source by the XQuark
company to query relational databases in XQuery. With XQuark Bridge, each
table is seen as a flat XML collection of documents. Queries can be
formulated in XQuery to generate nested XML or to define XML views of
relational tables. The views can in turn be queried in XQuery. XQuark Bridge
works in a similar way with Oracle, SQL Server or MySQL. Another goal of our
benchmark is to evaluate XQuark Bridge in comparison with native XML DBMSs.
5. DBMS EVALUATIONS
In this section we first present the results of analyzing XDBMS1 and XDBMS2
with the proposed set of queries. Next, we present the results of analyzing
XQuark Bridge. We created two clusters for XDBMS1 and, similarly, two
libraries for XDBMS2, each of them containing one category of documents. A
full text index and an element name index were created with XDBMS2. The
XDBMS1 database is indexed with the default indexing configuration.
5.1 Results of experiments with XDBMS1 and XDBMS2
We are interested in both query evaluation and result generation. XDBMS2 uses
a lazy evaluation technique; thus, to evaluate all the results, our test
method iterates fully through the iterator.
Table 1 contains the results for a data set with the two document models (one
structured and one unstructured) organized in 201 files (one for the
structured document and 200 for the others). The size of the single
structured document is 40K and each of the other documents is smaller than
2K. The size of the data set is in this case around 300K.
Query   XDBMS1 time   XDBMS2 time   Result elements
Q1      32,6ms        60,2ms        100
Q2      14,4ms        21,1ms        100
Q3      34,6ms        18,4ms        100
Q4      33,9ms        24,6ms        69
Q5      32,2ms        17,0ms        69
Q6      13,6ms        7,2ms         100
Q7      190,6ms       77,7ms        200
Q8      12,2ms        1,9ms         4
Q9      25,6ms        269,6ms       79
Q10     37,7ms        249,4ms       200
Q11     72,3ms        35,5ms        60
Table 1: Results for DataSet 1
Table 2 presents the results for data set 2, of size 704K.
Query   XDBMS1 time   XDBMS2 time   Result elements
Q1      77,5ms        88,5ms        250
Q2      20,0ms        23,6ms        250
Q3      82,5ms        41,6ms        250
Q4      54,2ms        39,5ms        168
Q5      71,0ms        29,9ms        168
Q6      21,0ms        16,2ms        250
Q7      382,0ms       153,5ms       500
Q8      16,2ms        4,8ms         10
Q9      45,8ms        1509ms        212
Q10     88,9ms        1404ms        500
Q11     272,8ms       76,4ms        148
Table 2: Results for DataSet 2
Table 3 presents the results for data set 3, of size 1.3M.
Query   XDBMS1 time   XDBMS2 time   Result elements
Q1      152,3ms       113,6ms       500
Q2      29,9ms        31,4ms        500
Q3      153,2ms       69,9ms        500
Q4      105,3ms       54,7ms        339
Q5      131,3ms       54,2ms        339
Q6      34,3ms        25,3ms        500
Q7      686,6ms       301,2ms       1000
Q8      19,1ms        7,1ms         15
Q9      93,3ms        5859,6ms      427
Q10     207,9ms       5717,4ms      1000
Q11     766,8ms       165,1ms       285
Table 3: Results for DataSet 3
We also measured the time required by XDBMS1 to generate the first result.
The results are presented in Table 4:
Query   DS1     DS2     DS3
Q1      10,5    11      10,5
Q2      9,2     9,2     9,5
Q3      9,7     10,1    10,0
Q4      9,2     9,4     9,4
Q5      11,2    10,4    10,4
Q6      8,4     8,5     8,6
Q7      62,1    152,1   320,2
Q8      9,3     9,2     9,8
Q9      11,4    11,8    11,2
Q10     11,5    10,5    11
Q11     14,3    15,5    15,5
Table 4: XDBMS1 time to generate the first result
5.2 Some Discussions
XDBMS1 times are generally better when evaluating queries that involve simple
selections. For queries Q1 and Q2, with element selection, XDBMS1 gives good
results (this would be expected with a structure index, but XDBMS1 indexes
only stemmed words). For the third query, however, XDBMS2 results are better.
The third query requires the evaluation of a simple equality predicate that
compares the value of an attribute (an exact match).
The same thing happens on Q5, which contains the same predicate. The cause
may be the XDBMS2 value index, which performs better than the stemmed
indexing proposed by XDBMS1. This explanation is supported by the fact that
the times for Q1 and Q3, which return the same number of results with the
same structure, are very close for XDBMS1 and very different for XDBMS2, the
latter being influenced by the value index.
According to the XDBMS1 developers, it would be better for XDBMS1 if a
"contains" statement replaced the equality predicate; query processing would
then benefit from the XDBMS1 stemming technique. But the benchmark would then
be different.
Q7 involves the creation of a sorted set of results (an "order by" clause) on
an attribute value. Again the XDBMS2 value index tends to be more efficient.
For Q8 (text searching), XDBMS2 performs better. This can be explained by the
small text index that is used directly by XDBMS2 (Q8 was reformulated with
the ftd function to take advantage of the XDBMS2 text index).
Q9 and Q10 require the computation of join operations; the execution time is
greatly influenced by the join algorithms. The XDBMS1 join algorithms seem to
be better optimized and to generate the results faster. When performing the
join operation with active indexes, XDBMS2 times are longer than without the
indexes; thus, we present the response times obtained without indexes.
Q11 stresses the repository and involves the generation of new results. The
result creation technique used by XDBMS2 seems to be more efficient.
In general, XDBMS1 does not perform very well on simple selections when the
size of the database increases. This is somewhat at odds with stemmed
indexing, which should be very efficient on large data sets. For the join
operations, even on the bigger data sets, XDBMS1 performs very well and
returns results quickly.
5.3 Results of experiments with XQuark Mapping XML to Tables
In this sub-section, we present the results of analyzing XQuark Bridge with
the proposed set of queries. We first ran XQuark on top of Oracle, and then
on top of MySQL. For each data set, we defined the natural mapping to
relational tables, with foreign keys for joins. The tables were indexed on
keys and foreign keys. We ran the benchmark on our mono-processor system.
Results are given in Tables 5, 6 and 7. They are quite good for Oracle but
not so good for MySQL (Q5 is 0 for MySQL because of a wrapper fault). This is
due to the fact that the nested SQL queries resulting from the mapping are
not processed efficiently by MySQL.
Query   Oracle time   MySQL time   Result elements
Q1      132,0         127,2        100
Q2      23,8          33,9         100
Q3      105,1         122,1        100
Q4      91,0          113,0        69
Q5      91,1          0,0          69
Q6      32,6          27,5         100
Q7      281,8         469,2        200
Q8      35,4          56,2         4
Q9      51,2          249,7        79
Q10     26,1          238,2        200
Q11     86,7          2312,4       60
Table 5: XQuark results for DS1
Query   Oracle time   MySQL time   Result elements
Q1      162,3         217,6        250
Q2      33,0          11,5         250
Q3      173,0         236,9        250
Q4      134,8         185,8        168
Q5      154,1         0,0          168
Q6      37,5          15,8         250
Q7      590,5         674,7        500
Q8      72,1          110,8        10
Q9      40,3          9998,8       212
Q10     61,6          1546,6       500
Q11     133,6         55531,4      148
Table 6: XQuark results for DS2
Query   Oracle time   MySQL time   Result elements
Q1      302,9         355,9        500
Q2      46,2          15,5         500
Q3      335,7         433,0        500
Q4      231,0         327,7        339
Q5      235,0         0,0          339
Q6      39,2          41,9         500
Q7      1119,5        1290,7       1000
Q8      80,3          179,6        15
Q9      80,2          42493,8      427
Q10     61,1          26934,0      1000
Q11     214,3         277804,6     285
Table 7: XQuark results for DS3
6. MEDIATOR EVALUATION
We ran the benchmark queries on top of the XLive mediator using XDBMS1 and
XDBMS2 as data sources. With multiple data sources, times are roughly the
same as with one. Thus, we only report the results for the mediator on top of
a single data source.
6.1 Results of experiments
Tables 8, 9 and 10 present the results of evaluating the queries using a
mediator on top of XDBMS1 and XDBMS2 for all the data sets. Most of the time
in the mediator is spent iterating over the intermediate results and
constructing the final result. As XLive exchanges data with sources in text
XML (as with Web services), all the partial results have to be parsed again,
which is costly in Java on a small portable computer. Better results could be
obtained if the mediator used a cache for temporarily storing source query
results in an easy-to-serialize format.
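The cost discussed above comes from a round trip: the source serializes each result to text XML and the mediator parses it again before it can work on it. The Python sketch below reproduces that round trip on synthetic data and times it. The element names, the result count of 1000 and the function names are assumptions for illustration; the sketch says nothing about the actual XML/DBC implementation or about absolute timings on the benchmark host.

```python
import time
import xml.etree.ElementTree as ET

def make_results(n):
    """Build n small <book> elements standing in for a wrapper's result set."""
    books = []
    for i in range(n):
        book = ET.Element("book", isbn=str(i))
        ET.SubElement(book, "title").text = f"title {i}"
        books.append(book)
    return books

def via_text_xml(results):
    """Wrapper side serializes each result to text; mediator side re-parses."""
    texts = [ET.tostring(r, encoding="unicode") for r in results]
    return [ET.fromstring(t) for t in texts]

results = make_results(1000)
start = time.perf_counter()
reparsed = via_text_xml(results)
elapsed_ms = (time.perf_counter() - start) * 1000.0
print(f"{len(reparsed)} results re-parsed in {elapsed_ms:.1f} ms")
```

The round trip grows with the number and size of the results, independently of how fast the source evaluated the query, which is consistent with the mediator overhead being dominated by transfer and parsing.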
Query   XDBMS1 time   XDBMS2 time   Result elements
Q1      444,5         524,8         100
Q2      245,4         213,1         100
Q3      504,2         417,8         100
Q4      333,4         264,9         69
Q5      422,2         306,8         69
Q6      206,9         269,1         100
Q7      992,1         2151,8        200
Q8      137,5         4,61          4
Q9      423,3         1939,6        79
Q10     698,6         3293,5        200
Q11     945,5         1527,5        60
Table 8: Mediator results for DS1
Query   XDBMS1 time   XDBMS2 time   Result elements
Q1      758,9         2378,5        250
Q2      335,7         313,6         250
Q3      879,1         1804,8        250
Q4      652,1         866,0         168
Q5      674,8         893,2         168
Q6      263,9         263,7         250
Q7      1242,7        3412,0        500
Q8      136,8         7,24          10
Q9      508,0         3128,4        212
Q10     997,5         7986,1        500
Q11     2008,4        2854,8        148
Table 9: Mediator results for DS2
Query   XDBMS1 time   XDBMS2 time   Result elements
Q1      1106,6        7490,3        500
Q2      388,3         759,5         500
Q3      1144,5        8174,7        500
Q4      759,2         3998,3        339
Q5      739,1         3661,3        339
Q6      933,8         979,3         500
Q7      1887,5        4816,6        1000
Q8      131,8         8,52          15
Q9      829,6         6160,3        428
Q10     1586,4        14962,6       1000
Q11     4411,7        4051,7        285
Table 10: Mediator results for DS3
6.2 Some Discussions
It is important to mention that the mediator
evaluation time is strongly influenced by the
Java API provided by the mediated DBMSs.
This may mean that sometimes the generated
sub-queries are the best possible.
Another important point is that, at the mediator level, it is not always
possible to benefit from the best indexing techniques of each local data
source. For example, when evaluating Q8 on XDBMS2, taking advantage of the
text index requires the use of the non-XQuery function "XDBMS2:fts". The
mediator, on the other hand, supports standard XQuery with no vendor-specific
functions. Thus, an optimized translation from XQuery to XDBMS2 functions
would require more parameters and would mean keeping the wrapper constantly
in step with the vendor's various optimized functions. Another current
problem that penalizes the mediator evaluation is the translation from XLive
XQuery to the query languages of the real DBMSs, which are in reality far
from the standard. This factor should disappear with the final
standardization of XQuery.
For DS1, the total time for running the whole benchmark with XDBMS1 is 499 ms,
while it is 5353 ms with the mediator on top of XDBMS1. This shows an average
factor of 10, mainly due to data transfer and parsing. Total time with XDBMS2
is 71 versus 922 with the mediator on top of XDBMS2. This shows an average
factor of 13. The global difference may come from the quality of the wrapper
(better optimizations have been made with XDBMS1). The ratios for the other
data sets, DS2 and DS3, are a bit better for XDBMS1 (approximately 7 and 5).
These lower ratios are caused by the fact that the query processing time at
the XDBMS1 level grows "faster" than the time required to parse the
additional results at the mediator level.
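The overhead factors quoted above can be recomputed from the per-query times of Table 1 (direct access) and Table 8 (through the mediator). The Python sketch below does this for DS1; the figures are transcribed from those tables and the function name `overhead` is an assumption. With the tables as transcribed here, the XDBMS1 totals come to about 500 ms and 5354 ms, a ratio close to 10.7, and the XDBMS2 totals to about 783 ms and 10914 ms, a ratio close to 14; the value 71 quoted in the text matches the per-query mean of the direct XDBMS2 times rather than their sum.

```python
# Times in ms transcribed from Table 1 (direct) and Table 8 (mediator), Q1..Q11.
direct = {
    "XDBMS1": [32.6, 14.4, 34.6, 33.9, 32.2, 13.6, 190.6, 12.2, 25.6, 37.7, 72.3],
    "XDBMS2": [60.2, 21.1, 18.4, 24.6, 17.0, 7.2, 77.7, 1.9, 269.6, 249.4, 35.5],
}
mediated = {
    "XDBMS1": [444.5, 245.4, 504.2, 333.4, 422.2, 206.9, 992.1, 137.5, 423.3,
               698.6, 945.5],
    "XDBMS2": [524.8, 213.1, 417.8, 264.9, 306.8, 269.1, 2151.8, 4.61, 1939.6,
               3293.5, 1527.5],
}

def overhead(system):
    """Return (direct total, mediated total, ratio) for one DBMS on DS1."""
    d, m = sum(direct[system]), sum(mediated[system])
    return d, m, m / d

for system in direct:
    d, m, ratio = overhead(system)
    print(f"{system}: direct {d:.1f} ms, mediator {m:.1f} ms, ratio {ratio:.1f}")
```

Dividing the two lists element by element instead of summing them gives the per-query ratios, which vary widely from one query to another.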
Figures 4, 5 and 6 give the detailed ratios between the response time with
the mediator and the direct response time. The ratio for XDBMS2 increases for
bigger data sets. This means that the time required to analyze more results
(due to iteration, parsing and serialization) grows "faster" than the
additional time required by XDBMS2 to generate more results.
[Figure: ratio of mediator response time to direct response time for XDBMS1
and XDBMS2, per query Q1 to Q11; vertical axis from 0 to 45]