284 Pages
English

Modeling and implementing multidimensional hierarchically structured data for data warehouses in relational database management systems and the implementation into transbase [Elektronische Ressource] / Roland Pieringer



Published 01 January 2003

Modeling and implementing
multidimensional hierarchically structured
Data for Data Warehouses in relational
Database Management Systems and the
Implementation into Transbase®



Roland Pieringer
Institut für Informatik
der Technischen Universität München



Complete reprint of the dissertation approved by the Fakultät für Informatik of the
Technische Universität München for the award of the academic degree of

Doktor der Naturwissenschaften (Doctor of Natural Sciences).


Chair: Univ.-Prof. Dr. F. Matthes

Examiners of the dissertation:
1. Univ.-Prof. R. Bayer, Ph.D.
2. Univ.-Prof. Dr. B. Freitag, Universität Passau

The dissertation was submitted to the Technische Universität München on 29 January 2003
and accepted by the Fakultät für Informatik on 17 June 2003.

In memory of my father, Georg Pieringer.


Acknowledgements

First of all, I want to thank my advisor Prof. Rudolf Bayer, Ph.D., for his support and fruitful
discussions. Even though I was an external Ph.D. candidate, he found time to discuss problems with
me. He even spent his vacation reading and correcting my thesis!
Then, of course, I thank Transaction Software, which supported me and gave me the time to work on the
thesis. I was also allowed to participate in interesting conferences. I especially want to thank Klaus
Elhardt, who was a steady discussion partner and influenced the work a lot through his experience in the
field of database systems (both theoretical and practical). I also thank Christian Roth, the head of
Transaction Software, who enriched the work with many commercial ideas. Of course, I am very
grateful to all my other colleagues, too, who are (in alphabetical order): Ralph Acker, Adolf Alt, Gerhard
D nzinger, Martha Jelinek, Dieter Killar, Susanne Krall, Erwin Loibl, Wolfgang Schwarz and Christof
Seibt for all the moral support and other help.
I especially thank the people at FORWISS and TU München, in particular Volker Markl, Frank
Ramsak, Robert Fenk and Martin Zirkel. The teamwork was and still is great. We had days of
discussions that resulted in various publications and finally in this thesis. I also want to thank all the other
people at FORWISS that I had contact with.
Further, I thank all project partners of the EDITH project for their input and discussions and also for the very
nice time spent in different cities all over Europe.
I thank my family, i.e., my mother and my sister. They did not urge me too much to finish the thesis.
Last but not least, I thank all my friends, especially all the Corpsbrüder of Corps Alemannia, who were
always willing to share some great free time after periods of work, and all the girls who distracted me from the
work and kept me from getting frustrated.

Abstract:
Efficient star query processing is crucial for a performant data warehouse (DW)
implementation and much work is available on physical optimization (e.g., indexing and
schema design) and logical optimization (e.g., pre-aggregated materialized views with
query rewriting). Organizing fact tables with clustering multidimensional access methods
(like the UB-Tree) is a promising approach to speed up star queries. However, the
implementation into commercial products has not been done so far, since in addition to
the clustering index organization, many parts of a database management system must be
extended. For example, the query optimizer with corresponding cost model modifications
must support the new organization and various optimization topics.
In this thesis, we present EHC, the Encoding for Hierarchical Clustering in combination
with UB-Trees. EHC enables the use of clustering index structures also for hierarchical
data. EHC is extended to MHC, Multidimensional Hierarchical Clustering, by
combining multiple dimensions. Based on the concept of MHC, we develop a number of
query optimization algorithms, in order to support hierarchical clustering with query
processing. For this purpose, we present a complete abstract processing plan that captures
all necessary steps in evaluating star queries in these environments. One important step in
the query processing phase is, however, still a bottleneck: the residual join of results from
the fact table with the dimension tables in combination with grouping and aggregation.
This phase typically consumes between 50% and 80% of the overall processing time. In
typical data warehouse scenarios pre-grouping methods only have a limited effect as the
grouping is usually specified on the hierarchy levels of the dimension tables and not on
the fact table itself. Therefore, we suggest a combination of hierarchical clustering and
pre-grouping. Exploiting hierarchy semantics for the pre-grouping of fact table result
tuples is several times faster than conventional query processing. The reason for this is
that hierarchical pre-grouping reduces the number of join operations significantly. With
this method even queries covering a large part of the fact table can be executed within a
time span acceptable for interactive query processing.
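
The idea of hierarchical pre-grouping can be illustrated with a minimal Python sketch. It is not the Transbase implementation. It assumes that a hierarchy path is encoded by concatenating per-level ordinals into one integer surrogate, so that a surrogate prefix identifies an ancestor at a higher hierarchy level. The bit widths, function names and sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical bit widths for three hierarchy levels (e.g. region, nation, city).
LEVEL_BITS = (2, 3, 4)


def encode_path(ordinals, level_bits=LEVEL_BITS):
    """Concatenate per-level child ordinals into one integer surrogate."""
    surrogate = 0
    for ordinal, bits in zip(ordinals, level_bits):
        if not 0 <= ordinal < (1 << bits):
            raise ValueError("ordinal does not fit into its level")
        surrogate = (surrogate << bits) | ordinal
    return surrogate


def prefix(surrogate, depth, level_bits=LEVEL_BITS):
    """Return the surrogate prefix that identifies the ancestor `depth` levels deep."""
    return surrogate >> sum(level_bits[depth:])


def pregroup_then_join(facts, depth, ancestor_names):
    """Aggregate fact rows by surrogate prefix, then join once per group."""
    partial = defaultdict(float)
    for surrogate, measure in facts:
        partial[prefix(surrogate, depth)] += measure
    # Residual join: one dimension lookup per group instead of one per fact row.
    return {ancestor_names[p]: total for p, total in partial.items()}


if __name__ == "__main__":
    facts = [
        (encode_path((1, 2, 5)), 10.0),
        (encode_path((1, 2, 6)), 5.0),
        (encode_path((2, 0, 1)), 7.0),
        (encode_path((1, 2, 5)), 2.5),
    ]
    print(pregroup_then_join(facts, 1, {1: "South", 2: "North"}))
```

In this sketch, four fact rows are reduced to two groups before any dimension lookup takes place, which is the effect described above: grouping on the hierarchy level happens on the fact table side, and the number of join operations shrinks to the number of groups.
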
All these concepts have been implemented during this thesis in the commercial
database management system Transbase® Hypercube and already run in production at
several customers of Transaction Software GmbH.
During the implementation, further problems arose, such as complex aggregate
expressions, multiple query boxes, non-clustering dimensions, complex schemata, and
multi-fact-table joins. Solutions to these problems are described and have been
implemented.
We further address some theoretical aspects of multiple hierarchies and dynamic changes
of surrogates and a complete hierarchy model.
Finally, we present measurement results of a complex real-world sales transaction data
warehouse of an electronic retailer and of the APB standard benchmark for OLAP. These
measurements show the benefit of the implemented methods compared to conventional
state-of-the-art techniques and database management systems.