Efficient Change Management of XML Documents [Elektronische Ressource] / Sebastian Rönnau. Universität der Bundeswehr München, Fakultät für Informatik. Gutachter: Michael Koch. Betreuer: Uwe M. Borghoff
203 Pages
English

Efficient Change Management of XML Documents [Elektronische Ressource] / Sebastian Rönnau. Universität der Bundeswehr München, Fakultät für Informatik. Gutachter: Michael Koch. Betreuer: Uwe M. Borghoff

Downloading requires you to have access to the YouScribe library
Learn all about the services we offer

Informations

Published by
Published 01 January 2011
Reads 36
Language English
Document size 1 MB

Efficient Change Management of XML Documents
Dissertation
zur Erlangung des akademischen Grades eines
Doktors der Naturwissenschaften (Dr. rer. nat.)
vorgelegt von
Dipl.-Inf. Sebastian Rönnau
im August 2010
Tag der mündlichen Prüfung: 09. Dezemver 2010
Vorsitzender der Kommission: Prof. Dr. Peter Hertling
1. Berichterstatter: Prof. Dr. Uwe M. Borghoff
2. Prof. Dr. Michael Koch
1. Prüfer: Prof. Dr. Ulrike Lechner
2. Prof. Klaus Buchenrieder, Ph.D.
Universität der Bundeswehr München
Fakultät für InformatikAbstract
XML-based documents play a major role in modern information architectures and their corre-
sponding work-flows. In this context, the ability to identify and represent differences between
two versions of a document is essential. A second important aspect is the merging of document
versions, which becomes crucial in parallel editing processes.
Many different approaches exist that meet these challenges. Most rely on operational trans-
formation or document annotation. In both approaches, the operations leading to changes are
tracked, which requires corresponding editing applications. In the context of software devel-
opment, however, a state-based approach is common. Here, document versions are compared
and merged using external tools, called diff and patch. This allows users for freely editing
documents without being tightened to special tools. Approaches exist that are able to compare
XML documents. A corresponding merge capability is still not available.
In this thesis, I present a comprehensive framework that allows for comparing and merging
of XML documents using a state-based approach. Its design is based on an analysis of XML
documents and their modification patterns. The heart of the framework is a context-oriented
delta model. I present a diff algorithm that appears to be highly efficient in terms of speed
and delta quality. The patch algorithm is able to merge document versions efficiently and
reliably. The efficiency and the reliability of my approach are verified using a competitive test
scenario.
iiiivAcknowledgements
Many people have supported me in my work. First of all, I have to thank to my supervisor,
Uwe M. Borghoff, for giving me the opportunity to do my research. He allowed me to follow
my own research interests, with broad but precise constraints. He has always trusted in me
and my work and encouraged me to take on the responsibility for the organization of the ACM
DocEng 2009. I also have to thank my second advisor, Michael Koch. He provided me with
valuable comments, helping me to sharpen this thesis up.
Almost all of the time, I did really enjoy being member of the Institute of Software Tech-
nology. Apart from the excellent research conditions, the staff played a major role. Florian
Brieler, Nico Krebs, Sonja Maier, Steffen Mazanek, Mark Minas, Peter Rödig, Arne Seifert,
Thomas Triebsees, and Daniel Volk helped me elaborating my ideas in countless discussions.
Susanna Defendi, Harald Hagel, and Michael Minkus allowed me for discussing topics not
related to the domain of computer science. It was a pleasure and a honor to me to work with
my colleagues. Some of them became even friends.
Other people have contributed to my thesis. First of all, Jan Scheffczyk helped me writing
my first paper, opening me the way to an academic career. Many students supported my
research. Christian Pauli wrote the first prototype of XCC Patch. Geraint Philipp, a highly
skillful programmer, made a complete re-design of XCC Patch and programmed XCC Diff.
Maik Teupel designed a user interface and wrote converting tools. Arthur Müller developed
further applications of the XCC framework on the file system level. Manfred Pohlemann
supported my work by developing a test framework.
Finally, I have to thank my family. My wife Silvia and my children Luca and Lilith have
been very patient with me, accepting when I was coming home late or lost in thought. My
youngest girl, Lene, was being born just a few hours after the submission of my final draft.
Thanks for her patience, too.
vviContents
Abstract iii
Contents v
List of Figures xi
List of Tables xiii
List of Acronyms xv
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Document Change Control . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Goal and Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
I Concepts and Notations 7
2 XML Documents 9
2.1 What is a Document? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 XML in General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Document Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Tree Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5 Modification Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3 Differencing Strings and Trees 29
3.1 String Editing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Differencing XML Documents . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4 Document Merging 47
4.1 Parallel Editing of Documents . . . . . . . . . . . . . . . . . . . . . . . . . 47
viiContents
4.2 State-Based Change Control . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.3 Operational Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.4 Annotation of Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.5 Other Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
II A Context-Sensitive Approach to XML Change Control 63
5 The XCC Framework 65
5.1 Basic Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6 A Context-Oriented Delta Model 73
6.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.2 Edit Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.3 Dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.4 Set-Based Delta vs. Edit Script . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.5 Representing Context in XML . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.6 Inversion, Aggregation, and Decomposition . . . . . . . . . . . . . . . . . . 84
6.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7 An Efficient Differencing Algorithm 87
7.1 Basic Idea and Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.2 Finding the Common Content . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.3 Identifying Structure-Preserving Changes . . . . . . . . . . . . . . . . . . . 90
7.4 Catching Structure-Affecting Changes . . . . . . . . . . . . . . . . . . . . . 92
7.5 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8 A Merge-Capable Patch Algorithm 101
8.1 Basic Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.2 Linear Patching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.3 Context-Aware Patching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
8.4 Conflicts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
8.5 Best-Effort Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
8.6 The Algorithm at a Glance . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
8.7 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
8.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
9 Evaluation 115
9.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
viiiContents
9.2 Test Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
9.3 Runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
9.4 Merge Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
9.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
III Conclusions 127
10 A Comparative Evaluation of XML Differencing Tools 129
10.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
10.2 Test Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
10.3 Differencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
10.4 Delta Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
10.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
11 Conclusions and Future Work 143
11.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
11.2 Scientific Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
11.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
11.4 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
Bibliography 151
A Resources for Document Evaluation 165
A.1 Selection Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
A.2 Web Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
A.3 Office . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
B XML Document Hashing 169
B.1 Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
B.2 XML Hashing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
B.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
C XCC Delta Specification 173
D A Running Example 175
D.1 Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
D.2 Differencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
D.3 Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
D.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
ixContents
x