160 Pages
English

Towards effective biomedical knowledge discovery through subject-centric semantic integration of the life-science information space [Electronic resource] / Karamfilka Krasimirova Nenova




Information

Published 01 January 2009
Language English
Document size 8 MB



TECHNISCHE UNIVERSITÄT MÜNCHEN

Lehrstuhl für Genomorientierte Bioinformatik



Towards
Effective Biomedical Knowledge Discovery through
Subject-Centric Semantic Integration of the
Life-Science Information Space


Karamfilka Krasimirova Nenova

Complete reprint of the dissertation approved by the Fakultät Wissenschaftszentrum
Weihenstephan für Ernährung, Landnutzung und Umwelt of the Technische Universität München
for the award of the academic degree of
Doktor der Naturwissenschaften.
Chair: Prof. Dr. M. Hrabé de Angelis
Examiners of the dissertation:
1. Univ.-Prof. Dr. H.-W. Mewes
2. Univ.-Prof. Dr. R. Zimmer
(Ludwig-Maximilians-Universität München)

The dissertation was submitted to the Technische Universität München on 20.10.2008
and accepted by the Fakultät Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung
und Umwelt on 19.03.2009.






Acknowledgements
This is a great opportunity to express my deep gratitude to many people who supported me
during my Ph.D. research and in the writing process of this thesis. Without their help,
guidance and encouragement, this study would not have been completed.
Above all, I would like to thank Dr. Volker Stümpflen, who inspired me to undertake this
research. His professional competence, valuable advice and commitment were indispensable
for my work. As a mentor and leader of the Biological Information Systems group at MIPS, he
has always been open to constructive discussions and a core contributor to the very
friendly atmosphere in the group. I am also very grateful to Prof. Dr. Hans-Werner Mewes
for giving me the opportunity to pursue a doctoral degree at the Institute for Bioinformatics
and Systems Biology. I highly appreciate his trust in me and my work, and I am thankful for
his guidance at key moments of my research, his unreserved support and his patience
during the writing process. I would like to extend these thanks to Prof. Dr. Inge Schestag, a
former study advisor at the University for Applied Sciences Darmstadt and a good friend, for
encouraging me to take on my Ph.D. study.
I would also like to thank past and present colleagues at MIPS for listening,
giving advice and helping me with the diverse topics of my work, in particular Dr.
Matthias Oesterheld, Richard Gregory, Thorsten Barnickel, Roland Arnold, Mara
Hartsperger, Florian Büttner, Octave Noubibou, Christoph Oberhauser, Thorsten Schmidt,
Sebastian Toepel, Dr. Thomas Rattei and Dr. Louise Gregory. Moreover, I have to thank Dr.
Ulrich Güldener, Dr. Martin Münsterkötter, Dr. Corinna Montrone, Dr. Irmtraud Dunger,
and Dr. Andreas Ruepp for sharing their biological expertise with me. I thank Christian Chors
and Dr. Matthias Klaften from the Institute for Experimental Genetics for their interest in
my work and their constructive feedback. Additionally, I want to thank Cornelia Canady,
Elisabeth Noheimer, Petra Fuhrmann, Gabi Kastenmueller, Wanseon Lee, Yu Wang and all
other institute members for the pleasant working atmosphere.
Last but not least, I would like to thank my family in Bulgaria for their concern and
continuous encouragement, Annett Würfel, Markus and Julius Döbereiner for cheering me
up during the writing process, Tommaso Nuccio for helping me with the thesis revision, and a
very special thank-you goes to Kuo-Sek Lee for his trust in me.
Karamfilka K. Nenova





Abstract
Key premises for successful life-science research are access to, combination of, and
interpretation of already acquired knowledge. Within the last two decades, considerable data
on various aspects of biological research has been generated and collected in a huge
number of diverse life-science information resources. Although most of these resources provide
public access on the Web to share the knowledge already gained, a vast gap has
emerged between the volume of information generated and the novel knowledge discovered in
the life sciences. This is not only because related biological information is commonly spread
over several distributed resources, but also because essential problems exist regarding the
context dependency and differentiation of biological concepts and entities. To bridge this
crucial gap, biological information has to be integrated and provided not only in a homogeneous
way, but also in the right context for more effective exploration and interpretation, which
remains a demanding knowledge management task.
The objective of this thesis was the development of an integrative approach, applicable
to the life-science information space, that allows more effective knowledge discovery. The
resulting solution follows a new paradigm of subject-centric knowledge representation,
which reflects the human way of associative thinking in terms of subjects and associations
between them. The novel integrative approach is realized by applying both state-of-the-art
technologies for dynamic information request and retrieval and the semantic technology
Topic Maps. The approach was implemented within the software framework
GeKnowME (Generic Knowledge Modeling Environment), which supports scientists with
powerful tools for exploring and navigating through correlated biological entities to
accelerate the discovery process in a specific knowledge domain. The framework is generic
enough to be applicable to a broad range of use cases. To illustrate the potential of the
GeKnowME system, a sample use case called “Human Genetic Diseases” is introduced by
integrating distributed resources containing relevant information. The resulting coherent
information space is explored for novel insights.





Summary
Over the last two decades, research in the various life sciences has led to the collection
of large volumes of data, which are gathered and maintained in a multitude of biological
information resources. Although most resources are publicly accessible and thus provide
access to knowledge already acquired, the gap between this information and the new
knowledge derived from it in the life sciences is steadily widening. One reason is that
related biological entities are often spread over several resources; another is the lack
of a uniform representation of context-dependent biological concepts and entities. Besides
a homogeneous integration of biological information, it is necessary to place that
information in the relevant context in order to enable effective exploration and
interpretation. In the life sciences in particular, this poses a special challenge for
knowledge management.
The aim of this work was the development and implementation of a concept for a method
that is applicable to the integration of biological knowledge domains and thereby enables
more effective knowledge discovery. The developed concept is based on a new
subject-centric approach to knowledge representation, which mirrors human associative
thinking in terms of entities and their connections. Both current technologies for dynamic
information retrieval and the semantic technology Topic Maps were employed to realize
this innovative integration approach. The software framework GeKnowME (Generic
Knowledge Modeling Environment), which provides scientists with powerful tools for
exploring and navigating through related biological entities in specific knowledge
domains, constitutes the implementation of this approach. The generic system is suitable
for a broad range of use cases. The capabilities of GeKnowME are demonstrated using the
use case “Human Genetic Diseases” as an example. For this purpose, the required
distributed resources were integrated and the resulting information network was analyzed
comprehensively. The results allow new insights based on already known information.




Contents
1 Introduction
2 Knowledge Management in Life-Science
  2.1 Knowledge Hierarchy and Life Complexity
  2.2 Life-Science Information Resources
  2.3 Integration of Life-Science Information Resources
    2.3.1 Challenges
      Technical Challenges
      Conceptual Challenges
    2.3.2 Integrative Approaches
      Hypertext Linking
      Full-Text Indexing
      Data Warehousing
      Federated Database Management Systems
      Peer Data Management Systems
  2.4 Knowledge Representation
    2.4.1 Semantics – The Meaning of Meaning
    2.4.2 Ontologies
      Characteristics
      Ontologies in Life-Science
    2.4.3 The World Wide Web
      A bit of History
      The Semantic Web
      Semantic Web Clashes Life-Science
      Subject-Centric Computing
      Topic Maps
3 GeKnowME
  3.1 Use Case View
  3.2 Logical View
    3.2.1 Information Resources Layer
    3.2.2 Integration Layer
    3.2.3 Syntax Layer
    3.2.4 Semantic Layer
    3.2.5 Presentation Layer
  3.3 Physical View
    3.3.1 Technologies Used
      Java Platform Enterprise Edition
      Topic Maps
      Portal and Portlets
      Rich Internet Applications
    3.3.2 System Physical Overview
  3.4 Developmental View
    3.4.1 Integration Package
    3.4.2 Syntax Package
    3.4.3 Semantic Package
    3.4.4 Presentation Package
  3.5 Process View
    3.5.1 Developmental Process
    3.5.2 Scientific Exploration Process
4 Applications
  4.1 Human Genetic Diseases
  4.2 Knowledge Domain Model
  4.3 Large-Scale Analysis
    4.3.1 Extended Human Diseasome
    4.3.2 Recurrences of Proteins in Complexes
    4.3.3 Protein Recurrence in Relation to Genetic Diseases
    4.3.4 Protein Recurrence in Relation to Essentiality
  4.4 Mid-Scale Analysis
  4.5 Small-Scale Analysis