A logic-based approach to multimedia interpretation [electronic resource] / by Atila Kaya
206 Pages
English


Published 01 January 2010
A LogicBased Approach to MultimediaInterpretation
Vom Promotionsausschuss der Technischen Universität HamburgHarburg zur Erlangung des akademischen Grades Doktor der Naturwissenschaften (Dr. rer. nat.) genehmigte Dissertation
von
Atila Kaya
aus Izmir, Türkei
2011
Reviewers: Prof. Dr. Ralf Möller, Prof. Dr. Bernd Neumann, Prof. Dr. Rolf-Rainer Grigat
Day of the defense: 28.02.2011
Abstract
The availability of metadata about the semantics of information in multimedia documents is crucial for building semantic applications that offer convenient access to relevant information and services. In this work, we present a novel approach for the automatic generation of rich semantic metadata based on surface-level information. For the extraction of the required surface-level information state-of-the-art analysis tools are used. The approach exploits a logic-based formalism as the foundation for knowledge representation and reasoning. To develop a declarative approach, we formalize a multimedia interpretation algorithm that exploits formal inference services offered by a state-of-the-art reasoning engine. Furthermore, we present the semantic interpretation engine, a software system that implements the logic-based multimedia interpretation approach, and test it through experimental studies. We use the results of our tests to evaluate the fitness of our logic-based approach in practice. Finally, we conclude this work by highlighting promising areas for future work.
To my dear parents and wife
Sevgili anneme, babama ve eşime.
Acknowledgements
This thesis is the result of five years of work in the Institute for Software Systems (STS) research group at the Hamburg University of Technology (TUHH). I am grateful to my advisor Prof. Dr. Ralf Möller for giving me the opportunity to conduct such exciting research and for mentoring me. I would also like to thank Prof. Dr. Bernd Neumann and Prof. Dr. Rolf-Rainer Grigat for reviewing this work.
I would like to express my gratitude to all my colleagues at the STS research group: Sofia Espinosa, Sylvia Melzer, Alissa Kaplunova, Tobias Näth, Kamil Sokolski, Maurice Rosenfeld, Oliver Gries, Anahita Nafissi, Dr. Hans-Werner Sehring, Olaf Bauer, Rainer Marrone, Sebastian Wandelt, Volker Menrad and Gustav Munkby. Special thanks go to Dr. Patrick Hupe and Dr. Michael Wessel, who always supported and encouraged me.
I am also indebted to the STS staff Hartmut Gau, Ulrike Hantschmann, Thomas Rahmlow and Thomas Sidow for their excellent administrative and technical support.
Finally, I would like to thank my parents Tükez and Dursun, and my wife Justyna for their love, care and continuous support.
Contents

List of Figures .... v

1 Introduction .... 3
1.1 Motivation for this Research .... 3
1.2 Research Objectives .... 4
1.3 Contributions .... 5
1.4 Dissemination Activities .... 6
1.5 Outline of the Dissertation .... 9

2 Logical Formalization of Multimedia Interpretation .... 11
2.1 Applications and Related Research Fields .... 12
2.2 Related Work on Image Interpretation .... 16
2.2.1 Image Interpretation Based on Model Generation .... 17
2.2.2 Image Interpretation Based on Abduction .... 20
2.2.3 Image Interpretation Based on Deduction .... 25
2.3 Discussion .... 30

3 Logical Engineering of a Multimedia Interpretation System .... 33
3.1 Knowledge Representation Formalisms .... 35
3.1.1 Introduction to Description Logics .... 38
3.1.2 Introduction to Logic Programming .... 54
3.2 Overview of a Multimedia Interpretation System .... 59
3.3 Formalizing ABox Abduction .... 66
3.3.1 Related Work on Abduction .... 68
3.3.2 The ABox Abduction Algorithm .... 83
3.3.3 Selecting Preferred Explanations .... 89
3.4 Abduction-Based Interpretation .... 95
3.5 Fusion of Modality-Specific Interpretations .... 99

4 Case Studies .... 105
4.1 The BOEMIE Project .... 106
4.2 The Semantic Interpretation Engine .... 110
4.3 Interpretation of a Sample Multimedia Document .... 113
4.3.1 Modality-Specific Interpretations .... 114
4.3.2 Strategies for the Interpretation Process .... 138
4.3.3 Fusion .... 148

5 Evaluation .... 161
5.1 Performance and Scalability .... 162
5.2 Quality of Interpretation Results .... 168

6 Conclusions .... 175
6.1 Summary .... 175
6.2 Outlook .... 177

References .... 179
Index .... 194
List of Figures

3.1 The hybrid approach for obtaining deep semantic annotations .... 34
3.2 Interpretation of complex concept descriptions .... 40
3.3 A graphical representation of the concept definition Person, which requires modeling of a triangular structure .... 50
3.4 A graphical representation of an ABox with an inferred role assertion (dashed) caused by the transitive role R .... 51
3.5 An example UML class diagram .... 52
3.6 An example TBox T .... 53
3.7 The multimedia interpretation process. Input: analysis ABox, Output: interpretation ABox(es), The background knowledge: Domain ontology and interpretation rules .... 60
3.8 Interpretation of a document consisting of observations and their explanations .... 62
3.9 The multimedia interpretation approach including processing steps for analysis, interpretation and fusion .... 64
3.10 A rule used by the Wimp3 system for network construction .... 73
3.11 The Bayesian network constructed for plan recognition .... 74

4.1 The architecture of the semantic interpretation engine, which is deployed into the Apache Tomcat servlet container. The Apache Axis is a core engine for web services. The semantic interpretation engine exploits the inference services offered by RacerPro. Each RacerPro instance is dedicated to a single modality. .... 111
4.2 A sample web page with athletics news .... 115
4.3 The image taken from the sample web page in Figure 4.2 .... 116
4.4 The ABox imageABox01 representing the results of image analysis for the image in Figure 4.3 .... 116
4.5 An excerpt of the TBox T for the athletics domain .... 117
4.6 An excerpt of the image interpretation rules Rima for the athletics domain .... 117
4.7 The ABox A after the addition of Δ1 .... 120
4.8 The interpretation ABoxes imageABox01 interpretation1 and imageABox01 interpretation2 returned by the semantic interpretation engine .... 123
4.9 The caption of the image shown in Figure 4.3 .... 123
4.10 The ABox captionABox01 representing the results of text analysis for the caption in Figure 4.9 .... 124
4.11 Another excerpt of the TBox T for the athletics domain .... 125
4.12 An excerpt of the caption interpretation rules Rcap for the athletics domain .... 125
4.13 The interpretation ABox captionABox01 interpretation1 returned by the semantic interpretation engine .... 129
4.14 The first paragraph of the text segment of the sample web page .... 129
4.15 The ABox textABox01 representing the results of text analysis for the text segment in Figure 4.14 .... 130
4.16 Another excerpt of the TBox T for the athletics domain .... 131
4.17 An excerpt of the text interpretation rules Rtex for the athletics domain .... 131
4.18 The ABox A after the addition of the explanation Δ2 .... 134
4.19 The interpretation ABox textABox01 interpretation1 returned by the semantic interpretation engine .... 137
4.20 The ABox sampleABox1 .... 139
4.21 A sample TBox T .... 140
4.22 A set of text interpretation rules R1 .... 140
4.23 Two possible interpretation results for the same analysis ABox sampleABox1, where the one on the left-hand side is preferred .... 141
4.24 The ABox sampleABox2 .... 142
4.25 A set of text interpretation rules R2 containing a single rule .... 142
4.26 Two different interpretation results for the analysis ABox sampleABox2, where the one on the left-hand side is preferred .... 144
4.27 The sample analysis ABox sampleABox3 .... 145
4.28 A set of text interpretation rules R3 .... 145
4.29 Two different interpretation results for the analysis ABox sampleABox3, where the one on the left-hand side is preferred .... 146
4.30 An excerpt of the axioms, which are added to the background knowledge T .... 149
4.31 All assertions of the interpretation ABox captionABox01 interpretation1 as returned by the semantic interpretation engine .... 152
4.32 The analysis ABox of a sample web page .... 156
4.33 A sample image interpretation ABox .... 156
4.34 A sample caption interpretation ABox .... 157
4.35 The fused interpretation ABox of the sample web page .... 160

5.1 The number of fiat assertions (x) and the time (y) spent in minutes for the interpretation of 500 text analysis ABoxes .... 164
5.2 The number of fiat assertions (x) and the time (y) spent in minutes for the interpretation of selected text analysis ABoxes .... 165
5.3 The sum of fiat and bona fide assertions (x) and the time (y) spent in minutes for the interpretation of 500 text analysis ABoxes .... 166
5.4 The number of fiat and bona fide assertions (x) and the time (y) spent in minutes for the interpretation of selected text analysis ABoxes .... 168