167 Pages

Fast 3D object detection and pose estimation for augmented reality systems [electronic resource] / Seyed Hesameddin Najafi Shoushtari

Published 01 January 2006
Language English
Document size 38 MB

Fast 3D Object Detection and Pose Estimation
for Augmented Reality Systems

Complete reprint of the dissertation approved by the Faculty of Informatics of the
Technische Universität München for the award of the academic degree of

Doctor of Natural Sciences (Dr. rer. nat.)

Chair: Univ.-Prof. Dr. Darius Burschka

Examiners of the dissertation: 1. Univ.-Prof. Gudrun J. Klinker, Ph.D.
2. Univ.-Prof. Nassir Navab, Ph.D.

The dissertation was submitted to the Technische Universität München on 21 September 2006
and accepted by the Faculty of Informatics on 7 December 2006.

Many problems in computer vision and augmented reality (AR) require estimating the pose
of objects or mobile users in real time. While reliable solutions have been proposed for
pose estimation from given correspondences and for feature-based 3D tracking, fast and
fully automated initialization of tracking, i.e., estimation of the initial pose, is still an
open problem. The difficulty stems from the need for fast and robust detection of known
objects in the scene. This dissertation presents a fast and automated object detection
and pose estimation system capable of handling large amounts of background clutter,
severe occlusions, and strong viewpoint and scale changes.
The thesis builds upon existing algorithms introduced in recent years and develops
various novel techniques that significantly improve both the time and the functional
performance of detection and pose estimation. The advances can be summarized in two
main contributions.
First, a scalable, statistical and compact representation of the object of interest is
introduced, based on the fusion of 3D geometric and appearance information. It is built
during an offline learning process by statistically analyzing and evaluating distinctive
features of the target object. At run time, this representation limits the number of
hypotheses during matching and pose estimation by enforcing both photometric and 3D
geometric consistency constraints. This reduces the influence of the complexity of the
3D model on run-time performance and makes the method particularly powerful for large
environments.
Second, a sensor fusion system is introduced, based on a coarse-to-fine strategy capable
of incorporating multiple sensors, e.g., mobile and stationary cameras. It relies on
statistical analysis, probabilistic estimation, and fusion of the sensors' uncertainties.
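The abstract states only that the sensors' uncertainties are estimated and fused. As a hedged illustration, and not the thesis's actual method, the sketch below fuses two independent position estimates, for example a coarse one from a stationary camera and a finer one from a mobile camera, by inverse-variance weighting under an assumed diagonal Gaussian noise model. The function name `fuse_estimates` and the sample numbers are hypothetical.

```python
from typing import List, Tuple


def fuse_estimates(mean_a: List[float], var_a: List[float],
                   mean_b: List[float], var_b: List[float]) -> Tuple[List[float], List[float]]:
    """Fuse two independent Gaussian estimates with (assumed) diagonal covariance.

    Each component is combined by inverse-variance weighting, so the sensor
    with the smaller variance (higher confidence) dominates the result.
    """
    if not (len(mean_a) == len(var_a) == len(mean_b) == len(var_b)):
        raise ValueError("all inputs must have the same length")
    fused_mean: List[float] = []
    fused_var: List[float] = []
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        if va <= 0 or vb <= 0:
            raise ValueError("variances must be positive")
        # Fused variance is smaller than either input variance.
        v = 1.0 / (1.0 / va + 1.0 / vb)
        fused_var.append(v)
        # Fused mean is the variance-weighted average of the two means.
        fused_mean.append(v * (ma / va + mb / vb))
    return fused_mean, fused_var


# Hypothetical example: a coarse 3D position from a stationary camera (larger
# variance) refined by a mobile camera (smaller variance).
coarse_mean, coarse_var = [1.0, 2.0, 0.5], [0.04, 0.04, 0.09]
fine_mean, fine_var = [1.1, 1.9, 0.45], [0.01, 0.01, 0.01]
print(fuse_estimates(coarse_mean, coarse_var, fine_mean, fine_var))
```

Inverse-variance weighting is the static special case of a Kalman measurement update; a full 6-DoF pose would additionally need a rotation parameterization and full covariance matrices, which this sketch deliberately omits.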
Furthermore, this system is integrated into an AR tracking framework for the initialization
of a marker-less real-time tracking system, which proves to be fast and reliable enough
for industrial AR applications.

Many problems in computer vision and augmented reality (AR) require computing the
position and orientation of objects or mobile users in real time. While many precise
tracking systems have been developed in recent years, fast and fully automated
initialization and reinitialization of these systems remains an open problem. The main
difficulty is the fast and robust detection of already known objects in the scene. This
dissertation presents a fast, fully automated and, above all, scalable object detection
system that is able to cope with strong viewpoint and scale changes, partial occlusions,
and large amounts of background clutter.
The work builds on existing algorithms introduced in recent years and proposes various
new methods that decisively improve both the time and the functional performance. The
advances include, among others, a scalable and statistical object representation based on
a fusion of 3D shape and appearance models, a sensor fusion system that integrates
different sensors, and a tracking management system that has proven fast and reliable
enough for industrial AR applications.

First and foremost, I wish to thank my supervisors Prof. Gudrun Klinker, Ph.D. and
Prof. Dr. Nassir Navab, who have both continuously shared their support and enthusiasm.
Working with them has been a great pleasure. I gratefully acknowledge Gudrun's constant
and invaluable academic and personal support and guidance throughout the last years. I
thank Nassir for sharing with me his unique way of looking at things and approaching a
research problem. He has been a constant source of inspiration during this endeavor.
I have spent the last year and a half of my Ph.D. working at Siemens Corporate
Research Center in Princeton, NJ. I am thankful to Yakup Genc, Ph.D. for giving me the
opportunity to work in the 3D Vision and Augmented Reality Group. I greatly appreciate
his insightful comments, his constructive criticism and his valuable suggestions. I also
wish to thank Dr. Visvanathan Ramesh for the fruitful discussions during the course of
this work. I have enjoyed many useful discussions with some of my present and former
colleagues at the Real-time Vision Department, in particular, Yanghai Tsin, Ph.D., Xiang
Zhang, Ph.D., Matthias Voigt, Anurag Mittal, Ph.D., Jan Neumann, Ph.D., Ying Zhu,
Ph.D., Maneesh Singh, Ph.D., Dr. Claus Bahlmann, and Ali Khamene, Ph.D. I thank my
colleague Mehdi Hamadou from Siemens A&D for the very interesting conversations and
his valuable feedback and support in the ARTESAS project.
I would like to say a big ’thank-you’ to all my former colleagues from the Augmented
Reality Research Group at Technische Universität München (TUM), in particular, Martin
Bauer, Verena Broy, Stephan Huber, Dr. Asa MacWilliams, Dr. Thomas Reicher, Dr.
Christian Sandor, and Dr. Martin Wagner. It has been a great pleasure for me to work
with you and I wish you all the best for your academic and professional pursuits. I also
thank Prof. Bernd Brügge, Ph.D., for providing the opportunity to work at his chair.
Special thanks to all the students I have been working with at TUM, in particular,
Stefan Hinterstoisser, Fabian Sturm, Christian Trübswetter, Franz Mader, Andreas Haug,
and Musafa Isic.
A great thank you to my friends from the CAMP+AR Group at TUM, Dr. Selim
Martin Horn, Daniel Pustka, Tobias Sielhorst, and Wolfgang Wein.
There are many other people that I will not list, with whom I have had useful and
inspiring conversations about my work and related topics.
During the course of this work, I have been supported by a scholarship funded by
Siemens Corporate Research. I gratefully acknowledge this support.
This thesis is, if indirectly, the fruit of my parents. I would like to thank them for
their endless patience, continual support, and unflagging empathy and encouragement.
My mother has always supported my dreams and aspirations. I would like to thank her
for all she is, and all she has done for me. Finally, my heartfelt thanks go to my father,
Dr. S.M. Bagher Najafi-S., who gave everything for his children and their education and
always encouraged them to pursue the quest for careers in science. He lives in my heart
forever. I dedicate this thesis to him.