332 Pages
English

Supporting IT Service Fault Recovery with an Automated Planning Method [Electronic resource] / Feng Liu. Advisor: Heinz-Gerd Hegering



Published 01 January 2011
Language English
Document size 2 MB

Supporting IT Service Fault Recovery
with an Automated Planning Method
Dissertation
at the
Faculty of Mathematics, Computer Science and Statistics of the
Ludwig-Maximilians-Universität München
submitted by
Feng Liu
Date of submission: 21.04.2011
Date of oral examination: 06.06.2011
1st reviewer: Prof. Dr. Heinz-Gerd Hegering
Ludwig-Maximilians-Universität München
2nd reviewer: Prof. Dr. Gabi Dreo Rodosek
Universität der Bundeswehr München

Acknowledgement
This dissertation not only presents the research results that I achieved
while working as a researcher at the chair of Prof. Dr. Dieter Kranzlmüller
and Prof. Dr. Heinz-Gerd Hegering; it is also evidence of the friendship and
selfless help I received from the MNM Team during the completion of this work.
First and foremost, I’d like to express my heartfelt appreciation to my
thesis advisor Prof. Dr. Heinz-Gerd Hegering for his committed guidance
and kind advice during all phases of my work. As his doctoral student, I
have benefited deeply from his knowledge, deep insight and decades-long
experience in the research area of network management. I would also like to
thank Prof. Dr. Gabi Dreo for the helpful discussions and important advice
during the preparation of this work. I’d like to thank Prof. Dr. Kranzlmüller
for his support during the final phase of my research.
As a member of the MNM Team, I have been privileged to receive countless
suggestions and much help and support from all team members. I especially want
to thank Dr. Vitalian Danciu and Dr. Michael Schiffers, whose suggestions
and advice helped me greatly in my research. I am indebted to you all, my
friends! Thank you for the experience and memories that I will always
cherish.
I am grateful to my parents for their encouragement and their confidence
in me. I am also indebted to my wife and my daughter for their patience and
understanding during all phases of my work. Compared to what you have
done for me, a simple "thank you" seems too pale to express my gratitude.
Feng Liu
Munich, July 2011

Abstract
Despite advances in software and hardware technologies, faults are still
inevitable in a highly dependent, human-engineered and human-administered IT
environment. Given the critical role of IT services today, it is imperative
that faults, once they have occurred, be dealt with efficiently and
effectively to avoid or reduce actual losses. Nevertheless, the complexity of
current IT services, e.g., with regard to their scale, heterogeneity and highly
dynamic infrastructures, makes the recovery operation a challenging task for
operators. Such complexity will eventually outgrow the human capability
to manage it. The difficulty is compounded by the fact that there are few
well-devised methods available to support fault recovery.
To tackle this issue, this thesis aims at providing a computer-aided
approach to assist operators with fault recovery planning and, consequently, to
increase the efficiency of recovery activities. We propose a generic framework
based on automated planning theory to generate plans for the recovery of
IT services. At the heart of the framework is a planning component. Assisted
by the other participants in the framework, the planning component
aggregates the relevant information and computes recovery steps accordingly.
The main idea behind the planning component is to support the planning
operations with automated planning techniques, a research field of
artificial intelligence. Provided with a general planning model, we
show theoretically that the service fault recovery problem can indeed be
solved by automated planning techniques. The relationship between a planning
problem and a fault recovery problem is shown by means of a reduction
between these problems. After an extensive investigation, we choose a planning
paradigm based on Hierarchical Task Networks (HTN) as the guideline
for the design of our main planning algorithm, called H²MAP.
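To make the HTN idea concrete, the following is a minimal sketch of hierarchical task decomposition applied to a toy recovery problem. It is not the planning algorithm of this thesis, whose details are not given in the abstract. All names in it (the `restart` and `failover` operators, the `recover` task, the fact strings such as `down:web`, and the `DEPENDS_ON` table) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

State = FrozenSet[str]      # a state is the set of facts that currently hold
Task = Tuple[str, str]      # (task name, target service)

# Hypothetical dependency table: "web" needs "db" to be up first.
DEPENDS_ON: Dict[str, List[str]] = {"web": ["db"]}


# --- Primitive operators: return the successor state, or None if not applicable.
def restart(state: State, svc: str) -> Optional[State]:
    if f"down:{svc}" not in state:
        return None
    return (state - {f"down:{svc}"}) | {f"up:{svc}"}


def failover(state: State, svc: str) -> Optional[State]:
    if f"down:{svc}" not in state or f"standby:{svc}" not in state:
        return None
    return (state - {f"down:{svc}", f"standby:{svc}"}) | {f"up:{svc}"}


OPERATORS: Dict[str, Callable[[State, str], Optional[State]]] = {
    "restart": restart,
    "failover": failover,
}


# --- Methods: decompose a compound task into subtasks, or None if not applicable.
def already_up(state: State, svc: str) -> Optional[List[Task]]:
    return [] if f"up:{svc}" in state else None


def recover_via_failover(state: State, svc: str) -> Optional[List[Task]]:
    return [("failover", svc)] if f"standby:{svc}" in state else None


def recover_via_restart(state: State, svc: str) -> Optional[List[Task]]:
    # Recover all dependencies first, then restart the service itself.
    deps = DEPENDS_ON.get(svc, [])
    return [("recover", d) for d in deps] + [("restart", svc)]


# Methods for a task are tried in the listed order.
METHODS: Dict[str, List[Callable[[State, str], Optional[List[Task]]]]] = {
    "recover": [already_up, recover_via_failover, recover_via_restart],
}


def plan(state: State, tasks: List[Task]) -> Optional[List[Task]]:
    """Depth-first HTN-style planning with backtracking over methods."""
    if not tasks:
        return []
    (name, target), rest = tasks[0], tasks[1:]
    if name in OPERATORS:                      # primitive task: apply operator
        new_state = OPERATORS[name](state, target)
        if new_state is None:
            return None
        tail = plan(new_state, rest)
        return None if tail is None else [(name, target)] + tail
    for method in METHODS.get(name, []):       # compound task: try each method
        subtasks = method(state, target)
        if subtasks is None:
            continue
        result = plan(state, subtasks + rest)
        if result is not None:
            return result
    return None


if __name__ == "__main__":
    faulty = frozenset({"down:web", "down:db", "standby:db"})
    print(plan(faulty, [("recover", "web")]))
```

In this sketch the domain knowledge lives in the methods, which encode how an operator would typically decompose a recovery goal, while the planner itself only selects and orders applicable decompositions. This is the property that makes HTN-style planning attractive when recovery procedures are already known in outline.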
To support the operation of the planner, a set of components revolving
around the planning component is provided. These components are responsible
for tasks such as translation between different knowledge formats,
persistent storage of planning knowledge and communication with external
systems. To ensure flexibility in our design, we apply different design patterns
for the components. We sketch and discuss the technical aspects of the
implementation of the core components. Finally, as a proof of concept, the
framework is instantiated in two distinct application scenarios.

Zusammenfassung (Summary)
Despite numerous advances in software and hardware technologies,
fault situations remain an integral part of IT operations.
Given the importance of IT services, it is essential to resolve the
disruptions caused by faults without delay, so that the affected
services can continue to deliver the expected functionality without
significant interruptions. In view of the complexity, dynamics and
heterogeneity of modern IT service landscapes, planning the recovery
of faulty services poses a major management challenge.
Moreover, there are currently hardly any scientifically grounded methods or
tools to support administrators in recovering services.
This thesis contributes to closing this gap by proposing and
prototypically implementing a generic framework for the automated
planning of recovery processes for IT services. The centerpiece of the
framework is a component that is able to automatically assemble a
recovery plan from the available information (so-called recovery
knowledge). To this end, techniques from automated planning, a research
area of artificial intelligence, are used. Using a general planning
model, it is shown that the service recovery problem can be transformed
into a planning problem. Consequently, techniques that are adequate for
solving general planning problems can be applied to the planning of
service recoveries. We design a planning algorithm based on the
Hierarchical Task Network (HTN) planning paradigm. Compared to other
paradigms, the HTN-based model is more flexible and more efficient.
To support the use of the planning component, the framework has a
number of further components, which are responsible, among other things,
for storing planning knowledge, acquiring current information about the
infrastructures and services, and transforming information formats.
The planning component cooperates with these components to carry out
the planning operations and to execute the solution plans on the
target systems.
To ensure flexibility, we use specific design patterns in the framework
design, so that changes in the planning environment entail only minor
modifications. Finally, we demonstrate the viability and applicability
of the framework in two different application examples.