163 Pages
English

Automated design of efficient fail-safe fault tolerance [Elektronische Ressource] / vorgelegt von Arshad Jhumka

Gain access to the library to view online
Learn more

Description

Automated Design of Ecient Fail-Safe FaultToleranceVom Fachbereich Informatik der Technischen Universit at DarmstadtgenehmigteDissertationzur Erlangung des akademischen Grades eines Doctor rerum naturalium(Dr.rer.nat) vorgelegt vonArshad Jhumkaaus Rose-Hill, MauritiusReferenten:Prof. Dr. Neeraj SuriProf. Dr. Mario Dal CinDatum der Einreichung: 10.09.2003Datum der mundlic hen Prufung: 10.11.2003Darmst adter Dissertationen D17ErklarungHiermit erkl are ich, die vorgelegte Arbeit zur Erlangung des akademischenGrades “Dr.rer.nat” mit dem Titel “Automated Design of E cient Fail-safe Fault Tolerance” selbst andig und ausschliesslich unter Verwendung derangegebenen Hilfsmittel erstellt zu haben. Ich habe bisher noch keinen Pro-motionsversuch unternommen.Darmstadt, February 11, 2004 Arshad JhumkaAbstractBoth the scale and the reach of computer systems and embedded deviceshave been constantly increasing over the last decade. As such computer sys-tems become pervasive, our reliance on such systems increases, resulting inour expectation for such systems to continuously deliver services, even in thepresence of faults, that is we expect the computer systems to be dependable.One way to ensure the continuous delivery of dependable services is repli-cation, which however, is expensive, so we focus on the cheaper alternative,that of software-based fault tolerance.

Subjects

Informations

Published by
Published 01 January 2004
Reads 26
Language English

AutomatedDesignofEfficientFail-SafeFault
leranceoT

VomFachbereichInformatikderTechnischenUniversit¨atDarmstadt
tegenehmig

ionDissertat

zurErlangungdesakademischenGradeseinesDoctorrerumnaturalium
(Dr.rer.nat)vorgelegtvon

aumkJhArshad

ausRose-Hill,Mauritius

ten:ReferenSurijNeeraDr.Prof.Prof.Dr.MarioDalCin

DatumderEinreichung:10.09.2003
Datumderm¨undlichenPr¨ufung:10.11.2003

Darmst¨adterDissertationenD17

arung¨Erkl

Hiermiterkl¨areich,dievorgelegteArbeitzurErlangungdesakademischen

Grades“Dr.rer.nat”mitdemTitel“AutomatedDesignofEfficientFail-

safeFaultTolerance”selbst¨andigundausschliesslichunterVerwendungder

angegebenenHilfsmittelerstelltzu

motionsversuchunternommen.

Darmstadt,February

1,1

4200

ebhan.

hIc

ehab

rbishe

nochkeinenPro-

Arshad

aumkJh

acttrAbs

Boththescaleandthereachofcomputersystemsandembeddeddevices
havebeenconstantlyincreasingoverthelastdecade.Assuchcomputersys-
temsbecomepervasive,ourrelianceonsuchsystemsincreases,resultingin
ourexpectationforsuchsystemstocontinuouslydeliverservices,eveninthe
presenceoffaults,thatisweexpectthecomputersystemstobedependable.
Onewaytoensurethecontinuousdeliveryofdependableservicesisrepli-
cation,whichhowever,isexpensive,sowefocusonthecheaperalternative,
thatofsoftware-basedfaulttolerance.
Therearedifferentlevelsoffaulttolerancethatcanbeprovided,forex-
amplemaskingfaulttolerance,fail-safefaulttoleranceetc.Inthisthesis,
wefocusonprovidingfail-safefaulttolerance.Intuitively,afail-safefault-
tolerantprogramisonewhereitisacceptableforsuchaprogramto“halt”
whenfaultsoccur,aslongasitalwaysremainsina“safe”state.Moreover,
weendeavortosynthesizeefficientfail-safefaulttolerance.Weusedtwo
commonly-usedcriteriatoassesstheefficiencyofafail-safefault-tolerant
program,namely(i)errordetectionlatency–orlatencyforshort–,i.e.,how
fastcanafail-safefault-tolerantprogramdetectanerroneousstate,and(ii)
errordetectioncoverage–orcoverageforshort,i.e.,theratioof“harmful”
errorstheprogramcandetect.
Inthisthesis,wepresentaformalframeworkforthedesignofefficient
fail-safefault-tolerantprogram.Theframeworkisbasedonarefinedtheory
ofdetectors,whichintroducesnovelinsightsintotheirworkingprinciples.
Weintroducetheconceptofaperfectdetector,whichallowsafail-safefault-
tolerantprogramtohaveperfectdetection.Thismeansthataprogram,
composedwithperfectdetectors,haveoptimaldetectioncoverage.Optimal
inthesensethatthedetectorsdetectallofthe“harmful”errors,andmakeno
mistakes.Then,wepresenttheconceptoffastdetection,andshowhowafail-
safefault-tolerantprogramcanhavebothperfect,andfasterrordetection.In
fact,thedetectionlatencyisshowntobeminimal,i.e.,theerrorisdetected

i

in0-step.Basedonthesetwobasicnotions,wepresentalgorithmsthat
automaticallyaddfail-safefaulttolerancewithperfectdetectiononly,and
fail-safefaulttolerancewithperfectdetection,andminimaldetectionlatency.
Wefurtherdevelopatheoryforthedesignofmultitolerance,whichisthe
abilityofaprogramtotoleratemultipleclassesoffaults.Inthethesis,we
explainthatinterferencecanoccurbetweendifferentprogramcomponents
whendesigningmultitolerance,andwepresentasetofnon-interferencecon-
ditionsthatneedstobeverified.Wethenpresenttwodifferentapproaches
forthedesignofmultitolerance,andforeachapproach,wepresenttwodiffer-
entalgorithmsthataddfail-safefaulttolerancetoseveralfaultclasseswith
differentefficiencyproperties.
Thealgorithmspresentedinthisthesisareparticularlysuitableforaclass
ofprogramstermedasboundedprograms.Thepropertyofboundedprograms
isthattheydonothaveanykindofunboundedloopingstructure.

Keywords:DistributedSystems,EmbeddedSystems,FormalMethods,
FaultTolerance,Fail-Safe,Detectors,Efficiency,Multitolerance.

ii

zfassungKurIndenletztenJahrendurchdringenimmerkleinere,eingebetteteComputer-
systemeverst¨arktunsereLebensumwelt.MitderAllgegenw¨artigkeitsolcher
Systemewerdenwiraberauchimmerabh¨angigervonihnen.Mitder
Abh¨angigkeitsteigenunsereErwartungenandieZuverl¨assigkeitderSysteme
biszudemPunkt,andemwirunsw¨unschen,dassdasSystemauchdann
nochfunktioniert,wenneinbestimmtesMaßanFehlverhalteninnerhalbder
Systemkomponentenauftritt.DerartigeSystemewerdenalsfehlertolerant
(fault-tolerant)oderverl¨aßlich(dependable)bezeichnet.EineM¨oglichkeit,
verl¨asslicheComputersystemezubauen,bestehtdarin,dasSystemmehrfach
vorzuhalten,umimFehlerfallaufeinfunktionsf¨ahigesSystemumschaltenzu
k¨onnen.EineweitereundinderPraxish¨aufigg¨unstigereAlternativeistdie
sogenanntesoftware-basierteFehlertoleranz,umdieesindieserArbeitgeht.
ManunterscheidetverschiedeneArtenvonFehlertoleranz,beispielsweise
diebekanntemaskierendeFehlertoleranz(maskingfaulttolerance).Indieser
Arbeitgehtesumdiesogenanntefail-safeFehlertoleranz(fail-safefault
tolerance).Beifail-safeFehlertoleranzistesakzeptabel,wenndasSys-
temimFehlerfall“anh¨alt”anstattweiterhinseinenDienstzuerbringen.
Wichtigistlediglich,dassdasSystemimmerineinem“sicheren”Zustand
verweilt.IndieserArbeitwerdenVerfahrenvorgeschlagen,umeffiziente
fail-safe-fehlertoleranteSystemeausfehler-intoleranteOriginalsystemenzu
synthetisieren.WirverwendenzweibekannteKriterien,umdieEffizienz
derfehlertolerantenSystemezumessen:(1)Fehlererkennungszeit(errorde-
tectionlatency),alsodie“Zeit”,dieben¨otigtwird,umeinenaufgetretenen
Fehlerzuentdecken,und(2)Fehlererkennungsabdeckung(errordetectioncov-
erage),alsodieRatevonrelevantenFehlern,diedasProgrammentdecken
ann.kDieseArbeitlegtdieformalenGrundlagenf¨urdenEntwurfvoneffizien-
tenfail-safe-fehlertolerantenSystemen.GrundlageisteineverfeinerteThe-
oriesogenannterDetektoren(detectors).WirdefiniereneineneueKlasse
vonDetektoren,perfekteDetektoren(perfectdetectors),dieeserlauben,
fail-safe-fehlertoleranteSystememitperfekterFehlerekennung(perfectdetec-
tion)unddamitvollst¨andigerFehlererkennungsabdeckungzusynthetisieren.

iii

AnschliessenddefinierenwirdasKonzeptderschnellenFehlererkennung
(fastdetection),welcheseineoptimaleFehlererkennungszeiterlaubt.Wir
stelleneinVerfahrenvor,wiemaneinfail-safe-fehlertolerantesSystem
sowohlmitperfekterFehlererkennungalsauchmitschnellerFehlererken-
nungsynthetisierenkann.Dar¨uberhinausistdieFehlererkennungszeitder
synthetisiertenProgrammeistoptimal.
DieArbeitbesch¨aftigtsichabschließendmitdemKonzeptderMul-
titoleranz(multitolerance),alsoderF¨ahigkeiteinesProgrammes,ver-
schiedeneFehlerklassengleichzeitigzutolerieren.BeiMultitoleranzkann
eszueinerwechselseitigenBeeinflussung(interference)derDetektorenf¨ur
verschiedeneFehlerklassenkommen.WirstelleneineReihevonNicht-
Beeinflussungskriterien(non-interferenceconditions)vor,die¨uberpr¨uftwer-
denm¨ussen,umMultitoleranzzugewa¨hrleisten.WirstellenzweiAns¨atzef¨ur
denEntwurfmultitoleranterProgrammevor.F¨urjedenAnsatzgebenwir
zweiverschiedeneAlgorithmenan,diefail-safe-fehlertoleranteProgramme
bez¨uglichverschiedenerFehlerklassenundmitunterschiedlicherEffizienzsyn-
thetisieren.TheAlgorithmendieserArbeiteignensichinsbesonderef¨ursogenannte
beschr¨ankteProgramme(boundedprograms).Beschr¨ankteProgrammesind
Programmeohneunbeschr¨ankteSchleifenstrukturen.

iv

v

vi

Acknowledgements

Iwouldprobablynothavemadethisfarwithoutthehelp,guidanceand
supportofmanypeople.
IwishtothankProf.Dr.NeerajSuri,myadivsor,forfirstacceptingme
inhisDEEDSgroup,andthenguidingmetowhereIamnow.
Mygratitudetomyfriend,andcolleagueMartinHillerisenormous.Ever
sincewestartedourcollaborationafewyearsback,Martinhasbeenacon-
stantsourceofinspiration,andhelp.
MythanksalsogotoVilgotClaesson,¨OrjanAskerdal,RobertLindstr¨om,
Andr´easJohansson,andAdinaSarbu,whoareallpartoftheDEEDSgroup.
Theyhaveallcontributedtocreateagreatworkingatmosphere.Discus-
sionswithAndreasandRobertonvariouscomputingtopicshavebeenvery
interesting,ashavebeenournumerousdiscussionsonfootball.
IcannotthankBirgitNeutheoftheDEEDSgroupenoughforallthe
helpsheaffordedmewiththeeverydaystuff.FromhelpingmewithGerman
translationtosettingupappointments,she’sbeenoftremendoushelp.
OutsideoftheDEEDSgroup,IhavetothankFelixG¨artnerforallthe
interesting,andchallengingdiscussionswe’vehadoverthetimes.Ialsohave
tothankChristoffetzerforalltheinterestingdiscussionswehavehad.My
thanksalsogotoSandeepKulkarni,fortakingtimetoclarifyandexplain
partofthetheoryhedeveloped.
IhavetithankProf.Dr.MarioDalCinforkindlyacceptingtobea
reviewerofmythesis,aswellasProf.Dr.AlexBuchmann,Prof.Dr.Mira
Mezini,andProf.Dr.ClaudiaEckertforkindlytakingtimetoreadmythesis,
andacceptingtobeonmycommittee.
Mysincere,heartfeltthanksgotomyparents,whohavesparednoeffort
intryingtogetmetowhereIamnow.Theirconstantmoralsupporthas

vii

beenverywelcoming.

support,aswellas

hetvingSa

ort,supp

uroy

estb

e,vlo

thankalsoI

ws.in-laym

ym

otherbr

forlast:Najaat,thankyou

oury

nce,iepat

.erythingve

viii

and

so

sister

hucm

for

for

their

tantnsco

ything,reev

your

ix

x

Contents

1Introduction
1.1Dependability:BasicConcepts..................
1.1.1Faults,Errors,andFailures:..............
1.1.2WaysofAchievingDependability............
1.1.3AttributesofDependability...............
1.1.4DesignofFaultTolerance................
1.1.5VerificationandValidationofFaultTolerance.....
1.2MotivationandResearchQuestions...............
1.2.1ProblemStatements...................
1.3ResearchContributions......................
1.4ThesisStructure..........................

2RelatedWork
2.1DesignofFault-TolerantPrograms...............
2.2AutomatedProcedures......................

3FormalPreliminaries
3.1ConcurrentSystems........................
3.2Programs.............................
3.3Communication..........................
3.4Specifications...........................
3.5TemporalLogic..........................
3.6Refinement............................
3.7FaultModelsandFaultTolerance................

xi

1223345680121

134171

1991022223525262

4

5

6

PerfectDetectors:BasisforPerfectDetection29
4.1Introduction............................31
4.2AnOverviewofDetectors....................33
4.2.1RoleofDetectorsinFail-SafeFaultTolerance.....34
4.3TheTransformationProblem..................38
4.4ATheoryofPerfectDetectors..................39
4.4.1TransitionConsistencyintheContextofSafetySpec-
ifications..........................40
4.4.2PerfectDetectors.....................46
4.4.3ConstructingPerfectDetectors.............53
4.5AnAlgorithmforPerfectDetectors...............54
4.6ThreeCaseStudies........................57
4.6.1ASimpleExample....................57
4.6.2AMajorityVoterSystem.................58
4.6.3TokenRing........................60
4.7ChapterSummary........................61

FastDetectors:ABasisforFastErrorDetection63
5.1Introduction............................64
5.2FastErrorDetection.......................66
5.3TheTransformationProblemforFastandPerfectDetection.71
5.4AddingEfficientFail-SafeFaultTolerance...........72
5.5TwoCaseStudies.........................75
5.5.1ASimpleExample....................75
5.5.2AMajorityVoterSystem.................76
5.6Discussion.............................78
5.7ChapterSummary........................80

DesignofEfficientMultitolerance81
6.1Introduction............................83
6.2IssuesinMultitoleranceDesign.................86
6.3One-at-a-timeDesignofMultitolerance.............89
6.3.1MultitolerantProgramsWithPerfectDetection....89
6.3.2ASimpleExample....................97

xii

7

6.3.3TokenRing........................98
6.3.4MultitolerantProgramsWithPerfectDetectionand
MinimalDetectionLatency...............102
6.3.5ASimpleExample....................110
6.4All-at-a-timeDesignofMultitolerance.............113
6.4.1MultitolerancewithPerfectDetection..........113
6.4.2ASimpleExample....................116
6.4.3MultitolerancewithPerfectDetectionandminimalde-
tectionlatency......................117
6.4.4ASimpleExample....................119
6.5ChapterSummary........................120

ConclusionandFutureWork123
7.1Discussion.............................124
7.2SummaryofResearchContributions..............129
7.2.1PerfectDetection.....................129
7.2.2FastDetection.......................130
7.2.3DesignofOne-at-a-timeMultitolerance.........130
7.2.4DesignofAll-at-a-timeMultitolerance.........132
7.3Impact...............................132
7.4FutureWork............................133

xiii

xiv

ListofFigures

4.14.24.34.44.54.64.7

5.15.25.3

6.16.26.36.4

Reachablestates/transitions...................37
Anexampletoillustratetheconceptofinconsistenttransition40
ProgramtoillustratetheconceptofSS-inconsistenttransitions.41
Programcontainingtwoconcurrentprocesseswithatransition
thatisbothSS-inconsistentandnotSS-inconsistentw.r.t.two
differentcomputations.......................43
Algorithmtosynthesizefail-safefault-tolerantprogramwith
perfectdetection..........................55
Exampleprogrampinthepresenceoffaults..........57
Fail-safefault-tolerantprogrampobtainedbyremovingss..58

Algorithmtoaddefficientfail-safefault-tolerance........73
Anexampletoillustratehowalgorithmadd-efficient-fail-safe
works................................75
Fail-safefault-tolerantprogramresultingfromapplyingalgo-
rithmadd-efficient-fail-safe...................77

Thefirststepinthedesignofmultitolerantprogramswith
perfectdetection..........................91
Thesecondstepinthedesignofmultitolerantprogramswith
perfectdetection..........................92
thThekstepinthedesignofmultitolerantprogramswithper-
fectdetection............................94
Thealgorithmaddsfail-safefaulttolerancetonfaultclasses,
withperfectdetectiontoeveryfaultclass............96

xv

6.5Fault-intolerantprograminthepresenceofF1–firstiteration
ofthealgorithm..........................97
6.6Resultingfail-safefault-tolerantprogramp1toF1.......98
6.7Resultingfail-safefault-tolerantprogramp1inpresenceofF2.98
6.8Resultingfail-safemultitolerantprogramp2toF1andF2with
perfectdetectiontobothfaultclasses..............99
6.9Thefirststepinthedesignofmultitolerantprogramswith
perfectdetectionandminimallatency..............103
6.10Thesecondstepinthedesignofmultitolerantprogramswith
perfectdetectionandminimallatency..............105
6.11Algorithmadd-efficient-fail-safe-multitoleranceaddsfail-safe
faulttolerancetonfaultclasses,withperfectdetection,and
minimaldetectionlatencytoeveryfaultclass.........110
6.12Resultingfail-safefault-tolerantprogramwithperfectdetec-
tion,andminimaldetectionlatencytoF1............111
6.13Programp1inpresenceofF2...................111
6.14Resultingfail-safefault-tolerantprogramp2inpresenceofF2.112
6.15Algorithmadd-perfect-fail-safe-multitolerance-alladdsfail-
safefaulttolerancetonfaultclasses,withperfectdetection
toeveryfaultclassbyconsideringallfaultclassesatthesame
time.................................114
6.16Fault-intolerantprograminpresenceofF1...........116
6.17Fault-intolerantprograminpresenceofF2...........116
6.18Resultingfail-safemultitolerantprogramp2toF1andF2with
perfectdetectiontobothfaultclasses..............117
6.19Algorithmadd-efficient-fail-safe-multitolerance-alladdsfail-
safefaulttolerancetonfaultclasses,withperfectdetection,
andminimaldetectionlatencytoeveryfaultclassbyconsid-
eringallfaultclassesatthesametime..............118
6.20Fault-intolerantprograminpresenceofF1...........119
6.21Fault-intolerantprograminpresenceofF2...........119

xvi

26.2

Resulting

erfepct

classes

safeil-fa

tleranultitom

ramogpr

detectionandminimaldetection

when

ideringscon

lla

ultfa

xvii

sesscla

p2

to

F1

and

F2

with

latencytobothfault

ta

the

mesa

time.

.

.

201

xviii

1pterCha

Introduction

Thedesignofreliablecomputershasbeenachallengeeversincecomputers
firstappearedinthemiddleofthe20thcentury.Inthosedays,computers
werebuiltoutofunreliablecomponents,suchasvacuumtubes,relays,andso
on.Latergenerationsofcomputersweremorereliableastheywerebuiltfrom
morereliablecomponents,suchassemiconductorcomponents,andother
componentsfrommoreadvancedtechnology.Computerswereexpensive,
andwereusedmainlyforcomputation-extensivetasks,research,anddefense.
Nowadays,withtheever-increasingcircuitdensity,computersarenolonger
expensivecommodities.Infact,theyarebecomingmoreandmorepervasive.
Theyarebeingusedineverywalksoflife,fromsafety-criticalsystems,suchas
nuclearplantscontrol,airplanesetc,toconsumer-orientedproducts,suchas
automobiles,refrigeratorsetc.Asthesecomputersystemspervadeourlives,
ourexpectationontheirdeliveryofservices,inspiteoffaults,increases.We
needthesecomputersystemstobedependable.
Inthischapter,first,wewillfirstbrieflysurveythefundamentalsofde-
pendability(Section1.1),whereweprovideanoverviewofthemainsteps
involvedinthedesignoffault-tolerantsystems.Wethenexplainthemoti-
vationsbehindtheworkpresentedinthisthesis(Section1.2).Wewillthen
presenttheproblemstatements,andpertinentresearchquestionsthatarise
andexplainourresearchcontributions.

1

1.1Dependability:BasicConcepts

Inthissection,weexplainhowdependable(fault-tolerant)programsare
designedingeneral.First,weexplainthefault/error/failureclassification,
andthenweexplainhowdependabilitycanbeachieved.Givenourfocus
onfaulttolerance,wethenbrieflysurveythemainstepsinachievingfault
tolerance.Lastly,weexplainhowtheresultingsystemcanbevalidated.
Thetermdependabilityisdefinedas“thetrustworthinessofasystemsuch
thatreliancecanjustifiablybeplacedontheserviceitprovides”[Lap92].
Thismeansthattheservicesprovidedbysuchasystemarealwayscorrect,
accordingtothesystem’sspecification,whethertheenvironmentinwhichit
isdeployedisideal,orlessthanideal(faulty).

1.1.1Faults,Errors,andFailures:
Duringtheconstructionoroperationofacomputersystem,eventsmayoccur
thatcanthreatenthecomputersystem’sabilitytodelivercorrectservices.
Forexample,developersofthesystemmayhaveinadvertedlyintroducedde-
fects(orbugs)duringtheconstructionphase.Anotherfactorthatcanaffect
acomputersystem’sabilitytodelivercorrectservicesistheageingofcom-
ponents,thoughitsrelevancemaybelessinsoftware.Anotherexampleof
aneventthatcanjeopardizethecomputersystem’soperationsisitsdeploy-
mentinnoisyenvironmentsthatgenerateunexpectedevents.Factorsthat
canaffecttheproperfunctioningofacomputersystem,suchasnoise,bugs
etc,arecommonlyreferredtoasfaults.
Anerrorissaidtoexistinacomputersystemwhenacorresponding
faultisactivated.Specifically,afaultinitselfmaynotthreatentheproper
functioningofthesystem,forexample,ifafaultoccursinanareaofmemory
thatisnotaccessed,thenthefaulthasnoabilitytoinfluenceanycomputa-
tion.However,whenafaultisactivated,forexampleacomputationreaches
thefault-affectedareainmemory,andthefaultyvalueusedduringthecom-
putation,ifnocorrectiveactionistaken,thereistheriskofthecomputer
systemtoviolateitsspecification,i.e.,donotdelivertherequiredservice.
Whenafaultyvalueisusedinsomecomputation,errorissaidtopropagate,

2

i.e.,thereiserrorpropagation.Whentheerrorpropagatestothe“output”
ofthecomputersystem,afailureissaidtohappen,i.e.,thebehaviourofthe
systemhasdeviatedfromwhatisprescribedbyitsspecification.
Thus,tobeabletodevelopafault-tolerantsystem,oneneedstoun-
derstandthefaultsthatcanpotentiallyaffectthesystem,i.e.,oneneedsto
developafaultmodel.

1.1.2WaysofAchievingDependability
Onceafaultmodelhasbeendeveloped,therearevariouswaysofdealingwith
it,i.e.,therearedifferentwaysofachievingdependability,whendesigninga
dependablesystem,namely:

•FaultPrevention:Asthenamesuggests,thisapproachtriestopre-
ventfaultsfromoccurringinthefirstplace.Examplesoffaultpreven-
tionapproachesareuseofsounddevelopmentmethodologiesoruseof
radiation-hardenedhardware.
•FaultTolerance:Thisistheabilityofasystemtodeliverdesired
leveloffunctionalityinthepresenceoffaults,i.e.,insteadofpreventing
faultsfromoccurring,onetriestotoleratetheireffects.Toachievethis,
thesystemshouldbeabletodetectand/orcorrecterrorsinthesystem.
•FaultRemoval:Thisprocessdealswithremovaloffaults,andis
commonlyreferredtoasdebugging(forsoftware).
•FaultForecasting:Thisprocesshelpsinevaluatingtheconsequences
offaultswhentheyoccur.

1.1.3AttributesofDependability
Onceadependablesystemhasbeendesigned,oneneedstomeasureits“de-
pendability”.Therearedifferentattributesthatcharacterizedependability,
:pleexamfor

•Reliability–Thisattributedefinestheprobabilityofasystemtopro-
videcorrectserviceoverafiniteperiodoftime.

3

•Availability–Thisattributedefinestheprobabilityofasystemtobe
correctatanygiventime.

•Safety–Thisattributecapturestheextenttowhichaserviceprovided
byasystemissafe.

Otherattributessuchasconfidentialityandintegrityarealsoattributesof
dependability,butaremorerelatedtosecurityissues,andwedonotdiscuss
further.yanthem

1.1.4DesignofFaultTolerance
Asweexplainedearlier(Section1.1.2),therearevariouswaysofachieving
dependability.Inthisthesis,wefocusmainlyonfaulttolerance.Faulttol-
eranceistheabilityofasystemtoprovideadesiredleveloffunctionality
inpresenceoffaults.Faulttoleranceiscloselycoupledtothefaultmodel
assumed,i.e.,afault-tolerantsystemmaybeabletotolerateoneclassof
faults,andstillnotabletotolerateanothertypesoffaults.
Forasystem(program)tobefault-tolerant,itneedstobeabletoperform
someimportantstepswhenevererrors(effectsoffaults)appear.Ingeneral,
provisionoffaulttolerancecanbedividedintofourstages[LA90]:

1.ErrorDetection:Thisstepisconcernedwiththeabilityofthesystem
todetectthatsomeerroneousstatehasbeenreached,andthatthe
systemisinsome“unsafe”state.Errordetectionisimportant,since
thesystemisthenpreventedfromperformingunsafeactions.

2.DamageAssessment:Afteranerrorhasbeendetected,oneneedsto
determinetheextenttowhichdamagehasbeencausedtothesystem.
Inparticular,oneneedstodeterminetheextenttowhicherrorhas
propagatedthroughthesystem.

3.ErrorProcessing:Oncedamageassessmentisdone,errorprocessing
isinitiatedthattriestorevertthesystembacktoanon-erroneousstate,
i.e.,asafestate.Thecombinedactionsofdamageassessment,anderror
processingiscommonlyknownaserrorrecovery.

4

4.FaultTreatment:Thisstepisconcernedwithpreventingthesame
faultsfromgettingactivatedagain,andisgenerallyperformedoffline.

Overall,afault-tolerantprogramshouldbeabletofirstdetecterrors,and
thentorecoverfromthem.Todesignfaulttolerance,AroraandKulkarni
observedin[AK98c,AK98a,AK95,Kul99]thattwocomponents,whichthey
termedasdetectors,andcorrectors,underpinthedesignoffaulttolerance.
Adetectorisaprogramcomponentthatisaddedtoaprogramtodetect
errorsintheprogram.Examplesofdetectorsareexecutableassertions[Sai78,
MAM84,Hil00],errordetectingcodes,snapshotprocedures,comparators
andsoon.Acorrector,ontheotherhand,isaprogramcomponentthatis
addedtorecoverfromerrors.AroraandKulkarnihaveshownthat,byusing
eitherdetectors,correctorsorbothofthem,differentclassesoffault-tolerant
programscanbeobtained,namelyfail-safefault-tolerantprograms,non-
maskingfault-tolerantprograms,andmaskingfault-tolerantprograms.Each
classoffault-tolerantprogramsprovidesaspecifiedleveloffaulttolerance.
Inthisthesis,wefocusonthedesignoffail-safefault-tolerantprograms.
Itwasshownin[AK98c]that,tomakeaprogramfail-safefault-tolerant,it
isbothnecessaryandsufficienttoadddetectorstothatprogram.Inthis
thesis,theapproachwewillpresentallowsaprogramtohavebothperfect
errordetectionandminimaldetectionlatency.Thisinturnhastheeffectof
constrainingerrorpropagation,hencelimitingtheamountofdamagedone
inthesystem.Thus,bydesign,thedamagedoneinpresenceoffaultsis
minimal.Theimplicationofthisisthattheerrorprocessingphaseneedsnot
beverycomplicated(sophisticated).

1.1.5VerificationandValidationofFaultTolerance
Inthedesignoffault-tolerantsystems,oneneedstoverifythecorrectness
ofthesystem.Todothis,formalmethods[CW96]hasoftenbeenused.
Thefirststepistospecifythepropertiesthatthesystemshouldhave.The
specificationisusuallydoneinsomelogicalformalism,usuallytemporallogic,
whichcanasserthowthebehaviorofthesystemevolvesovertime.The
secondstepistoconstructaformalmodelforthesystem.Inordertobe

5

suitableforverification,themodelshouldcapturethosepropertiesthatmust
beconsideredtoestablishcorrectness.Duringtheverificationprocess,the
propertiesthatestablishcorrectnessareverified.Inthedependabilityarea,
formalmethodshavebeenusedtoverifycorrectnessofdistributedand/or
real-timeprotocols[KRS99,SS99b].Ithasalsobeenobservedthataproper
decompositionofafault-tolerantprogramintoitscomponentshelpsininits
mechanicalverification[KRS99].
Oncethesystemhasbeenimplementedandfaulttolerancemechanisms,
suchasdetectorsandcorrectors,havebeenadded,theresulting“fault-
tolerant”systemneedstobevalidated.Twocommonlyusedmethodsfor
validationaretesting,andfaultinjection.Intesting,thesystemissubjected
toanumberoftestcasestoascertainthattherearenobugs(faults)inthe
system.Bugsaresuspectedpresentwhenthesystemdeviatesfromitsspec-
ifiedbehaviorunderanytestcase.Theproblemisusuallytofindsuitable
testcaseswhichcanuncoverthosebugs.In[SS99b],theauthorsadopta
formal-basedapproachwherebyverificationinformationisreusedtodrive
tion.generaseatest-cTovalidatethefaulttolerancemechanisms,faultinjection[AAA+90,
IT96]isoftenused.Infaultinjectionexperiments,faultsareartificially
injectedinthesystemtocreateconditionsthatwillactivatethosefaulttol-
erancemechanisms.Faultinjectionsuffersfromthesameproblemastesting
forhavingtofindsuitabletestcases,aswellasdeterminingwhichtypesof
inject.tofaults

1.2MotivationandResearchQuestions

Onthisbackground,inthissection,wewilldiscussthemotivationsthat
underpintheworkpresentedinthisthesis.Ouroverallgoalistodevelop
aframeworkthatallowssystematicdevelopmentofefficientfail-safefault-
tolerantprograms.
Themotivationbehindtheworkpresentedinthisthesisismultifold.
First,itiswell-knownthatthedesignoffault-tolerantsystemsisinherently
complex.Thus,thereisaneedforwell-definedandsounddevelopment

6

methodologiesthatcanguidethesoftwaredesignerinthedesignofefficient
andcomplexdependablesystems.
Also,itisoftenthecasethatadditionoffaulttolerancemechanisms(i.e.,
detectorsandcorrectors)interferewiththeperformanceofthesystem.For
example,someerrordetectionmechanismsmaybeaddedthattriggeralotof
falsealarmsinthesystem.Thishastheeffectofaffectingtheperformanceof
thesystem.Moreimportantly,ithasalsobeennotedthatdesignofefficient
faulttolerancemechanismsisveryoftenreliantontheexperienceofthe
programmers.Thisagainpointstoaneedforsoundmethodologiesthatcan
guidetheprogrammersinthedesignofefficientfaulttolerancemechanisms.
Further,inthestartphaseofthedesign,thesoftwaredesignermaynot
befullyawareofallthefaultclassesthatthesystemwillbesubjectedto.
Asthesystemevolves,andthesystemdesignerbecomesmoreawareofmore
faultclasses,additionalfaulttolerancemechanismsmayneedtobeadded
tohandlethesefaults.However,eachtimefaulttolerancemechanismsare
added,acompleteverificationofthenewprogramisneeded,whichisexpen-
sive.Also,non-interferenceacrossthedifferentfaulttolerancemechanisms
needtobeascertained.Thus,theabilityto“add”newtolerancemechanisms
withouthavingtoperformacompleteverificationoftheprogramiscrucial.
Overall,weendeavortodevelopaframeworkthat(i)enablesthedesign
ofefficientfaulttolerancemechanisms(morespecifically,detectors),and(ii)
enablescompositionaldesignoffaulttolerance.Combinedtogether,wepro-
videaframeworkthatenablessystematic(compositional)designofefficient
fault-tolerantprograms.
Focus:Inthisthesis,wefocusonthedesignofaparticularclassof
faulttolerance,namelyfail-safefaulttolerance.Informally,aprogramisfail-
safefault-tolerantifitalwaysremainsinasafestate,eveninthepresenceof
faults(Wewillformallydefinethetermfail-safefaulttoleranceinChapter3).
Thereasonforfocusingonfail-safefaulttoleranceismultifold.First,fail-
safefaulttoleranceisoftenneededincriticalapplications,suchasnuclear
plants,traincontrolsystemsandsoon.Veryoften,detectionistheonly
objective,andonceanerrorisdetected,amechanicalbackupsystemtakes
over.Second,itwasshownbyAroraandKulkarniin[AK98b]thattodesign

7

maskingfaulttolerance(whichistheidealfaulttolerance),onecanfirst
designaprogramtobefail-safefault-tolerantandthenlaterextendedwith
correctorstomaketheprogrammaskingfault-tolerant.Thus,ourapproach
tacklesonestepinthedesignofmaskingfaulttolerance.
Givenourfocusonthedesignoffail-safefaulttolerance,itwasshownby
AroraandKulkarni[AK98c]thatitisbothnecessaryandsufficienttocom-
poseaprogramwithdetectorstomakeitfail-safefault-tolerant.Therefore,
whendesigningsuchfail-safefault-tolerantprograms,wealsofocusonthe
designofdetectors,i.e.,programcomponentsthatdetecterrors.
Onthisbackground,weformulatetheproblemstatementsthathave
driventheresearchpresentedinthisthesis.

1.2.1ProblemStatements
Themaingoaloftheworkpresentedinthisthesishasbeentodevelopa
frameworkthatcanhelpinthedesignofefficientfail-safefault-tolerantpro-
.amsgrWhileaddressingtheaboveproblem,wetackledsomeofthefollowing
questions:hresearcResearchQuestion:Howcanoneassesstheefficiencyofafail-safe
fault-tolerantprogram?Whatarethecommonmetricsforsuchanassess-
ment?Whendesigningfault-tolerantprograms,errordetectioniscrucial.Very
often,tovalidatethedetectors,faultinjectionexperimentsareperformedto
assesstheefficiencyofthedetectors,andcommonfactorsusedforsuchas-
sessmentare(i)detectioncoverage,and(ii)detectionlatency.Inthisthesis,
wethusfocusonthosetwopropertiesoffail-safefault-tolerantprograms,
namelycoverage,anddetectionlatency.
ResearchQuestion:Whatarethemainpropertiesofadetectorthat
allowcharacterizationofitsefficiency?Cansuchpropertiesbeformalized?
InChapter4,wedevelopatheoryofdetectors,andidentifycompleteness
andaccuracyastwoimportantpropertiesofadetectorthatcharacterize
itsefficiency.Wethenformalizethesepropertiesandidentifyanimportant
classofefficientdetectors,namelyperfectdetectors.Weexplainthatsuch

8

adetectorallowsforperfecterrordetection,andwefurtherexplainitsrole
infail-safefaulttolerance.Thus,perfectdetectorscanbeshowntohave
“perfect”coverage.
ResearchQuestion:Upontheoccurrenceoffaults,howcanerrorprop-
agationbelimited?.Howcanthedetectionlatencyofaprogrambemini-
mized?.Isitpossibletodesignafail-safefault-tolerantprogramsuchthatits
detectionlatencyisminimal?.Dothedetectorsincludedhaveanyimpacton
theunderlyingprogram?.
Totacklethisquestion,inChapter5,wedevelopatheoryoffastdetec-
tors,andexplainhowfail-safefault-tolerantprogramswithminimaldetec-
tionlatencybedesigned.Infact,theapproachweproposeallowsafail-safe
fault-tolerantprogramtohavebothperfectdetection,andminimaldetection
.ncyelatResearchQuestion:Canefficientdetectorsbedesignedforseveralfault
classes?Howcantheirnon-interferencebeguaranteed?Isthereanymethod-
ologythatcanbeusedsuchthatverificationneedsnotbeperformedfrom
scratcheachtimenewdetectorsfornewfaultclassesareadded?.
Themotivationbehindthisresearchissueisthat,duringperiodsofper-
turbation,asystemissubjectedtofaultsfromvarioussources,suchasnet-
workoverloads,messagelosses,transients,crashesandsoon.Itisvery
difficulttodesigna(fail-safe)fault-tolerantprogramtothesefaultclasses.
So,theideaistoconsideronefaultclassatatime,anddesignthefault
tolerancemechanismstothefaultclassconsidered.Theobviousproblem
iswhetherthefaulttolerancemechanismsfordifferentfaultclassescanbe
composed,i.e.,oneneedstoascertainthatfaulttolerancemechanisms(de-
tectors)toagivenfaultclassdonotinterferewiththoseofanotherfault
class.Further,theproblemisalsotodevelopefficientfaulttolerancemecha-
nismstoseveralfaultclasses.Specifically,itwouldbedetrimentalifefficient
faulttolerancemechanismsaredesignedtotolerateonefaultclass,butthe
tolerancemechanismsforanotherfaultclassisnotefficient,resultinginan
inefficientmultitolerantsystem/program.Thus,ourtheorynotonlyshows
non-interferenceacrossdifferentdetectors(intermsoftheirbehavior)for
differentfaultclasses,butourtheoryalsoshowsthat“composing”differ-

9

entperfectdetectorsfordifferentfaultclassespreservetheefficiencyofthe
m.graproresultingThus,inChapter6,wedevelopatheoryforthedesignofefficientmulti-
tolerantprograms.Buildinguponthetheoryofperfectdetectors(Chapter4)
andthetheoryoffastdetectors(Chapter5,wedevelopatheoryforthede-
signofefficientmultitolerantprograms,anddeveloptherequisitestepsto
showascertainnon-interferenceacrossdifferentdetectors.
ResearchQuestion:Canthedesignofefficientfail-safefault-tolerant
programsbeautomated?
Totacklethisquestion,wehavedevelopedalgorithmsofpolynomial-time
complexitythatautomaticallysynthesizesafail-safefault-tolerantprogram,
startingfromacorrespondingfault-intolerantprogram.Wehavedeveloped
examplesshowinghowthesealgorithmscanbeusedforsuchautomaticsyn-
thesis.ResearchQuestion:Canwereusethefault-intolerantprogramtosyn-
thesizethefault-tolerantprogram?
Therearetwopossiblewaysofsynthesizingafail-safefault-tolerantpro-
gram.First,onecanstartwithaspecificationoftheprogram,andthen
userefinementstepstofirstsynthesizeafault-intolerantprogram,andthen
performafaulttolerancetransformationtoobtainafail-safefault-tolerant
program.Thesecondoptionistostartdirectlywiththefault-intolerantpro-
gramandthentransformitintoafail-safefault-tolerantprogrambycompos-
ingitwithfaulttolerancemechanisms.Inthisthesis,weadoptthesecond
methodology,and,startingfromafault-intolerantprogram,wetransformit
intoafail-safefault-tolerantprogrambycomposingitwithfaulttolerance
components(morespecifically,withdetectors).

1.3ResearchContributions

Towardsaddressingalltheseresearchissues,wehavedevelopedatheoryof
detectors,andidentifiedandformalizedsomeimportantpropertiesofthese
programcomponents.Wehaveidentifiedaclassofdetectorscalledper-
fectdetectorsthatallowsdesignofefficientfail-safefault-tolerantprogram.

10

Specifically,wewillshowhowtodesignfail-safefault-tolerantprogramswith
perfecterrordetection,andminimalerrordetectionlatency.Basedonthe
theory,wealsodeveloppolynomial-timealgorithmsthatpermitautomatic
synthesisofefficientfail-safefault-tolerantprograms.
Further,wedevelopatheorythatunderpinsthedesignofmultitoler-
antprograms.Weshowthattheclassofperfectdetectorsallowsfor“non-
interferingcomposition”,i.e.,perfectdetectorsforeachfaultclassdonot
interferewitheachother.Specifically,perfectdetectorsfordifferentfault
classesdonotinterferewitheachother’s“behavior”,andtheydonotinter-
ferewiththeefficiencyoftheprogram.
Overall,inthisthesis,wemakethefollowingresearchcontributions:

1.Wefirstpresentanoveltheoryofdetectors,formalizesomeimportant
propertiesofdetectors,andidentifyanimportantclassofdetectors,
namelyperfectdetectors,thatunderpinsdesignofefficientfail-safe
fault-tolerantprograms.Wefurtherexplaintheirroleinthedesignof
fail-safefaulttolerance.Wealsoprovideanalgorithmthatautomati-
callyyieldsfail-safefault-tolerantprograms,withperfectdetection.
2.Next,wepresentanoveltheoryoffasterrordetection,andbuilding
uponthetheoryofperfectdetection,wedevelopanalgorithmthatgen-
eratesfail-safefault-tolerantprogramswithbothperfecterrordetection
andwithminimalerrordetectionlatency.
3.Weexplainthat,inthecontextofmultitolerancedesign,somenon-
interferenceconditionsneedtobeverified.Wefurtherexplainthat
non-interferenceacrossdetectorswithrespecttotheirbehaviorisnot
sufficientwhendesigningefficientmultitolerantprograms.Wethere-
forepresentasetofnon-interferenceconditionsthatencompassboth
behavioralandperformanceaspects.Assuch,wedevelopasuiteof
algorithmsthatsystematicallyadds“efficient”fail-safemultitolerance
toaprogram.Overall,thiscontributionallowscompositionaldesign
ofefficientfail-safefault-tolerantprograms,i.e.,efficientfail-safefault-
tolerantcanbesystematicallydesigned.

11

Ourcontributionsareintheareaoffaulttolerance,specificallyinthe
fieldoferrordetection.Wehaveshownhowtodesignefficientdetectorssuch
thatthefail-safefault-tolerantprogramshaveperfecterrordetectionand
minimaldetectionlatencyfordifferentfaultclasses.
Tosummarize,ourmaincontributionisanapproachthattransformsa
fault-intolerantprogramintoafail-safefault-tolerantprogramwithperfect
errordetection,andminimaldetectionlatency,i.e.,efficientfail-safefault-
tolerantprogram.

1.4ThesisStructure

Thethesisisstructuredasfollows:
Chapter2surveysresultsintheareasdesignoffaulttolerance,auto-
mateddesign,programtransformation,andmultitolerance.Wealsotryto
putourcontributionsintocontext.
Chapter3introducestheformalfoundationsforourworkandpresents
theterminologiesusedinthisthesis.Wealsopresentthesystemmodeland
d.usedelmofaultChapter4introducesatheoryofperfectdetectors,anddevelopsasound
andcompletealgorithmthatyieldsfail-safefault-tolerantprogramswithper-
n.detectiotfecChapter5introducesatheoryoffastdetectors,anddevelopsasound
andcompletealgorithmthatyieldsfail-safefault-tolerantprogramswithper-
fectdetection,andminimaldetectionlatency.
Chapter6explainstheconceptofmultitolerance.Itdevelopsase-
riesofnon-interferenceconditionsthatneedtobesatisfiedwhendesigning
multitolerance.Severalalgorithmsaredevelopedthatyieldfail-safemulti-
tolerantprogramswithvariedoptimalproperties,aswellasguaranteeing
erence.fternon-inChapter7summarizesthecontributionsofthisthesis,andassessestheir
impact.Weconcludebyprovidingsomepointersregardingfuturework.

12

pterCha

2

RelatedWork

Inthischapter,wepresentasurveyofpreviousworkandresultsthatare

closelyrelatedtotheproblemsaddressedinthisthesis.Specifically,the

areasofmostcloselyrelatedaredesignoffault-tolerantprograms,design

ofeffectivedetectors,automatedprocedures,errorpropagationanalysis,and

softwareimplementedfaulttolerance.

13

2.1DesignofFault-TolerantPrograms

Onecommonwaytoimplementfault-tolerantprogramsistouseN-Version
programing[Avi85],whichishoweveranexpensiveapproach.Anotherap-
proachhasbeentouseRecoveryBlocks[Ran75].Buttheeffectivenessof
recoveryblocksisheavilyreliantontheeffectivenessoftheacceptancetests
included.Unfortunately,littleworkhasbeendonethatcanguideasoftware
designertowardsdesigningeffectiveacceptancetests(detectors).
Levesonet.alpresentedtheresultsofalargescaleexperimenttodeter-
minetheeffectivenessofsoftwarechecksandvotinginsoftwarein[LCKS90].
Theyexplainedthattheeffectivenessofdetectorsdependsverymuchonthe
individualabilityoftheprogrammerstodesigneffectivedetectors.Again,
asinthecaseofRecoveryBlocks,littleworkhasbeendonetoguidethe
programmersindesigningeffectivedetectors.However,toeasetheuseof
executableassertions(whichisaninstanceofadetector),Saibextendedthe
FORTRANandPASCALlanguageswithasoftwareconstruct(calledAssert)
thathelpsintheimplementationofexecutableassertions[Sai78].Another
approachforfacilitatingtheuseofassertionsistheuseoftheAnnotation
PreProcessortoolofRosenblum[Ros95].Asimilarapproachisdescribed
byYinandBieman[YB94].Theproblemwiththeseapproachesistheydo
notprovideguidelinespertainingtothedesignofeffecivedetectors,whichis
difficult,sinceveryoften,theseassertionstendtobeapplication-specific.In
thiswork,weprovidealgorithmsthatcanautomaticallygenerateperfectde-
tectors,hencetheproblemofdesigningapplication-specificdetectorscanbe
effectivelytakenawayfromprogrammers,oncethefault-intolerantprogram
ailable.vaisToconquerthedesigncomplexity,AroraandKulkarniproposedatrans-
formationalapproachwherebyafault-intolerantprogram(aprogramwhich
satisfiesitsspecificationintheabsenceoffaults),andthatsatisfiesatleast
itssafetyspecificationinpresenceoffaults)istransformedintoafault-

14

tolerantprogram(eitherfail-safe,non-maskingormasking)throughthead-
ditionofdetectorsand/orcorrectors.Usingthisapproach,theyhavepre-
sentedfault-tolerantsolutionsforseveralproblemssuchasdistributedre-
set[KA98],mutualexclusion[AK98b],networkmanagement[KA97a],data
transfer[AK98b],andByzantineagreement[KA97b].
Thepremiseisthatafault-tolerantprogramisacompositionofafault-
intolerantprogramwithfaulttolerancecomponents,suchasdetectorsand
correctors.Theauthorsarguethatsuchanapproachallowsforseparation
ofconcerns.Specifically,itispossibleforasoftwaredesignertofirstfocus
ondesigningthefault-intolerantprogram,andthenfocusonaddingfault
tolerancetoit.
In[AK98a],AroraandKulkarnipresentedanstepwiseapproachforaddi-
tionofmultitolerance,i.e.,theabilityofbeingfault-toleranttomultiple
classesoffaults.Theyalsoarguedthatnon-interferencebetweendiffer-
entprogramcomponentsneedstobeverified,andpresentedasetofnon-
interferenceconditionsforthatmatter.Inthispresentwork,weextendthe
currentsettoincludenon-interferencewithotherprogramproperties(apart
fromfaulttolerance),suchasperfectdetection,andminimaldetectionla-
.tencyAnothertransformationalappraochhasbeenproposedbyJosephand
Liu[Liu91,LJ92,LJ93,LJ94,LJ95].Theyshowhowaprogramconstructed
forafault-freesystemcanbetransformedintoafault-tolerantprogramfor
executioninfaultyenvironments.Specifically,theadditionoffaulttolerance
toaprogramismodeledbyafault-toleranttransformationthataddsthe
necessaryredundancytotheprogramsothatthefaultscanbetolerated.
Afault-tolerantprogramcanbefurtherrefinedusingfault-tolerantrefine-
mentthatpreservesboththefunctional,andfault-tolerantpropertiesofthe
m.graproThefaulttolerancemechanismsusedareverymuchdependentonthe
faultmodelused.Forexample,indatatransfer,timeoutsmaybeused
todetectmessagelosses,ratherthan,say,executableassertions.How-
ever,theproblemofknowinginadvancealltheclassesoffaultsthesoft-
warecanbesubjectedtomaybedifficulttosolve.Forexample,contin-

15

uingwiththeexampleondatatransder,ifthesystemdesignerassumes
onlythecasewheremessagescanbelostduringdatatransfer,hecanhave
animplementationsuchthatthesendingnodecanretrysendingtheloss
messageafteratimeout.Butifafaultoccursthatarbitrarilycorrupt
thestateoftheprogram,suchretryactionsmaynotbesufficient,and
safetymaybecompromised.Hence,weakfaultmodelsaresometimesas-
sumed,suchasByzantinefaults.Insuchcases,selfstabilization[Dij74]
hasbeenadvocated,andisgettingmoreandmoreattentioninthecommu-
nity.Forexample,GoudaandMultariproposedsomeself-stabilizingcom-
municationprotocolsin[GM91].Self-stabilizingprotocolshavebeenpro-
posedin[APSV91,DIM93,DW95,AD97,Dol97,BDDT98,Dol00]among
others.However,theproblemwithself-stabilizationisthatsafetymaybe
temporarilyviolated.Oneinterestingclassofself-stabilization,calledsnap-
stabilization,hasbeenproposedbyCournieret.al[CDPV01]thatsolvesthis
problem.Asnap-stabilizingprotocolisaself-stabilizingprotocolmeaning
thatstartingfromanarbitrarystate(inresponsetoanarbitraryperturba-
tionmodifyingthememorystate)itisguaranteedtobehaveaccordingtoits
ion.cificatespAnotherlineofapproachfordesignoffaulttolerance,wherethegoalis
for“scalablefaulttolerance”,hasbeeninvestigatedbyAroraet.al[ADK01].
Toachieveselfstabilization,oneneedstomakeuseofsystemimplementa-
tion.However,theauthorsarguethatthisapproachdoesnotscaleverywell.
Hence,theyproposetoimplementstabilizationbasedonsystemspecifica-
tion,suchthatthestabilizationpropertyisguaranteedirrespectiveofthe
tion.taimplemenTheeffectivenessofdetectorsisalsoaffectedbytheirplacementinthe
software,asindicatedbyHilleret.alin[HJS01],andtheauthorsalsodemon-
stratethesensitivityofthelocationsettotheunderlyingfaultmodel
in[HJS02].Oncethefault-tolerantsoftwareisobtained,fault-injection
experimentsareconductedtoevaluatetheresultingdependabilityofthe
program[IT96].However,suchworkdonotrevealtheweakspotsinthe
software,forexample,howerrorspropagateinthesoftware,whatarethe
vulnerablesignals/variables.Initialworkfocusingontheseaspectsap-

16

pearin[HJS01,JHS01].Toconductthesevalidationexperiments,effec-
tivetestcasesareneeded.SinhaandSuriinvestigatedtheapplicabilityof
formalmethodsindrivinggenerationoftestcasesin[SS98,SS99a].Specif-
ically,theauthorsreusedverificationinformationtodrivetestcasegener-
ation.In[JHS02b],Jhumkaet.alproposedaformalapproachfordesign-
ingcomponent-baseddependablesoftwareandin[JHS02a],theauthorspre-
sentedaformalapproachfortestcasegeneration,wherebytheyreusedetec-
tordesigninformationtodrivetestcasegeneration.
Generalsurveysintheareaofdependabilitycanbefoundin[Cri91,
Gae99a],whileGaertnerpresentedasurveyoftransformationalapproaches
e99b]a[Gin

2.2AutomatedProcedures

Inanearlierwork,KulkarniandArora[KA00]presentedanalgorithm
thatautomatestheadditionoffail-safefaulttolerancetoaninitiallyfault-
intolerantprogram.Thisalgorithmisbasedonananalysisofthestate
transitionrepresentationoftheprograminthepresenceoffaults.Theal-
gorithmissoundandcompletemeaningthat(i)thetransformedprogram
isinfactafail-safefault-tolerantversionoftheoriginalprogram,and(ii)
ifafail-safefault-tolerantversionoftheprogramexists,thenthealgorithm
willfindit.Thecomplexityofthealgorithmispolynomialinthestateof
thefault-intolerantprogram.Putincontextwiththeworkpresentedinthis
thesis,thealgorithmin[KA00]alwaysaddsfail-safefaulttolerancewith
perfectdetection.But,thealgorithmcansometimesaddfail-safefaulttol-
erancewithperfectdetection,andminimaldetectionlatencytosomeclasses
ofprograms.Bywayofcontrast,ourworkpresentsalgorithmsthatalways
addbothperfectdetection,andminimaldetectionlatencytoawidersetof
programs.Thealgorithmshavepolynomialcomplexityinthestatespaceof
thefault-intolerantprogram.

17

18

3pterCha

FormalPreliminaries

Inthischapter,werecallthestandardformaldefinitionsofprograms,faults,
faulttolerance(inparticular,fail-safefault-tolerance),andofspecifica-
tions[AK98c,Kul99].Intuitively,aprogramisrepresentedasatransition
system,sinceprogramswritteninanyimperativelanguagecanberepre-
sentedassuch.Thischapterprovidesalltherequisiteformalbasisupon
whichtheworkpresentedinthisthesisisbased.

3.1ConcurrentSystems

Aconcurrentsystemconsistsofasetofcomponentsexecutingtogether.
Theyareusuallyassociatedwithaformofcommunicationamongthem.
Themodeofexecution,andthatofcommunicationmaydifferfromsystem
tosystem.Therearetwomainmodesofexecution:
1.Asynchronousorinterleavedexecution,whereonlyonecomponent
makesastepatanytime.

2.Synchronousexecution,whereallcomponentsmakeastepatthesame
time.Asforthecommunicationpart,wepresenttwoofthepossibilities,namely
1.SharedVariables(weprovidemoredetailsinSection3.2).

2.MessagePassing,wherecomponentscommunicatewitheachotherby
.essagmessending

19

Theworkassumesaninterleavedsemanticsofexecution,togetherwith
thesharedvariablecommunicationparadigm.

3.2Programs
Definition1(Program)AprogrampconsistsofasetofvariablesVpand
afinitesetofprocesses.Eachprocesscontainsafinitesetofactions,andafi-
nitesetofvariables.Eachvariablestoresavaluefromapredefinednonempty,
finitedomainandisassociatedwithapredefinedsetofinitialvalues.Anac-
formthehastion

name::guard→statement

inwhichtheguardisabooleanexpressionovertheprogramvariablesandthe
statementiseithertheemptystatementoraninstantaneousassignmentto
oneormorevariables.Thenameisauniqueidentifierofthataction.

Definition2(StateandStateSpace)Wedefineastatesofprogramp
asapossiblevalueassignmenttoallvariablesinp.Wealsodefinethestate
spaceSpofaprogrampasthesetofallpossibleassignmentsofvaluesto
variables.

Definition3(StatePredicate)Astatepredicateofpisabooleanexpres-
sionoverthestatespaceofp.

Definition4(InitialStates)ThesetofinitialstatesIpisdefinedbythe
setofallpossibleassignmentsofinitialvaluestovariables.

Definition5(Enabled)Anactionacofpisenabledinastatesifthe
guardofacevaluatesto“true”ins.

Definition6(Action)Anactionacofprogrampisrepresentedbyaset
ofstatepairs{(s,t):s,t∈Sp}.

Weassumethatactionsaredeterministic,i.e.,∀s,s,s:(s,s)∈
ac∧(s,s)∈ac⇒s=s.Notethatprogramsarepermittedtobe

20

non-deterministicsincemultipleactionscanbeenabledinthesamestate.
Inparticular,eachnon-deterministicactioncanbeconvertedintoasetof
deterministicactionswithanidenticalstatetransitionrelation.

Definition7(ProgramComputation)Acomputationofprogrampis
aweaklyfair(finiteorinfinite)sequenceofstatess0,s1,...suchthats0∈Ip
andforeachj≥0,sj+1resultsfromsjbyexecutingtheassignmentofa
singleactionwhichisenabledinsj.

Werequirethatweakfairnessimpliesthatifaprogramactionacis
continuouslyenabled,aciseventuallychosentobeexecuted.Weakfairness
impliesthatacomputationismaximalwithrespecttoprogramactions,i.e.,
ifthecomputationisfinite,thennoprogramactionisenabledinthefinal
.estat

Definition8(Concatenation)Ifαisafinitecomputationandβisacom-
putationofp,wedenotewithα∙βtheconcatenationofbothcomputations.

Definition9(Occurs)Astatesoccursinacomputations0,s1,...ofpro-
grampiffthereexistsanisuchthats=si.Similarly,atransition(s,s)
occursinacomputations0,s1,...ofprogrampiffthereexistsanisuchthat
s=siands=si+1.

Inthecontextofthisthesis,programsareequivalentlyrepresentedas
statemachines,i.e.,aprogrampisatuplep=(Sp,Ip,δp)whereSpisthe
statespaceofp,Ip⊆Spisthesetofinitialstates.Thestatetransition
relationδp⊆Sp×Spisdefinedbythesetofactionsasfollows:Everyaction
acimplicitlydefinesasetoftransitionswhichisaddedtoδp.Atransition
(s,s)∈δpiffacisenabledinstatesandcomputationofthestatementac
resultsinstates.Wesaythatacinducesthesetransitions.Statesiscalled
thestartstateandsiscalledtheendstateofthetransition.

Definition10(Step)Atransitionfromonestatetoanotherstateiscalled
.stepa

21

Definition11(StutteringStep)If(si,si+1)isastep,andsi=si+1,then
thisstepisastutteringstep.

Astutteringstepisanimportantconceptinprogramrefinement,inthe
sensethattransitionsatalowerlevelofabstractionappearasstuttering
stepsatahigherlevel.

Definition12(ComputationEquivalence)Twocomputationsα1and
α2aresaidtobeequivalentiftheycontainidenticalsequenceofstates.

Definition13(StutteringEquivalence)Twocomputationsα1andα2
areequivalentunderstutteringifα1andα2areequivalentafterremoving
stutteringstepsfrombothcomputations.

Definition14(Property)Apropertyisasetofcomputationswhichis
closedunderstuttering,i.e.,ifagivencomputationcisinpropertyP,then
allcomputationsthatarestuttering-equivalenttocareinP.

3.3Communication
Twoprocessesprandpwofaprogrampcommunicateasfollows:foreach
pairofprocessesprandpwthereexistsasetof“shared”variablesVs.Both
processescanreadthecontentsofanyvariableinVs,butonlypwcanupdate
thesevariables.Thisdefinestheinformationflowbetweentwoprocesses.
ThesetVsrepresentstheinterfacebetweenprocessespwandpr.
Thereexistsasetofspecialvariables,denotedbyVo,thatareshared
bysomeprocesses(thatwritetothevariables),andtheenvironmentthat
readsthem.Thesespecialvariablesarecommonlyreferredtoastheoutput
variables.Thereexistsalsoaspecialsetofvariables,denotedbyVi,where
eachofthevariablesiswrittentobytheenvironment,andreadbyaprocess
inp.Suchvariablesareknownasinputvariables.Inputandoutputvariables
representtheinterfaceoftheprogrampwithitsenvironment.
Suchprogrammodelreflectsthesystemassumptionsofdistributedem-
beddedapplications(likesensorsandactuators),forwhichpartofourformal
frameworkistargeted.Multipleinitialstatesreflectthefactthataprogram

22

pmayinitiallyreadexternalinputsbeforeexecuting.Insuchcases,wead-
ditionallyassumeasetofspecialvariablescalledtheoutputvariablesofpin
whichtheprogramshouldfinallywritetheresultsofacomputation.This
modelissuitableforthedomainofembeddedapplications(likesensorsand
actuators).Programactionscanbepartitionedintotwocategories:(i)criti-
calactions,and(ii)non-criticalactions[AK98b].Programactionsthatwrite
outputvariablesarecriticalactions.Otherexamplesofcriticalactionsare
(i)actionsthatcommittoadatabase,or(ii)actionsthatcontrolprogressin
anuclearcontrolplant.Criticalactionsarethoseactionswhoseexecutionin
thepresenceoffaultscancauseviolationofsafety.
Definition15(Criticalandnon-criticalactions)Anactionacofpro-
grampwithsafetyspecificationSSissaidtobecriticaliffthereexistsa
transition(s,t)inducedbyacand(s,t)isabadtransition(Proposition2)
thatisreachable(Definition32)inpresenceoffaults.Anactionisnon-
criticaliffitisnotcritical.

3.4Specifications
Aspecificationforaprogrampisasetofcomputationsofpthatisfusion-
closed.Definition16(FusionClosure)AspecificationSisfusion-closediffthe
followingholdsforfinitecomputationsα,γ,astatesandcomputationsβ,δ:
Ifα∙s∙βandγ∙s∙δareinS,thensoareα∙s∙δandγ∙s∙β.
Wewilldiscusstheconsequencesofdemandingfusion-closedspecifica-
tionsinSection4.2.1.
Definition17(Satisfies)AcomputationcpofpsatisfiesaspecificationS
iffcp∈S.
Definition18(Violates)AcomputationcpofpviolatesaspecificationS
iffcpdoesnotsatisfyS.

23

Definition19(Correctness)AprogrampsatisfiesaspecificationSiffall
possiblecomputationsofpsatisfyS.
Definition20(Maintains)Letpbeaprogram,Sbeaspecificationandα
beafinitecomputationofp.WesaythatαmaintainsSiffthereexistsa
sequenceofstatesβsuchthatα∙β∈S.
Definition21(Safetyspecification)AspecificationSofaprogrampis
asafetyspecificationiffthefollowingconditionholds:Foreverycomputation
σthatviolatesS,thereexistsaprefixαofσsuchthatforallstatesequences
β,α∙βviolatesS.
Usingapracticalsystemofrailcrossingwheretrainswillneedtosharea
commontrack,asafetyspecificationcanbe“notwotrainswillusethetrack
time”.sametheatProposition1AspecificationSisasafetyspecificationiffforallσ∈S
thereexistsaprefixαofσsuchthatαdoesnotmaintainS.
Proof.FollowsfromtheDefinitions20and21.
Informally,thesafetyspecificationofaprogramstatesthat“something
badneverhappens”.Formally,itdefinesasetof“bad”finitecomputation
prefixesthatshouldnotbefoundinanycomputation.AlpernandSchneider
[AS85]haveshownthateveryspecificationcanbewrittenastheintersection
ofasafetyspecificationandalivenessspecification.
Definition22(Liveness)Alivenessspecificationisasetofstatese-
quencesthatmeetsthefollowingcondition:foreachfinitestatesequence
α,thereexistsastatesequenceβsuchthatα∙βisinthatset.
Aexampleoflivenessspecification,followingfromourpreviousexample
ofrailcrossing,canbe“eventuallyalltrainswillbeabletousethetrack”.
Informally,alivenessspecificationdetermineswhattypesofeventsmust
eventuallyhappen,i.e.,itsaysthat“somethinggoodeventuallyhappens”.
Fortheworkpresentedinthisthesis,wewillfocusonsafetyspecification.
However,livenessissuesareimportantsinceanysafetyspecificationcanbe
24

satisfiedbytheemptyprogram,i.e.,theprogramthatdoesnothing,and,
thus,livenessspecificationhelpsruleouttrivialimplementations.
Ingeneral,ifapropertyisfinitelyrefutable,thenitisasafetyproperty.
Thismeansthatthesafetypropertycanberefutedbyinspectingonlya
finiteprefixofacomputation.Ontheotherhand,alivenesspropertyisnot
finitelyrefutable,i.e.,itcannotberefutedbyinspectingafiniteprefixofa
computation,ratheritisrefutedbyinspectinginfinitestatesequences.

3.5TemporalLogic
Inasequentialsystem,theinput-outputsemanticsisadequateforanalyzing
thesystem,butishoweverinadequateforconcurrentsystems.Forexample,
theinput-outputsemanticscannotadequatelycapturespecificationssuchas
“eventually(x=2)”or“never(y=3)”.
TemporalLogicisaformalismfordescribingsequencesoftransitionsbe-
tweenstatesinareactivesystems.Intemporallogic,aspecificationisa
logicalformulathatdescribesasetofcomputations.Intheworkpresented
inthisthesis,asemanticviewisadopted,wereasonaboutpropertiesof
aprogramintermsofitstransitions,ratherthanexpressingtheminany
specificationlanguage.

tRefinemen63.Aprogramcanbeviewedasaspecialtypeofspecification.Alowerlevel
specificationdiffersfromahigher-levelspecificationinthatitcontainsmore
implementationdetails.Thus,wewantlower-leveltransitionstoappearas
stutteringsteps(Def.11)inthehigherlevelspecification.
Thiscanbemodelledthroughtheconceptofprojection.

Definition23(StateProjection)Theprojectionofastatesof(alower-
levelspecification)pon(ahigher-levelspecification)pisthestateobtained
byconsideringonlythevariableofp.

25

Definition24(ComputationProjection)Theprojectionofacomputa-
tioncof(alower-levelspecification)pon(ahigher-levelspecification)pis
obtainedbytakingtheprojectionofeachstateofc(ofp)onp.
Tomodelthis,weintroduceaprojectionfunction,π,fromalower-level
specificationptoahigher-levelspecificationp.Givenastatesofprogram
p,π(s)referstothevariablesofp.Weabusethenotationbydefiningπ(α)
foraprojectionofacomputationα.Thus,πpartitionsthesetofvariablesof
p,Vp,intoasetofinternalvariables(Vi)andasetofexternalvariables(Ve).
Therefore,changestovariablesinViappearsasstutteringstepsinπ(α).
Thisleadstotheconceptofrefinement[AL91].Whenwesubstitutep
foraspecificationS,whenwesaythatacomputationcsatisfiesS(Defs.17),
wereallymeantthattheprojectionofthatcomputationπ(c)satisfiesS(i.e.,
π(c)∈S).
Refinementfromaspecificationrepresentsausefulwaytoconstructing
programs.Usingrefinement,alow-levelprogramcanbeconstructedfroma
givenspecificationthroughtheapplicationofcorrectness-preservingrefine-
ments.Witheachrefinementstep,alower-levelprogrampisobtainedfroma
higher-levelprogrampthroughtheadditionofmoreimplementationdetails.
Itisthoseimplementationdetailsthatarehiddenbyπ.

3.7FaultModelsandFaultTolerance
Thefaultsthataprogramissubjectedtocanbesystematicallyrepresented
byactionswhoseexecutionperturbsthestateoftheprogram.Suchrepresen-
tationispossibleregardlessofthetypeoffaults(stuck-at,crash,Byzantine
etc),natureofthefaults(permanent,intermittentortransient),ortheability
toobservetheeffectsofthefaults(detectableornot).
First,wedefinethetermfaultclass.
Definition25(SimpleFaultClass)Asimplefaultclassforagivenpro-
grampoveravariableviinpisasetoftransitions(actions)overthevariable
.vi

26

Definition26(FaultClass)AfaultclassFforaprogrampovervariables
v1...vninpisasetofsimplefaultclassesforpoverv1...vn.

Inthisthesis,wefocusonthesubsetoffaultmodelsthatcanpotentially
betolerated:Wedisallowfaultstoviolatethesafetyspecificationdirectly.
Forexample,ifasafetyspecificationconstrainstheoutputvariablesofa
program,thefaultmodelpreventsthefaultactionsofFtomodifytheout-
putvariablesinsuchwaythatthefaultitselfresultsinasafetyviolation.
However,faultactionscanchangetheprogramstatesuchthatsubsequent
programactionsviolatethesafetyspecification.
Thereasonforchoosingsuchafailuremodelisthatwetargettolerable
faultmodels.Ifafaultcandirectlyviolatesafety,forexample,bycorrupting
theoutputvariablesinsuchawaythatsafetycanbeviolated,thennofail-
safefault-tolerantprogramexists.Toseethis,observethatiffromstates,
afaultcancausesafetyviolation,thenthisprogramshouldnotvisitstate
s.Ifsuchfaultscanoccurineverystate,thenallsuchstatesneedtobe
madeunreachable,i.e.,theinvariantoftheprogramisanemptyset.Thus,
nofail-safefault-tolerantprogramexists,henceourfocusontolerablefault
dels.mo

Definition27(Faultmodel)AfaultmodelFforprogrampandsafety
specificationSSisafaultclassFforprogrampoveritsvariablesthatdo
notviolateSS,i.e.,iftransition(sj,sj+1)isinFands0,s1,...,sjisinSS,
thens0,s1,...,sj,sj+1isinSS.

Definition28(Computationinthepresenceoffaults)Acomputa-
tionofpinthepresenceofFisaweaklyp-fairsequenceofstatess0,s1,...
suchthats0isaninitialstateofpandforeachj≥0,sj+1resultsfromsjby
executingaprogramactionfromporafaultactionfromFandthereexists
noprogramactionacsuchthatacispermanentlyenabledbutneverexecuted.

Weaklyp-fairmeansthatonlytheactionsofparetreatedweaklyfair
(faultactionsmustnoteventuallyoccuriftheyarecontinuouslyenabled).
Wesaythatafaultoccursifafaultactionisexecuted.

27

Rephrasedinthetransitionsystemview,afaultmodeladdsasetof
transitionstothetransitionrelationofp.Wedenotethemodifiedtransition
relationbyδpF.Sincefaultactionsarenottreatedfairly,theiroccurrence
isnotmandatory.Notethatwedonotruleoutfaultsthatoccurinfinitely
often(aslongastheydonotdirectlyviolatethesafetyproperty).

FaultToleranceSpecificationsIntheabsenceoffaults,aprogramp
shouldrefineitsproblemspecification.Inthepresenceoffaultyactions,p
maynotrefineitsspecifications,butcan,ontheotherhand,refinesome
weaker“tolerancespecification”.Inthisthesis,wefocusedonfail-safefault
ance.rleto

Definition29(Fail-safefault-tolerance)LetSbeaspecificationandSS
bethesmallestsafetyspecificationincludingS,andfaultclassF.Aprogram
pissaidtobefail-safeF-tolerantforspecificationSiffallcomputationsof
pinthepresenceoffaultsFsatisfySS.

IfFisafaultmodelandSSisasafetyspecification,wesaythata
programpisF-intolerantforSSiffpsatisfiesSSintheabsenceoffaults
FbutviolatesSSinthepresenceoffaultsF.Forbrevity,wewillwrite
fault-intolerantinsteadofF-intolerantforSSifFandSSareclearfromthe
t.xetconAnoteoncriticalactionsintroducedin3.2:Criticalactionsareexactly
thoseprogramactionswhoseexecutioninthepresenceoffaultscanleadto
violationofsafety.Assuch,inembeddedapplicationssuchasthoseofplant
controllersetc,theprogramactionsthatcontrolprogresswhilemaintaining
safetyarecriticalactions.

28

4pterCha

PerfectDetectors:Basisfor
onDetectirfecteP

Nowadays,therearecomputersystemsallaroundusthatcontrolourevery-
daylives,frombeingpresentinsafety-criticalsystemssuchasairplanes,to
beingpresentinconsumer-orientedproducts,suchasautomobiles,washing
machinesetc.Especiallyfortheconsumer-orientedproducts,cost-effective
solutionsfortheprovisionofdependabilityareofparamountimportance,
leadingtothefactthatsoftware-basedfaulttoleranceisbeingprovided.
Inthisthesis,weareinterestedinprovidingefficientfail-safefaulttoler-
ance,i.e.,itisacceptableforafail-safefault-tolerantprogramtohalt,aslong
asitremainsinasafestate.Theideaistobeabletodetectwhenthepro-
gramisabouttoviolateitssafetyspecification,andhaltatthattime.Thus,
forthedetectionpart,theprogramneedstobe“upgraded”withaprogram
component,calledadetector.Intuitively,thedetectorcomponenthelpsthe
programindetectingwhen“somethingbad”isabouttohappen,suchthat
theprogramhaltstoavoiddoingthe“bad”thing,i.e.,violatesafety.
However,thedesignofefficientdetectorsisproblematic.Levesonet
al.[LCKS90]conductedalargeexperimentontheeffectivenessofselfchecks,
whichareinstancesofadetector,insoftware.Theypointedoutin[LCKS90]
that,amongothers,(i)somedetectors(self-checks)detectnon-existenter-
rors,i.e.,therearemanyfalsealarms(i.e.,falsedetections),and(ii)many
detectorsthatweredesignedwereineffective,i.e.,theydonotsignalany
error,whenthereisoneinthesystem.Inthefirstcase,theefficiencyof

29

thesystemmaydecrease,sincethesystemmayhaltprematurely,whilein
thesecondcase,thesafetyofthesystemmaybeviolated.So,weneeda
methodologyforthedesignofefficientdetectors.
Inthischapter,wefirstprovideaformaloverviewofdetectors[Kul99],
andexplaintheirroleinfail-safefault-tolerantprogram.Ourmaincontribu-
tionisthedevelopmentofanoveltheoryofdetectors,thatiscenteredaround
thenotionofaninconsistenttransition.Wefurtheridentifyaspecialclass
ofdetectors,calledperfectdetectors,andexplainitsroleinthedesignof
fail-safefaulttolerance.Specifically,weshowthatcomposingcriticalactions
ofaprogrampwithperfectdetectorsissufficientintransformingpintoa
fail-safefault-tolerantprogram.Wethenpresentanalgorithm,basedupon
thetheory,that,givenafault-intolerantprogrampwithsafetyspecification
SS,andafaultclassF,generatesafail-safefault-tolerantprogramp,which
isthefail-safefault-tolerantversionofp.Themainpropertyofperfectdetec-
torsisthattheydetecterrorsifandonlyiftheseerrorsmayleadtoviolation
ofsafety.Thus,perfectdetectorscanbeshowntoaddressthetwoproblems
identifiedbyLevesonetal.
In[JHCS02,JHS02b,JHS02a],asetofperfectdetectorswasinitially
referredtoasSS-globallyconsistentdetectors.ASS-consistentdetectoris
onethatdetectsanerrorifandonlyiftheerrorcanleadtoviolationofsafety.
AsetofsuchSS-consistentdetectorsissaidtobeSS-globallyconsistent.

30

4.1Introduction

Safety-criticalapplicationsneedtosatisfystringentdependabilityrequire-
mentsintheirprovisionofservices.Unlesssounddesignmethodsareused
tosynthesizesuchapplications,theprocessofdesigningsafety-criticalappli-
cationsislikelytobeacomplexone.Toreducethecomplexityofdesigning
suchapplications,AroraandKulkarni[AK98a]haveproposedatransforma-
tionalapproach,wherebyaninitiallyfault-intolerantprogramissystemati-
callytransformedintoafault-tolerantone.Themainstepinvolvedindesign-
ingafault-tolerantprogramiscomposingthecorrespondingfault-intolerant
programwithcomponentsthat(i)detectand/or(ii)correcterrorsthatarise
asaresultoffaults,dependingontheleveloffault-tolerancetobeachieved.
Theclassofprogramsthatachievesthefirstgoalistermeddetectorswhilethe
classofprogramsthatachievesthesecondgoaliscalledcorrectors[AK98c].
Werestrictourattentiontodesigningfail-safefault-tolerance.Intuitively
thismeansthatitisacceptablethattheprogram“halts”iffaultsoccuras
longasitalwaysremainsina“safe”state.Thistypeoffault-toleranceis
oftenusedin(nuclear)powerplantsortraincontrolsystemswheresafety
(avoidanceofcatastrophicevents)ismoreimportantthatcontinuouspro-
visionofservice.InthecontextoftheArora/Kulkarniapproach,fail-safe
fault-tolerancecanbeachievedbymerelyemployingdetectors.
Generally,detectorscanberegardedasanabstractionofmanydifferent
existingfault-tolerancemechanisms.Forexample,acommonwaytoachieve
fault-toleranceistoreplicateacriticaltaskandscheduleitondifferentpro-
cessors.Theoutputsofthesetasksarebroughttogetherinavoterwhich
outputsaconsistentvalue.Thevotercontainsacomparatorwhichisan
instanceofadetector.Another(maybemoreobvious)exampleofadetector
istheuseoferrordetectingcodes.Othererrorhandlingmechanismslike
acceptancetestsorexecutableassertionscanalsobeformulatedasdetectors
inthesenseofAroraandKulkarni[AK98c].Hence,reasoningonthelevelof
detectorsmakesanapproachapplicabletomanydifferentpracticalsettings.
Inthischapter,wepresentasound,andcompletealgorithmfortrans-
forminganinitiallyfault-intolerantprogrampintoanefficientfail-safefault-

31

tolerantprogramp.Thealgorithmbeingsoundandcomplete,meaningthat
(i)thetransformedprogrampisinfactafail-safefault-tolerantversionof
theoriginalprogramp(soundness),and(ii)ifafail-safefault-tolerantver-
sionoftheprogramexists,thenthealgorithmwillfindit(completeness).By
efficient,wemeanthatthefail-safefault-tolerantdetecterrorsifandonly
iferrorsleadtoviolationofsafety,thusaddressingsomeoftheproblems
identifiedbyLevesonetalin[LCKS90],i.e.,phasperfecterrordetection.
Overall,ourapproachisapplicabletoaclassofprograms,calledbounded
programs.Thepropertyofboundedprogramsisthatthereisnounbounded
loopwithinoracrossprocesses.Embeddedapplicationsareofteninstancesof
boundedprograms.Distributedalgorithmssuchasmutualexclusion,byzan-
tineagreementetc.arealsoinstancesofboundedprograms.
Ouralgorithmisderivedoutofarefinedtheoryofdetectors.Thistheory
developsaterminologywhichcapturesandexplainstheworkingprinciples
ofdetectorsbetterthanbefore.Thebasicbuildingblockofthetheoryisthe
notionofatransitionwhichisinconsistentwithrespecttoasafetyspecifi-
cation[Lam77].Thiscanbeunderstoodasfollows:Executingatransition
inconsistentwithrespecttothesafetyspecificationcanleadtoaviolationof
thesafetyspecificationifnocountermeasuresaretaken.
Buildinguponthisconcept,wedevelopatheoryofaccurate,complete,
andperfectdetectorstogetherwiththenecessarycorrectnesstheorems.Intu-
itively,adetectorisaccurateifit“preserves”correctbehaviorsofthesystem
inthepresenceoffaults.Adetectoriscompleteifit“rejects”incorrectbe-
haviorsinthepresenceoffault.Adetectorisperfectifitisaccurateand
complete.Inthischapter,wemakethefollowingcontributions:

•Wefirstpresentanoveltheoryofdetectorswhichaccuratelycaptures
theworkingprinciplesofdetectors.

•Weidentifyaclassofdetectors,calledperfectdetectors,andexplain
theirrole,andimportanceinfail-safefault-tolerance.

•Basedonthistheory,weprovideanalgorithmthatsystematically

32

transformsafault-intolerantprogramintoafail-safefault-tolerantpro-
gramwithperfectdetection.
Thechapterisstructuredasfollows:Section4.2providesanoverviewof
detectorsandtheirroleinestablishingfail-safefaulttolerance.Section4.3
definestheproblemofaddingfail-safefault-toleranceusingdetectors.Sec-
tion4.4developsthetheoryofperfectdetectors.InSection4.5,wepresent
thealgorithmthatautomaticallygeneratesafail-safefault-tolerantprogram
fromthecorrespondingfault-intolerantprogramwithperfectdetectioncapa-
bilities.SomeexamplesarepresentedinSection4.6.Weconcludethepaper
4.7.nctioSein

4.2AnOverviewofDetectors
Inthissection,wepresentabriefintroductionofadetectorcomponent.For
acompleteformalization,wereferthereadertoKulkarni[Kul99].
Adetectormoduledisaprogramcomponentthatisusedtocheck
whetheritsdetectionpredicateDis“True”,whereDisastatepredicate.
Specifically,adetectordcanbeoftheform
¬Z∧D→Z:=True.
ItmeansthatifthedetectionpredicateDis“True”,thenZ,thewitness
predicate,becomes“True”.Thedetectorcomponentneedstosatisfythree
s:ieertppro,Safeness1.and,ressProg2.ybilitSta3.Bysafeness,wemeanthatthedetectorneverallowsZtowitnessDin-
correctly.ProgressmeansthatifDiscontinuously“True”,Zwilleventually
bebecome“True”.StabilitymeansthatonceZbecomes“True”,itcontin-
uestobeunlessDbecomes“False”.Examplesofdetectorsintheliterature

33

abound,suchaserrordetectioncodes,executableassertions[Hil00],com-
parators,andsoon.However,ifthedetectionpredicateissuchthatitisnot
relatedtothesafetyspecificationoftheprogram,thentheerrordetection
processwillnotbeefficient.Hence,todesign“relevant”detectors,theyneed
torelatetothespecificationoftheprogram.Inthenextsection,weexplain
theirroleinfail-safefaulttoleranceandrelateittotheirdesign.

4.2.1RoleofDetectorsinFail-SafeFaultTolerance
WeadopttheviewofAroraandKulkarni[AK98c]thatafault-tolerantpro-
gramisthecompositionofacorrespondingfault-intolerantprogramwith
faulttolerancecomponents.Usingthesamesystemmodelasusedinthis
work,AroraandKulkarniprovedthatdetectorsarenecessaryandsufficient
toestablishfail-safefaulttolerance.Intuitively,adetectordetectswhether
agivenstate(detection)predicateissatisfiedinagivenstate.Instancesof
detectorscanbeexecutableassertions,errordetectioncodes,selfchecks,and
ors.ratcompaGivenourfocusonfail-safefaulttolerance,wereviewtheresultofArora
andKulkarni[AK98c]statingthatdetectorsarenecessaryandsufficientto
buildfail-safefault-tolerantapplications.Themainideaoftheresultistouse
detectorstosimply“halt”theprograminastatewhereitisabouttoviolate
thesafetyspecification.AnimportantprerequisitefortheArora/Kulkarni
sufficiencyresultisthatspecificationsarefusion-closed.Fusion-closedspec-
ifications(Def.16)allowtocharacterizeasafetyspecificationasasetof
disallowed“bad”transitions(insteadofasetofdisallowedcomputationpre-
fixes).Proposition2LetSSbeasafetyspecification,panF-intolerantprogram
forSSforfaultclassF.IfpviolatesSSthenthereexistsatransitiont∈δp
suchthatforallcomputationsσofpholds:Iftoccursinσthenσ∈SS.
Proof.SincepviolatesSS,thereexistsacomputationσwhichisnotin
SS.ThefactthatSSisasafetypropertyimpliesthatσcontainsaminimal
prefix,writtenα∙s∙s,whichdoesnotmaintainSS(i.e.,whichpreventsthe
computationfrombeinginSS).Thisprefixhasatleastlength2becauseall

34

initialstatesofpmaintainSS.Wemustnowshowthatif(s,s)occursin
anyothercomputationρofp,thenρ∈SS:

1.Foracontradiction,assumeρ=αˆ∙s∙s∙βˆ∈SS.Wewillshowthat
α∙s∙smaintainsSS.
2.SinceSSisasafetypropertyandρ∈SS(step1),allprefixesofρ
.SStainmain3.Fromstep2andbecauseitisaprefixofρ,computationαˆ∙s∙smaintains
.SS4.Fromstep3anddefinitionofmaintains:∃δˆ:αˆ∙s∙s∙δˆ∈SS.
5.Fromassumptionα∙smaintainsSS,sofromdefinitionofmaintains
wehave:∃δ:α∙s∙δ∈SS.
6.Becauseoffusion-closureofSSandthesteps4and5construct:α∙s∙
s∙δˆ∈SS.
7.Step6meansthatα∙s∙smaintainsSS,whichisacontradictionto
thefactthatα∙s∙sdoesnotmaintainSS.

WecallthetransitionsidentifiedinProposition2badtransitions.In-
tuitively,tomaintainasafetyspecificationnowrequirestokeeptrackof
thecurrentcomputationandtakeprecautionsnottorunintooneofthe
badtransitionswhicharedisallowedbythesafetyspecification.Thesafety
specificationofaprogramcanthusbeconciselyrepresentedasasetofbad
transitions.Notethat,inthiswork,weassumethatthesafetyspecification
isprovidedassuch,i.e.,thesmallestspecificationthatcontainsthespeci-
fication.Ifthisisnotthecase,andifthespecificationoftheprogramis
expressedasaformulaintemporallogic,thesetofbadtransitionscanbe
obtainedinpolynomialtime,byconsideringalltransitions(s,t):s,t∈Sp.
Fromourrestrictionsofthefaultmodel(Chapter3,Section3.7)(fault
transitionscannotdirectlyviolatesafety)weknowthatbadtransitionsmust
beprogramtransitions(alsofromProposition2).Adetectorrefinesthe
guardofthecorrespondingactioninsuchawaythattheactionisnever
executedwheneverthecomputationcouldresultintakingabadtransition.

35

Formally,adetectorforanactionimplementsastatepredicatedwhichis
“True”iffexecutionoftheactionstartingindmaintainsthespecification.
Intheprogrammingnotation,givenanactiong→st,adetectorforthis
actionrefinestheguardtog∧d.AroraandKulkarniformulatethisfactin
theiroriginalworkasfollows[AK98a,Theorem4.3]:

Theorem1(Sufficiencyofdetectors)Foreachactionacofpthereex-
istsapredicatedsuchthatexecutionofacinastatewheredholdsmaintains
SS.

Definition30(Detectorforanaction)LetSSbeasafetyspecification.
AnSS-detectordmonitoringprogramactionacofpisastatepredicateof
pwhichisguaranteedtoexistaccordingtoTheorem1.

WewillsimplytalkaboutdetectorsinsteadofSS-detectorsiftherelevant
safetyspecificationisclearfromthecontext.Takentogether,Theorem1
andDefinition30statethatitissufficienttocomposeagivenactionwith
arelevantdetector,whichisguaranteedtoexist,toensurethattheaction
.safelyexecutesConsiderthetransitionsystemviewofaprogrampagain.Wedefinethe
notionsofreachable/unreachablestates/transitionsinthepresence/absence
offaults[GV00,GV01].

Definition31(Reachablestate)Wesaythatastatesisreachablebyp
iffstartingfromaninitialstateofpitispossibletoconstructacomputation
whichcontainssusingonlytransitionsfromδp.Otherwisesisunreachable.

Definition32(Reachabletransition)Atransition(s,t)ofpisreachable
iffstatesisreachablebyp.Otherwiseitisunreachable.

Definition33(Reachablestateinthepresenceoffaults)Wesay
thatastatesisreachablebypinthepresenceoffaultsFiffstartingfrom
aninitialstateofpitispossibletoconstructacomputationwhichcontains
susingonlytransitionsfromδpF.Otherwisesisunreachableinthepresence
faults.of

36

Definition34(Reachabletransitioninthepresenceoffaults)We
saythatatransition(s,t)isreachablebypinthepresenceoffaultsiffsis
reachablebypinthepresenceoffaults.Otherwise,(s,t)isunreachablein
thepresenceoffaults.
Fig.4.1illustratestheconceptsofreachablestates/transitionsintheab-
sence/presenceoffaults.

Figure4.1:Reachablestates/transitions

Observethat,startingfromaninitialstate,intheabsenceoffaults,all
computationsofpsatisfythesafetyspecificationSS.Thus,thecomputations
ofpgothroughthosetransitions(states)ofpthatarereachableintheabsence
offaults..However,inthepresenceoffaults,sometransitions(states)which
wereunreachableintheabsenceoffaults,nowbecomereachable.Using
theaboveterminology,detectorsremovesomeoftheprogramtransitions
whichwereunreachablebypintheabsenceoffaults,butbecomereachable
inthepresenceoffaults.Inasense,composingaprogramwithdetectors
meanstorefinetheoriginaltransitionrelationandeliminatecertainprogram
transitionssoastomakebadtransitionsunreachable.
Weclosethissectionwithafinalremarkregardingtheassumptionthat
specificationsbefusion-closed.Informallyspoken,fusion-closureguarantees
thattheentirehistoryofacomputation“isavailable”inthecurrentstateof
thesystem,i.e.,itissufficienttoobservethecurrentsystemstatetoknow
whetherthenextstepwillresultinadisallowedprefix.Ithasbeenobserved
[Gum93,AK98a]thatspecificationsinthepopularUnitylogic[CM88]are
fusion-closed,asarelow-levelspecificationslikeCprogramsortransition

37

systems.Ingeneralaspecificationthatisnotfusion-closedcanbeconverted
intoafusion-closedspecificationthroughtheadditionofhistoryvariables.
Howthiscanbedoneinawaythatminimizesthenumberofadditional
statesremainsatopicforfurtherresearch.
Inthissection,wehaveprovidedanoverviewofdetectorsandtheirrole
andimportanceinthedesignoffail-safefaulttolerance.However,littleis
knownaboutwhetherthedetectorsdesignedareefficientornot.Toaddress
thisproblem,wewillfirstdefinethetransformationproblemoffail-safefault
tolerance(Section4.3).Wewillthendevelopatheorythatunderpinsan
algorithmthatsolvesthetransformationproblem.

4.3TheTransformationProblem

Inthissection,wewillformallystatetheproblemoftransformingafault-
intolerantprogrampintoafail-safefault-tolerantversionpforagivensafety
specificationSSandfaultmodelF.
Whenderivingpfromp,onlyfaulttoleranceshouldbeadded,i.e.,p
shouldnotsatisfySSinnewwaysintheabsenceoffaults.Specifically,
therearetwoconditionstobesatisfiedinthetransformationproblem:
•Ifthereexistsatransition(s,t)inpthatisnotreachedbyptosatisfy
SS,then(s,t)cannotbeusedbyp,sincethismeansthatthereare
otherwayspcansatisfySSintheabsenceoffaults.Thus,thesetof
transitionsofpmustbeasubsetofthesetoftransitionsofp.
•Also,ifthereexistsastatesreachablebypintheabsenceoffaults
thatisnotreachedbypintheabsenceoffaults,thenthismeansthat
pcansatisfySSdifferentlyfrompintheabsenceoffaults,andsuch
astatesshouldnotbereachedbypintheabsenceoffaults.Thus,
thesetofstatesreachablebypshouldbeasubsetofthesetofstates
reachablebyp.
Ingeneral,theseconditionsresultintherequirementthatbothprograms
shouldhavethesamesetoffault-freecomputations.Formally,wedefinethe
transformationproblemasfollows:

38

Definition35(Transformationforfail-safefaulttolerance)LetSS
beasafetyspecification,afaultmodelF,andpanF-intolerantprogramfor
SS.Identifyaprogrampsuchthatthefollowingthreeconditionshold:
1.psatisfiesSSinpresenceofF.
2.Intheabsenceoffaults,everycomputationofpisacomputationofp.
3.Intheabsenceoffaults,everycomputationofpisacomputationofp.

Thetransformationproblemcanalsobeformulatedasadecisionproblem:

Definition36(Decisionproblemforthetransformation)LetSSbe
asafetyspecification,afaultmodelF,andpanF-intolerantprogramfor
SS.Doesthereexistaprogrampsuchthatthefollowingthreeconditions
hold:1.psatisfiesSSinpresenceofF.
2.Intheabsenceoffaults,everycomputationofpisacomputationofp.
3.Intheabsenceoffaults,everycomputationofpisacomputationofp.

LaterinSection4.5wepresentasound,andcompletealgorithmwhich
solvestheabovetransformationproblem,i.e.,wepresentanalgorithmthat
systematicallytransformsafault-intolerantprogramintoaprogramthatsat-
isfiestheabovethreeconditions.Soundnessofthealgorithmmeansthatthe
resultingprogramindeedsolvesthetransformationproblem.Completeness
ofthealgorithmmeansthatifthesolutiontothedecisionproblemistrue,
thenthealgorithmwillfindthefail-safefault-tolerantprogram.
Thealgorithmisbasedonatheoryofdetectorswhichweintroducein
thefollowingsection.

4.4ATheoryofPerfectDetectors
Thissectionpresentsatheoryofdetectorcomponentswhichhelpsinthe
designofefficientfail-safefault-tolerantapplications.Thetheoryiscentered
aroundthenotionofanSS-inconsistenttransitionwhichisintroducedin
Section4.4.1.Usingthisnotion,weidentifycorrectnesscriteriaforprograms

39

composedwithso-calledperfectdetectorsinSection4.4.2.Ouralgorithmto
addfail-safefaulttolerancepresentedinSection4.5directlyfollowsfromthe
theorypresentednow.
4.4.1TransitionConsistencyintheContextofSafety
ionsficatcieSpTheintuitionbehindthedefinitionoftransitioninconsistencyisthatifa
givencomputationviolatesthesafetyspecification,thensome“erroneous”
transitionoccurredinthecomputation,i.e.,thattransitionisinconsistent
withthesafetyspecificationoftheprogram.Specifically,considerafault-
intolerantprogrampwithsafetyspecificationSS,andacomputationαthat
violatesSS.FromPropositions1and2weknowthatthereexistsaprefixσ
ofαthatcontainsabadtransition.

Figure4.2:Anexampletoillustratetheconceptofinconsistenttransition

Whenacomputationviolatessafety,intuitivelyitmeansthattheprogram
isona“wrongpath”,andsuchdeviationhashappenedearlier.Thisintuition
iscapturedbytheSS-inconsistencyconcept,asdefinedbelow.
Definition37(SS-inconsistenttransitions)Givenafault-intolerant
programpwithsafetyspecificationSS,andacomputationαofpinthe
presenceoffaults.Atransition(s,s)isSS-inconsistentforpwithrespect
toαinpresenceoffaultsFiff
40

•ThereexistsaprefixαofαsuchthatαviolatesSS
•(s,s)occursinα,i.e.,α=σ∙s∙s∙β,
•Alltransitionsins∙s∙βareinδp,
•σ∙smaintainsSS.
Wenowillustratethisconceptpictorially.FromFig.4.2,transition(7,8)
isSS-inconsistentforpwithrespecttocomputationα=1∙7∙8∙9....α
violatesSSsinceitcontainsabadtransition,i.e.,(10,11).Observethat
transitions(8,9),and(9,10)arealsoSS-inconsistentforpwithrespecttoa
n.putatiocomengiv1PamProgrvarxinit1,yinit1,zinit10,cinit1:int
c=1→x:=read();c:=c+1;//valuebetween5and10
c=2→y:=read();c:=c+1;//valuebetween5and15
c=3→z:=x+y;c:=c+1
c=4→output(z);c:=1//loopforever
aults):(fFtrue→x:=random[0...25]
true→y:=random[0...50]

Figure4.3:ProgramtoillustratetheconceptofSS-inconsistenttransitions.
Wenowillustratethisdefinition:ConsidertheprogramP1inFigure4.3
whichreadstwosensors,andthenoutputsthesumofthetworeadings.
ThesafetyspecificationSSrequirestheoutputtobealwaysbetween10
and25.Thefaulttransitionsindicatethat,fromeachstate,thevalueof
variablex(respectively,y)canbearbitrarilychangedtoavalueintherange
of[0...25](respectively,[0...50]).Considernowcomputationα(statesare
givenastriplesx,y,z,i.e.,theprogramcountercisnotexplicitlygiven):
α:1,1,10∙10,1,10∙10,5,10∙10,5,15

41

Obviously,αsatisfiesSSandsonoprogramtransitionisSS-inconsistent.
NowconsidercomputationβwhichviolatesSS:
β:1,1,10∙10,1,10∙25,1,10∙25,5,10∙25,5,30
Inβ,afaulttransitionoccursafterthesecondstate,i.e.,state10,1,10,
changingthevalueofxto25.Thesubsequentprogramtransitionfrom
25,1,10to25,5,10isSS-inconsistent,sincetheexecutionofthefollow-
ingprogramtransitiontostate25,5,30causesaviolationofthesafety
specification.Theprogramtransitionfrom25,5,10to25,5,30isalso
SS-inconsistent.Thefirstprogramtransitionandthefaulttransitionare
notSS-inconsistent.
Intuitively,anSS-inconsistenttransitionforagivenprogramcomputa-
tionisaprogramtransitionwherethesubsequentexecutionofasequenceof
programactionscausesthecomputationtoviolatethesafetyspecification.
Inasense,SS-inconsistenttransitionsleadtheprogramcomputationonthe
“wrongpath”.Therequirementofasequenceofprogramtransitionsinthe
prefixistocapturethefactthatnoprecautionisbeingtaken,andthein-
consistenttransitioncapturesthefactthatsomethingharmfulhasoccurred.
NowwedefineSS-inconsistencyindependentofaparticularcomputation.
Thiscapturesthefactthat,startingfromsuchatransition,itispossibleto
violatesafety,i.e.,ifsuchatransitionoccursduringacomputation,then
thereisachancethatthiscomputationwillviolatesafety.
Definition38(SS-inconsistenttransitionforp)Givenaprogramp
withsafetyspecificationSS.Atransition(s,s)isSS-inconsistentforpin
presenceoffaultsFiffthereexistsacomputationαofpinthepresenceof
faultsFsuchthat(s,s)isSS-inconsistentforpw.r.t.αinpresenceofF.
Intuitively,atransition(s,s)isSS-inconsistentforaprogrampifthe
transitionstartsleadingthecomputationonthewrongpath.FromFig.4.2,
transition(7,8)isSS-inconsistentforpsinceithastakenthecomputation
onapossiblewrongpath,i.e.,therecansubsequentlybesafetyviolation.
Ingeneral,atransitioncanbeSS-inconsistentw.r.t.acomputationα1,
andnotbeSS-inconsistentw.r.t.α2.Thiscanbeduetonondeterminismin
42

programexecution.ToseethisconsidertheprogramP2inFigure4.4.The
safetyspecificationSSmandatesthat10≤d≤50atalltimes.Consider
nowthefollowingcomputationα1ofP2(astateisgivenasw,x,y,z):
α1=1,5,1,10∙1,10,1,10∙1,45,1,10∙15,45,1,10∙15,45,15,10∙15,45,15,60
Inthesecondstateafaultoccurssettingxto45andeffectivelycausingα1
toviolateSSafterexecutionofasequenceofprogramtransitions.Notice
thatthetransitiont=(1,45,1,10,15,45,1,10)isSS-inconsistentforp
.αw.r.t.1Nowconsidercomputationα2ofp:
α2=1,5,1,10∙1,10,1,10∙1,45,1,10∙15,45,1,10∙0,45,1,10∙0,45,0,10∙0,45,0,45
Hereagainafaulthappensinthesecondstatebutduetoaluckyinterleaving
ofprogramactionsα2maintainsSS.Hence,thesameprogramtransitiont
asaboveisnotSS-inconsistentforpwithrespecttoα2.
ProgrvaramwPin2it1,c1init1:int//processa
varxinit5,yinit1,zinit10,c2init1:int//processb
procce1ss=1a:→w:=read();c1:=c1+1;//valuebetween15and25
cc11==22∧∧xx>≤1515→→ww:=:=ww−+5;15;cc11:=:=1;1;////loloopop
processb:
cc22==12→→xy:=:=wr;ecad2();:=cc22:=+c1;2+1;//valuebetween0and20
cc22==43→→zout:=puyt(+zx);;cc22:=:=c1;2//+1;loop
aults):(fFtrutruee→→wx:=:=rrandandomom[1[10......50]45]

Figure4.4:Programcontainingtwoconcurrentprocesseswithatransition
thatisbothSS-inconsistentandnotSS-inconsistentw.r.t.twodifferentcom-
tions.puta

43

Ifwecannotfindacomputationinthepresenceoffaultsforwhicha
particulartransitionisSS-inconsistentthenwesaythatthistransitionis
SS-consistent.Specifically,

Definition39(SS-consistenttransitionforp)Givenaprogrampwith
safetyspecificationSS.Atransition(s,s)isSS-consistentforpinpresence
offaultsFiff(s,s)isnotSS-inconsistentforpinpresenceofF.

Forexample,fromFig.4.2,transition(1,2)isSS-consistentforp.Tran-
sition(13,14)isalsoSS-consistentforp.ThenotionofSS-consistenttransi-
tioncapturesthefactthatexecutingsuchatransitionisinherentlysafe,i.e.,
thereisnochanceofsafetybeingviolatedunlesssomethingharmfuloccurs.
ThenotionofanSS-inconsistenttransitionisacharacteristicofacom-
putationthatviolatesSS,andiscapturedbythefollowingproposition
).3p.(Pro

Proposition3Givenanfault-intolerantprogrampwithasafetyspecifica-
tionSS.EverycomputationαofpinthepresenceoffaultsthatviolatesSS
containsanSS-inconsistenttransitionforpw.r.t.αinpresenceofF.

f.oPro1.BecausepisF-intolerant,thereexistsacomputationαofpinthe
presenceoffaultssuchthatα∈SS.
2.Fromstep1andProposition2thereexistsabadtransition(s,s)inα.
3.Fromstep2andtherestrictionofFfollowsthat(s,s)∈δp.
4.Fromstep3andDefinition37,(s,s)isSS-inconsistentforpw.r.t.α.
Earlier,wehavecharacterizedinconsistenttransitionsbytheirabilityof
causingcomputationstoviolatesafety.Sinceabadtransitionisreachable
onlyinthepresenceoffaults,inconsistenttransitionscanalsobecharacter-
izedthroughthereachabilityofbadtransitions.

44

Proposition4Givenafault-intolerantprogrampwithasafetyspecification
SS.If(s,s)isanSS-inconsistenttransitionforpinthepresenceoffaults
F,thenabadtransitionisreachablestartingfromsusingonlyprogram
transitionsfromδp.
Proof.TheprooffollowsdirectlyfromthedefinitionofSS-inconsistent
transitionsandProposition2.
Reachabilityofbadtransitionsinδpleadstothefollowingobservation.
Proposition5Givenafault-intolerantprogrampforsafetyspecification
SS.EverySS-inconsistenttransitionforpinpresenceoffaultsFisnot
reachableintheabsenceoffaultsF.

f.oPro1.Foracontradiction,assumethestartstatesofanSS-inconsistent
transition(s,s)isreachableintheabsenceoffaults.
2.Step1impliesthatthereexistsacomputationα∙s∙sofpintheabsence
faults.of3.Fromthefactthat(s,s)isinconsistent,andProposition4thereexists
acomputations∙s∙βofpintheabsenceoffaultsinwhichabad
ccurs.onsitiontra4.Fromsteps2and3followsthatthereexistsacomputationσ=α∙s∙s∙β
ofpintheabsenceoffaultscontainingabadtransition.
5.Fromstep4andProposition2thereexistsacomputationofpinthe
absenceoffaultswhichviolatesSS.
6.Fromstep5pviolatesSSintheabsenceoffaults,acontradiction.
Notethatthepreviousobservationcannotbestrengthenedtoanequiv-
alence(anon-reachabletransitionintheabsenceoffaultsmustnotbeSS-
inconsistent).Butitcanbereformulatedtocharacterizereachabletransi-
tionsintheabsenceoffaultsasSS-consistent.

45

Corollary1Givenafault-intolerantprogrampforasafetyspecification
SS.Everyreachabletransition(s,s)∈δpintheabsenceoffaultsFisSS-
consistentforpinthepresenceoffaultsF.
Inthenextsection,weintroducethenotionofperfectdetectorsusingthe
terminologyofSS-(in)consistency.

4.4.2PerfectDetectors
Fromtheprevioussection,weobservedthatSS-inconsistenttransitionsare
thosetransitionsthatcanleadaprogramtoviolateitssafetyspecificationin
thepresenceoffaults,ifnoprecautionsaretaken.Detectors,asweexplained
inSection4.2,areameanstoimplementtheseprecautions.However,as
pointedoutbyLevesonetal.in[LCKS90],designofefficientdetectorsis
inherentlycomplex.Hence,weintroducetheclassofperfectdetectors.
Perfectdetectorsareameanstoefficientlyimplementtheseprecautions.
Thedefinitionofperfectdetectorsfollowstwodesignprinciples:A(perfect)
detectordmonitoringagivenactionacofprogrampneedsto(1)“reject”the
startingstatesofalltransitionsinducedbyacthatareSS-inconsistentfor
pinthepresenceoffaults,and(2)“keep”thestartingstatesofallinduced
transitionsthatareSS-consistentforpinthepresenceoffaults.Thesetwo
propertiesarecapturedinthedefinitionofcompletenessandaccuracyof
detectors(thenotionsaredefinedinanalogytoChandraandToueg[CT96]).
Definition40(Detectoraccuracy)Givenaprogrampwithsafetyspec-
ificationSS,andaprogramactionacofp.Adetectordmonitoringacis
SS-accurateforacinpinthepresenceoffaultsFiffforalltransitions(s,s)
inducedbyacholds:if(s,s)isSS-consistentforpinthepresenceofF,then
.d∈sTheaccuracypropertycapturesthefactthatefficientdetectorsshould
notmakemistakes.Thus,ifadetectordetectsthatatransitionissafe,then
it“accepts”thestate.
Definition41(Detectorcompleteness)Givenaprogrampwithsafety
specificationSS,andaprogramactionacofp.Adetectordmonitoring

46

actionacisSS-completeforacinpinthepresenceoffaultsFiffforall
transitions(s,s)inducedbyacholds:if(s,s)isSS-inconsistentforpin
presenceofF,thens∈d.

Ontheotherhand,thecompletenesspropertycapturesthenotionthat
adetectorshould“reject”allharmfultransitions.

Definition42(Perfectdetector)Givenaprogrampwithsafetyspecifi-
cationSS,andaprogramactionacofp.Adetectordmonitoringacis
SS-perfectforacinpinpresenceoffaultsFiffdisbothSS-completeand
SS-accurateforacinpinpresenceofF.

Wherethespecificationisclearfromthecontextwewillwriteaccuracy
insteadofSS-accuracy(thesameholdsforcompletenessandperfection).
Overall,theperfectnesspropertyofdetectorscapturesthefactthatsucha
detectorsdetectallharmfulfaults,anddonotmakemistakes.
Intuitively,thecompletenesspropertyofadetectorisrelatedtothesafety
propertyoftheprogrampinthesensethatthedetectorshouldfilteroutall
“harmful”SS-inconsistenttransitionsforp,whereastheaccuracyproperty
relatestothelivenessspecificationofpinthesensethatthedetectorshould
notruleoutSS-consistenttransitions.Thisintuitioniscapturedbythefol-
lowinglemmas.Thefirstone(Lemma1)usestheaccuracypropertytoshow
thatthefaultfreebehaviorofaprogramisnotaffectedbyaddingperfect
detectors.Intuitively,thisalsomeansthat,intheabsenceoffaults,addition
ofperfectdetectorstoaprogramdoesnotcausetheoriginalprogramtolose
anyofitsbehavior.Thenextone(Lemma2)usesthecompletenessproperty
toshowthatperfectdetectorsindeedestablishfail-safefault-tolerance.In-
tuitively,thisalsomeansthatthesedetectorsareefficient,inthesensethat
theydonotmakemistakes,andtheyalsocause“rejection”ofall“harmful”
transitions.Jhumkaetal.introducedtheconceptofSS-globallyconsis-
tentdetectorsin[JHCS02].Asmentionedin[JHS03],asetof(SS-)perfect
detectorsfordifferentactionsinprogrampwithsafetyspecificationSSis
SS-globallyconsistentforp.

47

Lemma1(Perfectdetectorsandfault-freebehavior)Givenafault-
intolerantprogrampandasetDofperfectdetectors,considerprogramp
resultingfromthecompositionofpandD.Thenthefollowingstatements
hold:1.Intheabsenceoffaults,everycomputationofpisacomputationofp.
2.Intheabsenceoffaults,everycomputationofpisacomputationofp.

f.oPro1.FromCorollary1,everyprogramtransitionwhichisreachableinpis
SS-consistent.
2.Fromconstruction,presultsfromaddingperfectdetectorstop.Be-
causetheyareperfect(Definition42),theyareaccurate.
3.Fromsteps1,2andthedefinitionofaccuracy,allSS-consistenttran-
sitionsofparealsotransitionsofp.
4.Steps1and3implythateveryreachabletransitioninpisalsoreachable
.pin5.Step4impliesthateverycomputationofpisalsoacomputationofp,
provingthefirstclaimofthelemma.
6.Fromthedefinitionofadetector(Definition30)followsthatcomposi-
tionwithdetectorsdoesnotintroducenewstatetransitions.
7.Step6impliesthatδp⊆δp.
8.Step7impliesthateverycomputationofpisalsoacomputationofp,
provingthesecondclaimofthelemma.
Lemma1intuitivelysuggeststhat,intheabsenceoffaults,programp,
anditscorrespondingfail-safefault-tolerantprogramhaveidenticalbehav-
iors.Whatitalsosuggestsisthatanyotherdetectorthatisdesigneddefen-
sively(defensiveprogramming)interfereswiththebehaviorofthefail-safe

48

fault-tolerantprogramintheabsenceoffaults.Specifically,itmeansthat
thereexistsvalid(SS-consistent)transitionsthatarehoweverruledoutby
thedetector,hencelivenessiscompromised.
Tounderstandthebehaviorofaprograminthepresenceoffaults,we
makeuseofthenotionsofcriticalactions,whichweformalizedhere.Intu-
itively,acriticalactionisonewhichifexecutedinanerroneousstatewill
causeviolationofsafety.
Definition43(Criticalandnon-criticalactions)Givenaprogramp
withsafetyspecificationSS,andfaultmodelF.Anactionacofpissaidto
becriticaliffthereexistsatransition(s,t)inducedbyacsuchthat(s,t)isa
badtransition(Proposition2)thatisreachableinpresenceoffaultsF.An
actionisnon-criticaliffitisnotcritical.
Thus,thesetofbadtransitionsreachableinthepresenceoffaultsdefine
asetofcriticalactions.
Lemma2(Perfectdetectorsandbehaviorinthepresenceoffaults)
Givenafault-intolerantprogrampforasafetyspecificationSS.Givenalso
aprogrampbycomposingthecriticalactionsofpwithperfectdetectors.
Then,psatisfiesSSinpresenceoffaults.
f.oPro1.ForacontradictionassumethatpviolatesSS.Fromdefinitionof
violatesfollowsthatthereexistsacomputationσofpwhichisnotin
.SS2.Step1andProposition2implythatthereabadtransition(s,s)occurs
.σin3.Becauseoftherestrictionsonthefaultmodel(criticalvariablesarenot
affected),thetransition(s,s)fromstep2mustbeaprogramtransition
(i.e.,(s,s)∈δp)
4.Fromstep3,andDefinition43,thereexistsacriticalactionacthat
inducesthebadtransitionfromstep3
49

5.FromDefinition37andstep3thetransition(s,s)isSS-inconsistent.
6.Considerthecriticalprogramactionac(fromstep4)causingthebad
transition.Fromconstructionofp,aciscomposedwithaperfect
.ddetector7.Fromstep5andbecausedisperfect,itisalsocomplete.
8.Becausediscomplete(step6),dmonitorsac(step5)andtransition
(s,s)inducedbyacisSS-inconsistent(step4),thedefinitionofcom-
pletenessimpliesthats∈d.
9.Step7impliesthat(s,s)∈δpwhichcontradictsstep3.

Thus,Lemma2showsthatperfectdetectorsforcriticalactionsaresuf-
ficientfordesignoffail-safefault-tolerantprogram.Overall,composingthe
criticalactionsofafault-intolerantprogramp(resultinginp)withperfect
detectorsensuresthat(i)intheabsenceoffaults,pandphaveidentical
behavior,and(ii)inpresenceoffaults,pisfail-safefault-tolerant(From
.2)and1Lemmas

Lemma3(PerfectDetectionandSafetySpecification)Givena
fault-intolerantprogrampwithsafetyspecificationSS,whichisencodedas
asetofbadtransitionsss,andafaultclassF.Givenalsoaprogramp,
suchthatp=p\ssr,wheressristhesetofallreachableSS-inconsistent
transitionsusingtransitionsinδpF.Then,thefollowinghold:
1.Intheabsenceoffaults,everycomputationofpisacomputationofp
2.Intheabsenceoffaults,everycomputationofpisacomputationofp
3.Inthepresenceoffaults,pisfail-safefault-tolerant.

Weprovethefirstpartoftheclaim:
f.oPro1.FromDef.38,sscontainsonlySS-inconsistenttransitionsforp.

50

2.FromPropositions3,and4,onlySS-inconsistenttransitionsarere-
movedfromp.
3.Fromstep3,noSS-consistenttransitionforpisremovedinp.
4.Fromstep3,allSS-consistenttransitionsofparealsotransitionsofp
5.FromCorollary1andstep4,allreachabletransitionsofpinabsence
offaultsarereachablebypinabsenceoffaults.
6.Fromstep5,everycomputationofpisacomputationofpinabsence
faults.of

Theproofofthesecondpartoftheclaimfollows:
f.oPro1.FromDef.38,sscontainsonlySS-inconsistenttransitionsforp.
2.FromPropositions3,and4,onlySS-inconsistenttransitionsarere-
movedfromp.
3.Fromstep2,andbyconstruction,notransitionisaddedinp
4.Fromstep3,notransitionisaddedinpthatisreachableintheabsence
faultsof5.Fromstep4,δp⊆δp
6.Fromstep5,everycomputationofpisacomputationofpinabsence
faults.of

Theproofofthethirdclaimfollows:Weassumethatpisnotafail-safe
fault-tolerantprogram,andthenshowacontradiction.
f.oPro1.Assumepisnotafail-safefault-tolerantprogram.Thereexistsa
computationαofpsuchthatαviolatesSSinpresenceoffaults.

51

2.FromProp.2,thereexistsabadtransition(s,s)inα
3.Fromstep3,(s,s)isnotremovedinp.
4.Fromstep3,(s,s)isreachableusingtransitionsinδpFsince(s,s)is
SS-inconsistentforpinpresnceoffaults.
5.Contradiction,sincebyconstructionofp,allSS-inconsistenttransi-
tionsforpreachablebyusingtransitionsinδpFhavebeenremoved.
Thus,fromLemma3,p=p\ssrsolvesthetransformationproblem.
Also,removingssrfromδpcanbelikened,followingLemma2,tocomposing
thecriticalactionsofpwithperfectdetectors.
Wenowpresentaresultontheexistenceofperfectdetectors.

Lemma4(Existenceofperfectdetectors)Givenaprogrampwith
safetyspecificationSS,andfaultclassF.Foreverycriticalactionacin
p,thereexistsadetectorDsuchthatDisperfectforacinp.

f.oPro1.Fromassumption,actionacinpiscritical
2.From1,andDefinition43,thereexistsasetofSS-inconsistent
transitionsforpB={(s,t):(s,t)isSS-inconsistentforp∧
(s,t)inducedbyac}.
3.Letacrbethesetoftransitionsinducedbyacreachableinthepresence
faults.of4.Fromsteps3,and3,thesetO=acr\Bisthesetofalltransitionsin-
ducedbyacreachableinpresenceoffaultsthatwillnotcauseviolation
executed.whenSSof5.Fromstep4,setOdoesnotcontainanyreachabletransition(s,t)
inducedbyacthatisSS-inconsistent(bad)forp.

52

6.Fromstep4,setOcontainallreachabletransitions(s,t)inducedby
acthatareSS-consistentforp.
7.Fromsteps5,and6,thesetOS={s:(s,t)∈O}definesastate
predicate(thusadetector)thatisperfectforacinp.
Thus,wehaveshownthatforeverycriticalactionacofaprogram,there
existsaperfectdetectorforacinp.Atthispoint,sinceweforeverycritical
actionofaprogram,thereexistsaperfectdetector,thequestionis:howdo
these?designew

4.4.3ConstructingPerfectDetectors
Finally,westudyhowtoconstructperfectdetectorsforcriticalactions.This
willalsoprovideabasisforautomatedconstructionofsuchdetectors.

Theorem2(Constructingperfectdetectors)Givenafault-intolerant
programpwithsafetyspecificationSS,andfaultmodelF.Thefollowing
twostatementsareequivalent:

1.Theprogrampisobtainedbycomposingeachcriticalactionacofp
withaperfectdetectorforacinpinpresenceofF.
2.EachSS-inconsistenttransitioninducedbythecriticalactionacofp
reachableinthepresenceofFisunreachableinpinthepresenceof
F,andeachSS-consistenttransitionofpreachableinthepresenceof
FisalsoreachableinpinthepresenceofF.

f.oProWeassumeagivencriticalactionacofpbeingcomposedwithadetector
thatisperfectforacinpinpresenceofF,andshowtheimplicationofthe
secondstatementfromthefirstone.
1.aciscomposedwithadetectordthatisperfectforacinpinpresence
.Fof

53

2.Sincedisperfect,itcausesallSS-inconsistenttransitionsinducedby
actobeunreachableinpinpresenceofF.
3.Sincedisperfect,itcausesallSS-consistenttransitionsinducedbyac
tostillbereachableinpinpresenceofF.
4.Fromsteps2and3,wehavestatement2.
Theprooffortheimplicationofstatement1fromstatement2is
straightforward.Thefirstpartofstatement2(unreachabilityofSS-
inconsistenttransitionsinducedbyacriticalactionacinpresenceof
faultsF)impliescompletenessifadetectordmonitoringac,andthe
secondpartofstatement2(reachabilityofSS-consistenttransitions
inducedbyacriticalactionacinpresenceoffaultsF)impliesaccu-
racyofadetectordmonitoringac.Takentogether,disperfectforac
inpinpresenceofF.

Thealgorithmforsynthesizingperfectdetectors(orfail-safefault-tolerant
programswithperfectdetection)isbaseddirectlyonTheorem2.

4.5AnAlgorithmforPerfectDetectors
Inthissection,wepresentasoundandcompletealgorithmforsynthesizing
fail-safefault-tolerantprogramswithperfectdetection.Basedonthefact
thatcomposingcriticalactionsofafault-intolerantprogrampwithperfect
detectorsresultsinafail-safefault-tolerantprogrampwhosebehaviorinthe
absenceoffaultsisidenticaltothatofp.

Theorem3(Correctnesstheoftransformationalgorithm)Theal-
gorithminFigure4.5solvesthetransformationproblemofDefinition48.

Proof.Sincethealgorithmconstructspbyremovingthesetssrofall
SS-inconsistenttransitionsinducedbycriticalactionsofpreachablebyusing
transitionsinδpF,wecanapplyLemma3andTheorem2.

54

add-perfect-fail-safe(δp,δF,ss:setoftransitions):
{ssr:=get-ssr(δp,δF,ss)
return(p=pwheretransitionrelationisδp\ssr)}

get-ssr(δp,δF,ss:setoftransitions):
{ssr:={(s,t)|(s,t)isinducedbyacriticalactionofpand(s,t)is
SS-inconsistentforpinpresenceofF}

return(ssr)}

Figure4.5:Algorithmtosynthesizefail-safefault-tolerantprogramwithper-
n.detectiotfec

Theorem4(PerfectDetection)Givenafault-intolerantprogrampwith
safetyspecificationSSencodedasasetssofbadtransitions,andfaultclass
F.Programp:=add-efficient-fail-safe(p,F,ss)hasperfectdetection.

f.oPro1.Allτ∈ssraretransitionsinducedbycriticalactions(faultmodel)
2.Fromstep1,removingsetssrisequivalenttocomposingcriticalactions
ofpwithdetectors.
3.FromLemmas1,2,and3,,phasperfectdetectors.
4.Fromsteps3,and3,thedetectorsforthecriticalactionsofpare
ct.erfep

Wesaythatp’isfail-safefault-toleranttoF(orfail-safeF-tolerant),
andhasperfectdetectiontoF.

Theorem5(SoundnessandCompleteness)Algorithmadd-perfect-
fail-safeissoundandcomplete.

55

Soundnessmeansthattheresultingprogramsolvesthetransformation
problem,whilecompletenessmeansthatiftheresultofthecorresponding
decisionproblemistrue,i.e.,thefail-safefault-tolerantprogramexists,then
thealgorithmfindsit.
f.oProTheproofofsoundness(fromLemma3),andcompleteness(byconstruc-
tionandassumption)isstraightforward.

ComplexityofAlgorithmadd-perfect-fail-safe

Wenowprovideabriefanalysisofthecomplexityofthealgorithm:
1.Assumethatthenumberofbadtransitionsspecifiedbyssbem.
2.Assumethatthemaximumnumberoftransitionsvisitedtodetermine
reachabilityofabadtransitionisn.Then,thenumberoftransitions
visitedisO(n).
3.Therefore,maximumnumberoftransitionsvisitedwhencomputingset
ssrisO(m∙n).
4.RemovingsetssrhascomplexityO(m),sincethesizeofsetssris
).m(O5.Overall,thealgorithminFigure5.1hascomplexityO(m∙n+m)=
O(m∙n),wheremisthenumberofbadtransitionsspecifiedbyss,
andnisthemaximumnumberoftransitionsconsideredtoascertain
.ybilithareacThecomplexityofouralgorithmisnomorethanthecomplexityofan-
otheralgorithmpresentedbyKulkarni,andArora[KA00],whichalsohas
polynomialcomplexityinthestatespaceoftheprogram.Aninstanceof
algorithmadd-perfect-fail-safewasintroducedbyJhumkaetal.in[JHCS02]
thatgeneratesasetofperfectdetectors.Infact,Jhumkaetal.performed
afault-injectionexperimentonamedium-scaleembeddedsystemforanair-
craftarrestmentsystemtoascertaintheviabilityoftheconceptofperfect

56

detectorsin[JHS03].Themainfindingwasthatperfectdetectors(i)indeed
detecterrorsthatleadtoviolationofsafety,(ii)makenodetectionmistakes.
Inthenextsection,wepresentseveralcasestudiesshowingtheapplica-
bilityofourapproach.

4.6ThreeCaseStudies

Inthissection,wepresentseveralsimpleexamplestoshowhowthealgorithm
add-perfect-fail-safeworks.

4.6.1ASimpleExample

Figure4.6:Exampleprogrampinthepresenceoffaults

InFig.4.6,weshowaprogramp,togetherwiththefaulttransitions
thatcanaffectit.Forexample,transition(1,7)isafaulttransition.There
aretwobadtransitionsthatarespecifiedbythesafetyspecificationSSof
theprogram,namelytransitions(10,11),and(20,21).FromLemma3,and
algorithmadd-perfect-fail-safe,weneedtoremoveonlythesetssofbadtran-
sitions,asspecifiedbythesafetyspecificationSSoftheprogram,tomakeit
fail-safefault-tolerant.Thus,add-perfect-fail-safe(δp,δf,{(10,11),(20,21)})
willresultintheremovalofallthetransitionsinssfromtheprogram.
ThisisdepictedinFig.4.7.Observethattherecannolongerexista
computationofpinpresenceoffaultsthatwillincludeabadtransition.
Thus,theresultingprogrampisfail-safefault-tolerant,andprogramp
solvesthetransformationproblem.

57

Figure4.7:Fail-safefault-tolerantprogrampobtainedbyremovingss

4.6.2AMajorityVoterSystem
Werecallhowatriplemodularredundant(TMR)majorityvotersystem
works.Thesystemconsistsofthreeinputsin.1,in.2andin.3fromthree
processesp1,p2,andp3respectively,andanoutputvariable,calledout.For
simplicity,weconsiderthecasewhereeachprocesspiinputsabinaryvalue
in.itothevoter.Intheabsenceoffaults,allthreevaluesareidentical.
Forthemajorityvotersystem,theclassoffaultsthatweconsiderisone
thatcancorrupttheinputvalueofatmostoneofthethreeprocesses.Inthe
absenceoffaults,allinputsareidenticalandthevalueoftheoutputvariable
outcanbesettothatofanyin.i,foranyprocesspi.
Thus,thefault-intolerantmajorityvoterprogramcanbewrittenasfol-
s:wlo

ITMR1::out=?→out:=in.1

Variableout=?meansthattheoutputhasnotyetbeenset.Inthe
absenceoffaults,thevalueoftheoutputvariableissettothatofin.1.
ThefaultstransitionsFthatwecanconsiderarethosetransitionsthat
corrupttheinputvaluein.ifromatmostoneprocesspi,settingittosome
arbitraryvalue.Thus,inourexample,thefaulttransitionsconsideredcan
berepresentedassuch:

F::(in.1=in.2)∨(in.1=in.3)→in.1=⊥

58

InpresenceoffaultsFthatcancorrupttheinputvaluefromprocessp1,
i.e.,in.1,theoutvariablecanbewronglyset,i.e.,itobtainsitsvaluefroma
corruptedvalueofin.1.Thespecificationofamajorityvoteristhatitalways
outputsthemajorityofitsinputs,anditssafetyspecificationissuchthatit
neveroutputsavaluethatisnotthemajority,i.e.,afail-safemajorityvoter
willneveroutputacorruptedvalue(underourassumedfaultmodel),though
itmaydeadlockinpresenceoffaults.However,asweexplainedbefore,a
fail-safefault-tolerantprogramneedstosatisfyonlyitssafetyspecification
inpresenceoffaults.
Thus,everytransitionthatsetstheoutputvariableouttoacorrupted
valueshouldberemoved.Specifically,considerset
T={t:((t(out)=t(in.1))∧((t(in.1)=t(in.2))∧(t(in.1)=t(in.3)))},
wheres(v)isthevalueofvariablevinstates,andsetTrepresentstheset
ofstateswherevariableoutisincorrectlyset,i.e.,variableoutisnotsetto
themajorityvalue.Hence,anytransition(s,t):t∈Tisabadtransitionfor
theTMRprogram,ITMR1,i.e.,ss={(s,t):t∈T}.
Thus,runningalgorithmadd-perfect-fail-saferesultsinremovingsetss
fromprogramTMR.RemovingsetssfromTMRmeansthatalltheremain-
ingtransitionsthatsettheoutputvariableoutsetouttoamajorityvalue,
wheneveroutisnotalreadyset.Inotherwords,alltheotherremaining
transitionsthatsetouttoin.1,theystartfromastatewhereoutisdifferent
fromin.1,andwherein.1isequaltoatleastoneoftheotherinputvariables,
i.e.,in.1=in.2orin.1=in.3.Thus,indetectorterms,weneedtocheckthat
in.1isequaltotheinputvalueofatleastoneoftheotherprocesses.Hence,
thefail-safefault-tolerantprogrammajorityvoter,FSTMR1,is:

FSTMR1::(out=in.1)∧((in.1=in.2)∨(in.1=in.3))→out:=in.1

Theorem6(Fail-safeTMR)ProgramFSTMR1isfail-safefault-tolerant
withperfectdetectiontofaultsthatcorrupttheinputofatmostoneprocess.
ObservethatiffaultscanarbitrarilycorrupttheoutputoutoftheTMR
system,thennofail-safefault-tolerantTMRexists,henceourfocusontol-
dels.mofaulterable

59

4.6.3TokenRing
Inthisexample,wepresentanexampleofafail-safefault-tolerantversionof
thetokenring.Wefirstrecallthemutualexclusionalgorithmusingatoken
.ringMultipleprocesseswaittoaccesstheircriticalsection.Theycandoso
providedthatatanyonetime,atmostoneprocessisaccessingitscritical
section.Thisisthesafetyspecificationforamutualexclusionalgorithm.
Also,noprocesswaitsindefinitelytoaccessitscriticalsection,assumingthat
eachprocessleavesitscriticalsectioninfinite“time”.Thisistheliveness
specificationofamutualexclusionprotocol.
Weassumeacollectionofprocessesarrangedinaring.Mutualexclu-
sioncanbeachievedinsuchascenariobycirculatingatokenamongthese
processes,andaprocessaccessesitscriticalsectiononlyuponreceiptofthe
token.Thetokeniscirculatedamongtheprocessesinaparticulardirection.
Forthetokenring,thesafetyspecificationisthatatmostoneprocessholds
thetokenatanyonetime.Inthisexample,wepresentafail-safefault-
tolerantversionofatokenring,i.e.,inthepresenceoffaults,atmostone
processholdsthetoken.
Processes0...Narearrangedinaring.Processk,0≤k<Npasses
thetokentoprocessk+1,whereasprocessNpassesthetokentoprocess0.
Eachprocesskhasabinaryvariable,t.k,andaprocessk,k=Nholdsthe
tokenifft.k=t.(k+1),andprocessNholdsthetokenifft.N=t.0.
Thefault-intolerantprogramforthetokenringisasfollows(+2ismodulo-
:ion)addit2

ITR1::k=0∧t.k=t.(k−1)→t.k:=t.(k−1)
ITR2::k=0∧t.k=t.N+21→t.k:=t.N+21

Faultaction:Thefaultactionthatweconsideris

F::true→t.k:=⊥

60

Note:Ingeneral,weassumefaultsliketimingfaults,messageloss,or
duplicationetc.However,weassumethatwhensuchfaultsoccur,aprocess
ksetsitsvariablet.k=⊥.Inthissense,representingthefaultsasFabove
isrepresentativeofalargeclassoffaults.Moreover,thesefaultsarethen
detectable.Also,therecanbeanynumberofstatecorruptions.
Thesafetyspecificationofthetokenringisthatatanytimenomorethan
oneprocessholdsthetoken.Inpresenceoffaults,especiallywhent.k=⊥,
noactionbasedont.kshouldbetaken,justincaseprocessk+1receivesa
duplicatetokeninadvertedly.Hence,alltransitionsofprocesskthatoccur
whenthestateofprocess(k−1),t.(k−1)=⊥willleadtosafetyviolation,
andshouldberemoved.Thus,weonlyallowprocessktoexecuteitsaction
whent.(k−1)=⊥.
Thus,applyingadd-perfect-fail-safetothefault-intoleranttokenring
ITR,theresultingfail-safefault-toleranttokenringFSTRis:

FSTR1::t.(k−1)=⊥∧k=0∧t.k=t.(k−1)→t.k:=t.(k−1)
FSTR2::t.N=⊥∧k=0∧t.k=t.N+21→t.k:=t.N+21

Theorem7(Fail-safeTR)ProgramFSTRisfail-safefault-tolerantto
faultsthatcorruptthestateofanyprocess.

4.7ChapterSummary

Inthischapter,wehavepresentedanoveltheoryofdetectorcomponentsfor
thedesignoffail-safefault-tolerantprograms.Thistheoryallowstoderive
atransformationalgorithmwhichautomaticallyaddsfault-toleranceability-
withperfectdetection.Specifically,wehavemadethreeimportantcontri-
butionsinthischapter:(i)Wehavepresentedanoveltheoryofdetectors,
and(ii)identifiedaclassofdetectorscalledperfectdetectors,andthenex-
plainedtheirrole,andimportanceinfail-safefault-tolerantprograms,(iii)we
haveprovidedanalgorithmthataddsperfectdetectorstoafault-intolerant
programtosynthesizeafail-safefault-tolerantprogram.

61

Themotivationforperfecterrordetectionisobviousinadaptivesys-
tems.Inadaptivesystems,usually(inperiodsofnon-perturbation)afault-
intolerantprogrampexecutes.Duringperiodsofperturbation,afault-
tolerantversionpofp(withpossiblylowerefficiency)isswitchedin.If
adetectorisnotaccurate,thenpmaybeswitchedin,evenwhenthereis
noperturbation,loweringtheefficiencyofthesystem.Ifthedetectorisnot
complete,itmightfailtodetectanerrorentirely.Hence,perfectdetection
isnecessaryifthesystemistobecorrectandefficient.
Weclosethischapterwithafinalremarkconcerninganobservationmade
byAroraandKulkarni[AK98b,Sect.3].Theauthorsobservedthat“based
ontheirexperience”,whendesigningfail-safefault-tolerantprograms,the
detectorsfornon-criticalactionsaretrivial,i.e.,“true”,whereasthedetectors
forcriticalactionsarenon-trivial.Lemma2isthefirstformaljustification
ofthevalidityofAroraandKulkarni’sobservationsinceitshowsthatitis
sufficienttocomposecriticalactionswithperfectdetectorstoensurefail-safe
fault-tolerance.Provingthisstatementwasmadepossiblebyournotionsof
accuracyandcompletenessofdetectors.Inthissensethesepropertiescan
beregardedasaconcretizationof“non-trivial”.

62

5pterCha

FFastastErrDetectorDeors:tectiAonBasisfor

Inthepreviouschapter4,weintroducedtheconceptofperfectdetection
(perfectdetectors),i.e.,theabilityofadetectortodetectallharmfulfaults,
andnotmakinganymistakes.Thisrepresentsoneaspectofefficiencyof
fail-safefault-tolerantprograms.
Inthischapter,welookatasecondaspectofefficiencyoffail-safefault-
tolerantprograms,namelyfasterrordetection.Weintroducetheconcept
offastdetection,andisbasedupontheconceptofSS-consistencyandSS-
inconsistency,asdefinedinChapter4.Further,whileprovidingforfasterror
detection,weendeavortopreservetheabilityforperfecterrordetection.In
thischapter,wemakethefollowingcontributions:
1.Wepresentatheoryoffastandperfectdetection,andformalizea
metric,calleddetectionlatency,thatcanbeusedto(i)estimatethe
detectionlatencyofafail-safefault-tolerantprogram,and(ii)com-
parethedetectionlatencyefficiencyofdifferentfail-safefault-tolerant
.msgrapro

2.Wepresentasound,andcompletealgorithmthat,givenafault-
intolerantprogrampwithsafetyspecificationSSandfaultmodelF,
synthesizesafail-safefault-tolerantprogramwithperfectdetection,
andminimallatency.Aswayofcontrast,thealgorithmofchapter4
synthesizesfail-safefault-tolerantprogramswithperfectdetectiononly.

63

5.1Introduction

Giventhepervasivenessofcurrentcomputersystems,theirabilitytotolerate
effectoffaultsisbecomingincreasinglyimportant,asshowninChapter4.
Whensuch(harmful)faultsoccur,theycancorruptthestateoftheprogram.
Whenavariableiscorrupted,anditscorruptedvalueisusedtoupdatethe
valueofanothervariable,errorissaidtopropagate.Ifnoimmediateaction
istaken,theerrormaypropagatebeyondagivenboundary.Whentheerror
propagatesbeyondthe“system”boundary,afailureissaidtooccur.Thus,as
weshowedinChapter4,composingcriticalactionswithperfectdetectorswill
preventtheerrorsfrompropagatingbeyondthe“system”boundary.Ifthe
system/programtobedesignedisonlyneededtobefail-safefault-tolerant,
oneonlyneedstocomposecriticalactionswithperfectdetectors.
However,aswementionedinChapter1,whendesigningamaskingfault-
tolerantprogram,designingacorrespondingfail-safefault-tolerantprogram
canbethefirststepoftheprocess.Hence,asweexplainedinChapter1
onthedesignoffaulttolerance,onceanerrorisdetected,itneedstobe
correctedtoo.However,thegreatertheerrorpropagation,thegreateristhe
recoveryprocess.Toseethis,considerthefollowing:assumethatinagiven
programp,thevalueofavariablev1isusedtoupdatethevalueofanother
variablev2.Now,afaultoccursthatcorruptthevalueofv1,andtheresulting
errorisdetected,beforev2isupdated.Duringrecovery,onlythevalueof
v1needstobe“recovered”.However,ifv2isupdatedwiththecorrupted
valueofv1,thenduringrecovery,boththevalueofv1,andv2needstobe
recovered.Hence,therecoveryprocessismorecomplicated.Thus,fromthe
pointofviewofrecovery,theearliertheerrorisdetected,thesimplerthe
errorrecoveryprocess.Thus,inthischapter,wepresentatheoryoffast
.ctiondeteorerrThetheoryoffastdetectionisalsobasedonthenotionofSS-consistency
andSS-inconsistencyoftransitions,developedinChapter4.Recallthata
givendetectormonitorsacertainprogramaction.Ifthestateofaprogramis
suchthatsafetycanpotentiallybeviolatedbyexecutionofprogramactions
alone(seedefinitionofSS-inconsistenttransitions–Def37–Chapter4),then

64

theseprogramactionsneedtobepreventedfromoccurring.Specifically,the
programcanbehaltedevenbeforesafetyisabouttobeviolated,through
executionofbadtransitions.Thishastheeffectofpreventingerrorsfrom
propagating,andcorruptingthewholestatespaceoftheprogram,i.e.,the
effectofthefaultiscontained.
Intuitively,toachievefastdetection,notonlydothecriticalprogram
actionsneedtobemonitoredwithperfectdetectors,butothernon-critical
programactionstoo(dependingonthefaultmodel).Whatthismeansis
thatwemayneedtorefinetheguardsofbothcriticalandnon-criticalactions,
dependingonthefaultmodel.Aswayofcontrast,thetheorypresentedin
Chapter4refinesonlytheguardsofcriticalactions.
Toevaluatethe“fastness”atwhichafail-safefault-tolerantprogramde-
tectsanerror,weformalizeacommonly-usedmetriccalleddetectionlatency.
Foraprogramthatisfail-safefault-tolerant,fromChapter4,onlybadtran-
sitionsinducedbycriticalactionsareremoved.Thus,fromtheonsetofa
“harmful”faulttothe“time”theprogram“halts”atacriticalaction(i.e.,
thetransitioninducedbythecriticalactionisremoved,sinceitisbad),the
detectionlatencyforthisfaultis“maximal”.Wheneverwesaythatdetectors
arefast,wemeanthatthe“time”(morespecifically,thenumberofprogram
transitions)ittakesfortheprogramto“halt”islessthanthatmaximum
detectionlatency.Inthischapter,weshowhowthedetectionlatencycan
beminimized,i.e.,thefaultisdetectedin0-step.Further,aswecomposed
bothnon-criticalandcriticalactionswithdetectors,weendeavortodevelop
perfectdetectorsinsuchcases,thuspreservingtheabilityofhavingperfect
detection.Thischapterisstructuredasfollows:InSection5.2,wepresentatheory
offasterrordetection.Wedefinethetransformationproblemforfast,and
perfecterrordetectioninSection5.3.InSection5.4,wepresentanalgorithm
thatsolvesthetransformationproblem.Wepresentexamplestoillustratethe
workingofthealgorithminSection5.5.Wediscusssomeissuesconcerning
theapproachinSection5.6,andwesummarizeandconcludethechapterin
5.7Section

65

5.2FastErrorDetection
Perfectdetectors,introducedinChapter4,ensurecorrectnessofthefail-safe
faulttolerancetransformationproblem.Theyensurethat,intheabsence
offaults,livenessoftheresultingprogramisnotcompromised,whilealso
ensuringthatsafetyisnotviolatedinpresenceoffaults.
Wenowturntoadifferentaspect,namelythedetectionlatencyefficiency
offail-safefault-tolerantprograms.Intuitively,wewouldlikeanerrortobe
detectedasearlyaspossibletopreventfurthercontaminationoftheprogram
state.Ifafaultoccurs,andnoprecautionistaken,thentheerrorcanprop-
agate,andcorrupttheentirestatespaceoftheprogram.Moresophisticated
recoverymethods,suchasadistributedreset[AG94],maythenbeneededto
getthesystemintoaconsistentstate,whicharecomputationallyexpensive
cedures.proInthissection,wefocusonexplainingtherelationshipbetweenfastde-
tectionandSS-inconsistenttransitions.Toseefastdetection,considera
computationα=s0...si−1∙si∙si+1...ofafault-intolerantprogrampin
thepresenceoffaultsthatviolatessafety.Assume(si,si+1)isabadtransi-
tion(fromProposition2)inα.Fromthealgorithmadd-perfect-fail-safeof
Chapter4,transition(si,si+1)wouldberemovedfrompwhensynthesizing
thefail-safefault-tolerantprogramp,sothattransition(si,si+1)isunreach-
ableinα.However,iftransition(si−1,si)isaprogramtransition,andis
thusSS-inconsistentforp,thenremovingtransition(si−1,si)willalsomake
transition(si,si+1)unreachableinα.So,fromthepointofviewofsafety,
removingtransition(si−1,si),or(si,si+1)achievesthesameresult,thatof
makingthebadtransition(si,si+1)unreachable.However,notexecuting
transition(si−1,si)preventserrorfrompropagating,i.e.,thevariablesthat
wouldhavebeenupdatedandcorruptedhadthetransition(si−1,si)taken
placearenownotcorrupted,henceerrorsarecontained.
Informally,adetectordmonitoringanactionacinpisperfectforacinpif
itremoveseveryarbitrarySS-inconsistenttransitioninducedbyacforevery
violatingexecution,whilekeepingallSS-consistenttransitionsinducedby
ac.Thus,givenacomputationαofpthatviolatessafety,αhasasequenceof

66

programtransitionsleadingtoabadtransition(followingfromthedefinition
ofSS-inconsistencyofChapter4),whereeverysuchprogramtransitionin
thatsequenceisSS-inconsistentforp.Forfastdetection,i.e.,toprevent
errorfrompropagating,agivenprogramactionacofpshouldbecomposed
withaperfectdetectorsuchthatthe“first”SS-inconsistenttransitionofthat
sequenceisremoved,andthattransitionisinducedbyac.Thiswillallowa
fail-safefault-tolerantprogramtohavebothperfectandfasterrordetection.
Onthisbackground,weformalizethenotionofanearliestSS-inconsistent
nsition.tra

Definition44(EarliestSS-inconsistenttransition)GivenanF-
intolerantprogrampwithsafetyspecificationSS,andacomputation
α=s0∙s1∙∙∙si∙si+1∙∙∙smofpinthepresenceoffaultsthatviolates
SS.Thetransition(si,si+1)istheearliestSS-inconsistenttransitionforp
w.r.t.αiffthefollowingtwopropertieshold:
1.(si,si+1)isSS-inconsistentforpw.r.t.α.
2.(si−1,si)isatransitioninducedbyafaultaction.

Intuitively,whenacomputationαofaprogrampinthepresenceoffaults
violatesthesafetyspecificationSSofp,thereexistsasuffixoftheviolating
computationprefixofαthatstartswithanSS-inconsistenttransitionand
endsinabadtransition.TheearliestSS-inconsistenttransitionisthefirst
SS-inconsistenttransitioninthissuffix.Basically,itisthefirstprogram
transitionthatleadstheprogramonthewrongtrack.Sincewehaveno
controlonfaulttransitions,thisisthefirsttransitionwhichcanbeenabled
ordisabled,dependingonwhetheritisSS-consistent,orSS-inconsistent.
DefinethesetEITpF(SS)ofearliestSS-inconsistenttransitionsofapro-
grampastheunionoftheearliestSS-inconsistenttransitionsoverallcom-
putationsofpviolatingSS.Definep\EITpF(SS)astheprogrampwhichis
thesameaspexceptthatalltransitionsfromEITpF(SS)havebeenremoved
.δfromp

67

Definition45(Fastdetectors)Letpbeafault-intolerantprogram.Aset
ofperfectdetectorsDforprogrampisfastiffpcomposedwithDresultsin
p\EITpF(SS).

Atthispoint,weneedtoassesstheroleandimpactoffast(perfect)
detectorsinthepresenceoffaults.Wefindthat,composingafault-intolerant
programwiththesefastdetectors,andalsosincethesefastdetectorsare
perfect,theresultingprogramisindeedfail-safefault-tolerant,andthisis
capturedbyLemma5.

Lemma5(Fastperfectdetectorsandbehaviorinthepresenceoffaults)
Letpbeafault-intolerantprogramforasafetyspecificationSS.Thenp
composedwithasetoffastperfectdetectorsforpsatisfiesSSinthepresence
faults.of

Proof.ThisisageneralizationoftheproofofLemma2.

1.Foracontradiction,itisagainassumedthatpviolatesSS,i.e.,that
thereexistsacomputationσofpwhichisnotinSS.
2.FromDefinition44itispossibletogeneralizeProposition3tostate
thatineveryviolatingexecutionthereexistsanearliestSS-inconsistent
transitionineveryviolatingcomputation.Denotethistransitioninσ
as(s,s).
3.ThefactthatdetectorsarefastandfromDefinintion45wehavethatall
earliestinconsistenttransitionsareremovedfrompwhileconstructing
p,whichisacontradictiontotheoccurrenceof(s,s)inacomputation
.pof

Wenowdefine,andformalizeametrictomeasurethe“fastness”ofdetec-
tors.Intuitively,thedetectionlatencymetricdefinesthenumberofprogram
transitionsexecuteduntiltheprogram“halts”atadetector,aftera“harm-
ccurred.ohasfaultful”

68

Definition46(Detectionlatency)LetSSbeasafetyspecificationandp
beaprogramwhichhasbeenmadefail-safefault-tolerantforSSbycompos-
ingafault-intolerantprogrampwithasetofdetectors.Considerafinite
computationα=s0∙∙∙si−1∙si∙si+1∙∙∙smofpinthepresenceoffaults,such
that:1.(si−1,si)isatransitioninducedbyafaultaction,
2.alltransitionsinsi...smareinδp,and
3.startingfromsmabadtransitioninSSisreachablebyusingasequence
ofprogramactionsofp.
Then,thedetectionlatencyLp(α)ofpw.r.t.αisthenumberoftransitions
executedinsi...sm,i.e.,(m−i)transitions.

Intuitively,thedetectionlatencymeasuresthenumberof(SS-
inconsistent)transitionsexecutedaftertheoccurrenceofaharmfulfault,
andbeforeadetectorhaltstheprogram.

Definition47(Maximumdetectionlatency)LetFbeafaultmodel,
SSbeasafetyspecificationandpbeafail-safeF-tolerantprogramforSS.
ThemaximumdetectionlatencyLMpofpisdefinedasthemaximumof
Lp(α)forallcomputationsαofpinthepresenceoffaults.

Lemma6(Latencyoffastdetectors)Givenafault-tolerantprogramp
whichistheresultofthecompositionofafault-intolerantprogrampwitha
setoffastperfectdetectors.Thenphasmaximumdetectionlatency0.

Proof.Consideranycomputationα=s0∙∙∙si∙∙∙smofpwhichsatisfies
Definition46.Weneedtoshowthatsi=sm.
1.Definition46impliesthatthereexistsacomputationσofpwhichcan
bewrittenasσ=α∙β(i.e.,acontinuationofα)whichviolatesSS.
2.Step1andDefinition44implythat(si,si+1)istheearliestSS-
inconsistenttransitionofpw.r.t.σ.

69

3.Step2andthedefinitionoffastdetectorsimplythatpevolvedfromp
byremoving(amongothertransitions)also(si,si+1).
4.Step3impliesthatsi=sm,whichineffectmeansthatLp(α)=0.
Sincewehavenotrestrictedthechoiceofα,thestatementholdsforallα.
ThisimpliesthatLMp=0.
Sincethemaximumdetectionlatencyofafail-safefault-tolerantprogram
pmustbeatleast0,compositionofafault-intolerantprogrampwithfast
perfectdetectorsresultsinafail-safefault-tolerantprogrampwithoptimal
detectionlatency.Itremainstobeshownthatthiscompositionpreserves
theoriginalbehaviorofthefault-intolerantprogramintheabsenceoffaults.

Lemma7(Fastperfectdetectorsandfault-freebehavior)Givena
programpthatisthecompositionofafault-intolerantprogrampandaset
offastperfectdetectors.Forpandpholds:

1.Intheabsenceoffaultseverycomputationofpisacomputationofp.
2.Intheabsenceoffaultseverycomputationofpisacomputationofp.

Proof.TheproofisthesameasthatofLemma1.
Lemmas5(behaviorinthepresenceoffaults),6(optimaldetectionla-
tency)and7(behaviorintheabsenceoffaults)takentogethershowthat
composingafault-intolerantprogramwithfast,perfectdetectors,there-
sultingprogram(i)preservestheoriginalbehaviorintheabsenceoffaults,
(ii)isfail-safefault-tolerantinthepresenceoffaults,and(iii)hasminimal
detectionlatency.Theselemmaswillformthebasisforderivingthetrans-
formationalgorithmforaddingfast,perfectdetectorsinSection5.4.
Inthenextsection,basedontheresultsdevelopedinChapter4andinthis
section,weprovideanalgorithmthatautomatesthesynthesisofafail-safe
fault-tolerantalgorithmwithperfectandfasterrordetectioncapabilities.

70

5.3TheTransformationProblemforFast
andPerfectDetection
Wenowformallystatetheproblemoftransformingafault-intolerantprogram
pintoafail-safefault-tolerantversionpforagivensafetyspecificationSS
andfaultmodelFwithperfectdetection,andminimaldetectionlatency.
Again,whenderivingpfromp,onlyfaulttoleranceshouldbeadded,i.e.,
pshouldnotsatisfySSinnewwaysintheabsenceoffaults.Forcomplete-
ness,werecallthemainconstraintsdefiningthetransformationproblem:

•Ifthereexistsatransition(s,t)inpthatisnotreachedbyptosatisfy
SS,then(s,t)cannotbeusedbyp,sincethismeansthatthereare
otherwayspcansatisfySSintheabsenceoffaults.Thus,thesetof
transitionsofpshouldbeasubsetofthesetoftransitionsofp.
•Also,ifthereexistsastatesreachablebypintheabsenceoffaults
thatisnotreachedbypintheabsenceoffaults,thenthismeansthat
pcansatisfySSdifferentlyfrompintheabsenceoffaults,andsuch
astatesshouldnotbereachedbypintheabsenceoffaults.Thus,
thesetofstatesreachablebypshouldbeasubsetofthesetofstates
reachablebyp.

Ingeneral,theseconditionsresultintherequirementthatbothprograms
shouldhavethesamesetoffault-freecomputations.Formally,wedefinethe
transformationproblemasfollows:

Definition48(Transformationforefficientfail-safefaulttolerance)
LetSSbeasafetyspecification,afaultmodelF,andpanF-intolerant
programforSS.Identifyaprogrampsuchthatthefollowingfourconditions
hold:1.psatisfiesSSinpresenceofF.
2.Intheabsenceoffaults,everycomputationofpisacomputationofp.
3.Intheabsenceoffaults,everycomputationofpisacomputationofp.
4.phasdetectionlatency0.

71

Thetransformationproblemcanalsobeformulatedasadecisionproblem:

Definition49(Correspondingdecisionproblem)LetSSbeasafety
specification,afaultmodelF,andpanF-intolerantprogramforSS.Does
thereexistaprogrampsuchthatthefollowingthreeconditionshold:

1.psatisfiesSSinpresenceofF.
2.Intheabsenceoffaults,everycomputationofpisacomputationofp.
3.Intheabsenceoffaults,everycomputationofpisacomputationofp.
4.phasdetectionlatency0.

LaterinSection5.4wepresentasound,andcompletealgorithmwhich
solvestheabovetransformationproblem,i.e.,wepresentanalgorithmthat
systematicallytransformsafault-intolerantprogramintoaprogramthatsat-
isfiestheabovethreeconditions.Soundnessofthealgorithmmeansthatthe
resultingprogramindeedsolvesthetransformationproblem.Completeness
ofthealgorithmmeansthatifthesolutiontothedecisionproblemistrue,
thenthealgorithmwillfindthefail-safefault-tolerantprogram.
Thealgorithmisbasedonatheoryforperfectdetectorswhichwein-
troduceinthefollowingsection.Thealgorithmalsosynthesizesfail-safe
fault-tolerantprogramthatdetectsfaultsearly,andisbasedonatheoryfor
ors.tdetecfast

5.4AddingEfficientFail-SafeFaultToler-
ance

Inthissectionwegiveanalgorithmtosolvethetransformationproblemof
Definition48whichfollowsfromthetheorypresentedinSection4.4.
Thebasicideaofthealgorithmistoremovethesetofearliestinconsistent
transitionsfromtheinputprogramp.Intuitively,thealgorithmworksas
follows:Ittakesasparametersthefault-intolerantprogramp(intheform
ofitstransitionrelationδp)andthefaultmodelF(intheformofthesetof
faulttransitions).ThesafetyspecificationSSisencodedasthesetofbad
transitionsandpassedtothealgorithminvariabless.

72

Startingfromthesetofbadtransitionsinss,thealgorithmconstructsthe
setitofallinconsistenttransitions.Fromthisset,itconstructstheseteit
ofearliestinconsistenttransitions.Thissetoftransitionsisremovedfromδp
yieldingthetransitionrelationofthetransformedprogram.Thealgorithm
ispresentedinFigure5.1.

add-efficient-fail-safe(δp,δF,ss:setoftransitions):
eit:=get-eit(δp,δF,ss)
return(p=pwheretransitionrelationisδp\eit)

get-eit(δp,δF,ss:setoftransitions):
it:={(s0,s1)|∃α=s0∙s1∙s2∙∙∙ofprogramtransitions:
∃(s,s)∈ss:(s,s)occursinαand(s,s)isreachable
F}δinpeit:={(s0,s1)|(s0,s1)∈it∧∃s∈Sp:(s,s0)∈δF}
)eit(return

Figure5.1:Algorithmtoaddefficientfail-safefault-tolerance.

Theorem8(Correctnessofthetransformationalgorithm)Theal-
gorithminFigure5.1solvesthetransformationproblemofDefinition48.
Furthermore,theresultingprogramhasminimaldetectionlatency.

Proof.Sincethealgorithmconstructspbyremovingthesetofallear-
liestinconsistenttransitions,wecanapplythelemmasfromSection5.2.
Lemma5ensuresthatpsatisfiesthespecificationinthepresenceoffaults,
whichprovesthefirstrequirementofDefinition48.Lemma6ensuresthat
themaximumdetectionlatencyis0,meaningthatitistriviallyoptimal.
Lemma7ensuresthatpandphavethesamefault-freebehaviorwhich
provesthesecondandthirdrequirementsofDefinition48.

Theorem9(SoundnessandCompleteness)Algorithmadd-efficient-
fail-safeissoundandcomplete.

73

f.oProTheproofofsoundness(fromLemma5),andcompleteness(byconstruc-
tion)isstraightforward.
Incontrast,anotheralgorithmforautomaticsynthesisoffail-safefault-
toleranceproposedbyKulkarniandArora[KA00]generatesprogramswith
detectionlatencyequaltothemaximumlengthoverallpartialcomputations
consideredwhencomputingsetit,sincetheyremoveonlybadtransitions,
i.e.,thelasttransitioninthepartialexecution,whereasweremovethefirst
one.Wenowprovideabriefanalysisofthecomplexityofouralgorithm:
1.Assumethatthenumberofbadtransitionsspecifiedbyssism.
2.Letthemaximumnumberofcomputationscontaininganybadtransi-
.cisntio3.Thus,tocomputesetit,thenumberofpartialcomputationsvisitedis
O(m∙c).
4.Assumethatthemaximumnumberoftransitionsvisitedinanypartial
computationwhencomputingsetitisn.
5.Themaximumnumberoftransitionsvisitedwhencomputingsetitis
O(m∙c∙n).
6.Computingseteitmeansgoingthroughsetit,thusthisstephascom-
plexityO(m∙c∙n).
7.RemovingseteithascomplexityO(m∙c),sinceseteithassizeO(m∙c).
8.Overall,thealgorithminFigure5.1hascomplexityO(m∙c∙n+m∙c∙n+
m∙c∙n+m∙c)=O(m∙c∙n),wheremisthenumberofbadtransitions
specifiedbyss,cisthenumberofmaximumnumberofcomputations
containinganygivenbadtransition,andnisthemaximumnumberof
transitionsconsideredinanypartialcomputation.
Also,asmentionedintheIntroduction(Chapter1),ourapproachtargets
aclassofprogramsknownasboundedprograms.Inboundedprograms,the
lengthofthepartialexecutionstobeconsideredwhencalculatingthesetit
isfinite.Thismeansthatintheprogram,therearenoinfiniteorunbounbed

74

loops,ratherallloopsarebounded.Aninstanceofboundedprogramscan
usuallybefoundinthedomainofembeddedapplications,morespecifically
applicationswheretheoutputistobewrittenwithinaboundednumberof
steps.Anotherinstanceofboundedprogramsaredistributedalgorithms.
Thealgorithmadd-efficient-fail-safewasalsopresentedin[JHCS02]to
generateSS-globallyconsistentdetectors.Thealgorithmbasicallyremoved
allearliestSS-inconsistenttransitionssuchthattheresultingprogramhave
bothperfectdetection,andminimaldetectionlatency.
Wenextsomeexamplestoshowtheworkingofouralgorithm.

5.5TwoCaseStudies

Inthissection,wepresenttwoexamples.Forthefirstexample,wereusethe
fault-intolerantprogramofthefirstexampleofChapter4,andthesecond
exampleconcernsamajorityvoter.

5.5.1ASimpleExample
InFig.5.2,state1isaninitialstate,andintheabsenceoffaults,execution
goesfromstates1...6.However,inthepresenceoffaults,otherstates
previouslyunreachablebecomereachable,forexample,state7.Transition
(10,11)isabadtransition,specifiedbythesafetyspecification.Transition
(7,8)isanSS-inconsistenttransition.

Figure5.2:Anexampletoillustratehowalgorithmadd-efficient-fail-safe
ksorw

75

Acalltoadd-efficient-fail-safe(p,F,ss)willpasstheprogramofFig.5.2
asargument,withthesetoffaulttransitions.Thevariablessholdstheset
ofbadtransitionsspecifiedbythesafetyspecificationoftheprogram,andis
equalto{(10,11),(20,21)}.Thus,
1.Theprogramp={(1,2),(2,3)...(7,8),(8,9)...}
2.faultF={(1,7),(1,17),(3,9),(3,14)...}
3.ss={(10,11),(20,21)}
Fromthealgorithm,wecollectallearliestinconsistenttransitions.We
startwitheachbadtransitionspecifiedbythesafetyspecification.Forexam-
ple,transition(10,11)∈ss.We“backtrack”alongallpossiblecomputations
inpresenceoffaultsthatincludetransition(10,11),untilafaulttransition
occurs.Theprogramtransitionthatfollowsthefaulttransitionisanearliest
inconsistenttransition.Forexample,goingbackwards,startingfromtransi-
tion(10,11),wereachreachfaulttransition(3,9),whichmakestransition
(9,10)anearliestinconsistenttransition.Similarly,transition(7,8)isan
earliestinconsistenttransition.Hence,eit={(9,10),(7,8)}.
Likewise,startingwithtransition(20,21)∈ss,wehavethefollowing
earliestinconsistenttransitions:{(20,21),(20,19),(19,18),(18,17)}.Hence,
eit={(9,10),(7,8),(20,21),(20,19),(19,18),(18,17)},andweneedtore-
moveeitfromtheprogram.
TheresultingprogramisshowninFig.5.3.Observethatthetransitions
inssarenowunreachableinthepresenceoffaults,whichmakesthepro-
gramfail-safefault-tolerant.Also,assoonasa“harmful”erroroccurs(i.e.,
thosethatcouldhavebroughtaboutviolationofsafety),theSS-inconsistent
transitionisdisabled/removed.
Inthenextexample,wepresentthecaseofthemajorityvoter,developed
inthelastchapter.
5.5.2AMajorityVoterSystem
Torecalltheworkingofthetriplemodularredundantmajorityvoter,there
arethreeprocessesp1,p2,andp3thatinputsvaluesin.1,in.2,andin.3
76

Figure5.3:Fail-safefault-tolerantprogramresultingfromapplyingalgorithm
add-efficient-fail-safe

respectively,andanoutputvariable,calledout.Forsimplicity,eachprocess
piinputsabinaryvaluein.itothevoter.Intheabsenceoffaults,allthree
valuesareidentical.
Themojorityvoterprogramiswrittenasfollows:

ITMR1::out=?→out:=in.1

Thefaulttransitionsarerepresentedasfollows:
(in.1=in.2)∨(in.1=in.3)→in.1=⊥

TocomputetheseteitofearliestSS-inconsistenttransitions,westart
withatransitioninss(recallthatssencodesSSbyholdingthesetof
badtransitions),and“backtrack”alongeverycomputationuntilafault
transitionisreached.Toillustratethis,considerasmallexampleofa
computationαofthemajorityvoterinpresenceoffaults(astateofthe
majorityvoterisrepresentedasin.1,in.2,in.3,out:

α=β∙1,1,1,−∙⊥,1,1,−∙⊥,1,1,⊥∙γ
Recallthat⊥issome“bad”value.Variableout=−meansthatoutisnot
set.Transition(1,1,1,−∙⊥,1,1,−)isafaulttransition,andtransition

77

(⊥,1,1,−∙⊥,1,1,⊥)isabadtransitioninss,andneedstoberemoved.
Now,whencomputingseteit,thetransition(⊥,1,1,−∙⊥,1,1,⊥)isin
eit,sinceitisprecededbyafaulttransition.However,observethatevery
badtransitioninsswillalwaysbeprecededbyafaulttransition,andhence
ss⊆eit.Also,sincethereisonlyoneactioninthemajorityvoter,eit=ss.
Then,weneedtoremoveeitfromtheintolerantprogramtomakeitfail-safe.
Thus,thesetoftransitionsthatisremovedis:
{(s,t):t∈T},wherethesetTisdefinedas
T={t:((t(out)=t(in.1))∧((t(in.1)=t(in.2))∧(t(in.1)=t(in.3)))},as
4pterhaCinRunningalgorithmadd-efficient-fail-saferesultsinremovingseteitfrom
programTMR.RemovingseteitfromTMRwillbeequivalenttoremoving
setssfromTMR.So,thefail-safefault-tolerantversionofthemajorityvoter
program,withperfectdetection,andminimaldetectionlatencyisidentical
totheversionwithonlyperfectdetection(seebelow).

FSTMR1::(out=in.1)∧((in.1=in.2)∨(in.1=in.3))→out:=in.1

Theorem10(Fail-safeTMR)ProgramFSTMR1isfail-safefault-
tolerantwithperfectdetection,andminimaldetectionlatencytofaultsthat
corrupttheinputofatmostoneprocess.
ObservethatiffaultscanarbitrarilycorrupttheoutputoutoftheTMR
system,thennofail-safefault-tolerantTMRexists,henceourfocusontol-
dels.mofaulterableInthenextsection,wepresentsomediscussionabouttheapplicabilityof
.add-efficient-fail-saferithmalgothe

onscussiDi65.Fromtheexamplesprovided,wecanseethatthefail-safefault-tolerantma-
jorityvoterwithperfectdetection,andminimallatencyisidenticaltothefail-
safefault-tolerantmajorityvoterwithperfectdetectionpresentedinChap-
ter4.Thiscanbeexplainedbythefollowing:Algorithmadd-perfect-fail-safe

78

refinestheguardsofcriticalactions,whilealgorithmadd-efficient-fail-safe
refinestheguardsofcritical,aswellasnon-criticalactions.Sincethemajor-
ityvoterprogramconsistsofonlyacriticalaction,thefail-safefault-tolerant
majorityvoterresultingfrombothalgorithmscanbeexpectedtobeidenti-
cal.However,comparingthefail-safefault-tolerantprogramofthefirstex-
ampleofChapter4,andthatofthefirstexampleofthischapter,wefind
thatthefail-safefault-tolerantprogramsaredifferent.Thisisbecausethe
programhasbothcriticalandnon-criticalactions.Sincethealgorithmsre-
finetheguardsofdifferentsetsofactions,theresultingfail-safefault-tolerant
programscanbeexpectedtobedifferent.
Thus,giventhatthecomplexityofthealgorithmadd-efficient-fail-safeis
morethanthatofalgorithmadd-perfect-fail-safe,butthatbothalgorithms
yieldthesamefail-safefaulttoleranceversionofthemajorityvoterwith
perfectdetection,andminimaldetectionlatency,thenitisbettertouse
algorithmadd-perfect-fail-safetogeneratethefail-safefault-tolerantmajority
ersion.vterovIngeneral,fordistributedalgorithms,suchasmutualexclusion,token
ring,agreementproblemsetc,mostoftheactionsarecritical.Thus,asin
thecaseofthemajorityvoter,applyingalgorithmadd-perfect-fail-safeor
algorithmadd-efficient-fail-safetoafault-intolerantdistributedalgorithm
willresultinidenticalfail-safefault-tolerantdistributedalgorithm.Intu-
itively,anycomputationofadistributedalgorithminthepresenceoffaults
consistsofcriticaltransitionsandfaulttransitions.So,everyprogramtran-
sitionwhichisanearliestSS-inconsistenttransitionisalsoabadtransition,
andvice-versa.Thus,removingseteitwillalwayshaveresultidenticalto
whenremovingsetssfromthefault-intolerantprogramofadistributedal-
gorithm.Hence,fordistributedalgorithms,itispreferabletousealgorithm
add-perfect-fail-safe,sincetheywillalwayshaveperfectdetection,andmin-
imaldetectionlatency,andthealgorithmhavelowercomplexity.
However,algorithmadd-efficient-fail-safecanbeusedtoyieldfail-safe
fault-tolerantprogramswithperfectdetection,andminimaldetectionla-
tencyforawideclassofprogramswhichconsistsofbothcriticalandnon-

79

criticalactions.Embeddedprogramstypicallyconsistofbothcriticaland
non-criticalactions,asexamplifiedinthefirstexampleinthischapter.Then,
theseteitneedsnotbeequaltothesetssofbadtransitions,asinthecase
orithms.algdistributedof

5.7ChapterSummary

Inthischapter,wehavepresentedanoveltheoryoffastdetectorcompo-
nentsforthedesignofefficientfail-safefault-tolerantprograms.Thistheory
buildsuponthetheoryofperfectdetectors,andallowsthederivationofa
transformationalgorithmwhichautomaticallyaddsfaulttoleranceabilities
withperfectdetectionandminimaldetectionlatencytoaninitiallyfault-
intolerantprogram.Specifically,wehavemadetwoimportantcontributions
inthischapter:(i)Wehavepresentedatheoryoffastdetectors,thatensures
perfectdetection,andminimaldetectionlatencyoffail-safefault-tolerant
programand(ii)wehavedevelopedanalgorithmthataddsfast,andper-
fectdetectorstoafault-intolerantprogramtosynthesizeanefficientfail-safe
fault-tolerantprogram.Wehaveshownthatthecomplexityofthealgorithm
ispolynomialinthestatespaceofthefault-intolerantprogram.
Wehavealsoshownthatalgorithmadd-efficient-fail-safeisparticularly
suitableforaclassofprogramsthatconsistofbothcriticalandnon-critical
actions,whilealgorithmadd-perfect-fail-safeisparticularlysuitablefordis-
tributedalgorithms.
Themotivationofminimaldetectionlatencyisforfaultcontainment.
Theearlieranerrorisdetected,thehigheristheerrorcontainment.Ifan
errorisnotcontained,moresophisticatederrorrecoverymechanismsmay
berequiredtocorrectthefaultthaniftheerroriscontained.Specifically,if
anerroriscontained,alocalrecoveryproceduremaybeinitiated,butifthe
errorisnotcontainedandthestateofseveralprocessesiscorrupted,local
recoverymechanismsmaynotbeadequate.

80

6pterCha

DesignofEfficient
anceerolitMult

InChapters4and5,weintroducedtheconceptofperfect,andfastdetectors
respectively.Weprovidedalgorithmsthat,givenafault-intolerantprogram
p,afaultmodelF,andasafetyspecificationSSforp,synthesize(i)fail-safe
fault-tolerantprogramswithperfectdetection,and(ii)fail-safefault-tolerant
programswithperfectdetection,andminimaldetectionlatency.Inthose
cases,wehaveconsideredonlyonegivenfaultclass,i.e.,wehavesynthesized
efficientfail-safefaulttolerancetoagivenfaultclass.
However,inadistributedsetting,thenature,andtypeoffaultsoccurring
isvaried.Forexample,faultscanleadtomessageloss,corruptionofprogram
state,processorcrashandsoon.Thus,faultsthatcanaffectagivenprogram
cancomefromdifferentsources(calledfaultclasses),meaningthatthepro-
gramshouldbe(fail-safe)fault-toleranttothesedifferentfaultclasses,i.e.,
theprogramshouldbe(fail-safe)multitolerant.Thispointstoamethod-
ologythatcansupportsystematicadditionofsuchefficientmultitolerance,
i.e.,weaimtogeneralizetheresultsofChapters4(additionofperfectde-
tection)and5(additionofperfectandfastdetection)todealwithmultiple
.classesfaultInthischapter,weconsidertwodifferentapproachesforautomatedsyn-
thesisofefficientfail-safemultitolerantprograms.Thefirstdesignapproach
foradditionofmultitolerancethatweconsiderhandlesonefaultclassata
time,whereefficientfail-safefaulttolerancetodifferentfaultclassesisadded

81

inastepwise(compositional)fashion,i.e.,efficientfaulttoleranceisadded
toonegivenfaultclass,beforeanotherfaultclassisconsidered.Then,we
consideraseconddesignapproachthat,ontheotherhand,considersallfault
classesatthesametime.
Foreachdesignapproach,weprovidetwoalgorithmsfortheadditionof
efficientmultitolerance,andeachalgorithm(foreachapproach)addssome
efficiencypropertiestotheresultingmultitolerantprogramwithrespectto
eachfaultclassconsidered.Specifically,startingfromafault-intolerantpro-
gram,andthedifferentfaultclassestobetolerated,wepresentalgorithms
(foreachdesignapproach)that(i)addperfectfail-safemultitoleranceto
everyfaultclassconsidered,and(ii)addfail-safemultitolerancewithboth
perfectdetectionandminimaldetectionlatencytoeveryfaultclassconsid-
ered.Weshowthatthecorrespondingalgorithmsfromeachdesignapproach
yieldidenticalfail-safemultitolerantprograms.Weexploitthisrelationto
provepropertiesofprogramsgeneratedusingthesecondapproach
Thepropertiesofthefail-safemultitolerantprogramsresultingfromei-
therapproachare:either(i)theyhaveminimaldetectionlatency,andperfect
detectiontoeachfaultclass,or(ii)toeachfaultclass,thefail-safefault-
tolerantprogramshaveperfectdetection.Bywayofcontrast,Aroraand
Kulkarniobservedin[AK98a,Kul99]thatusingamethodthatconsiders
onefaultclassatatimemaynotyieldprogramsthatareoptimal(insome
sense)withrespecttoallfaultclasses,whereasthemethodthatconsiders
allfaultmodelsatthesametimemay.Here,weshowthatbothapproaches
yieldprogramsthatareefficient(withrespecttofaultdetection,andde-
tectionlatency)toallfaultclasses.Ineffect,wehaveidentifiedaclassof
multitolerantprograms(i.e.,fail-safemultitolerantprograms)andclassesof
efficiencyproperties(i.e.,perfectdetection,andminimaldetectionlatency)
forwhichtheseefficiencypropertiescanbeeffectivelydesignedforeachfault
classconsideredduringthedesignofsuchmultitolerantprograms.
Thefirstdesignapproachcanbeusedwhenfail-safefaulttoleranceto
newfaultclassesneedstobeaddedtoagivenprogram,whilstthesecond
approachcanbeusedwheneversomegivenfaultclassesarere-defined.

82

6.1Introduction

Inthischapter,weconsiderthedesignof(efficient)multitolerance,i.e.,the
abilityofaprogramtotoleratemultipleclassesoffaults.Specifically,we
restrictourattentiontoaddingfail-safefaulttolerancetomultiplefault
classestoagivenfault-intolerantprogram.Werecallthatafault-intolerant
programisonethatsatisfiesitsspecificationintheabsenceoffaults,but
violatesitinthepresenceoffaults.Specifically,asmentionedbefore,a
specificationiscomposedoftwoparts,namely(i)asafetyspecification,and
(ii)alivenessspecification,asindicatedbyAlpernandSchneider[AS85],and
afail-safefault-tolerantprogramsatisfiesatleastitssafetyspecificationin
faults.ofepresencInadistributedsetting,thenatureoffaultsarisingisvaried.Forexample,
faultsmaycorruptinputvariables,corrupttheinterfacesofprocesses,cause
lossofmessages,orprocessorcrashes,amongothers.Thus,whendesigning
a(fail-safe)fault-tolerantprogram,thedesignshouldbecognizantofthose
variedfaultclasses.Tomakeaprogramfail-safefault-toleranttoagiven
faultclass,(asetof)detectorsareaddedthathandlefaultsfromthatgiven
faultclass.Thus,totransformafault-intolerantprogramintoafail-safe
multitolerantone,asetofdetectorsisaddedtothefault-intolerantprogram,
whereeachdetectorhandlesfaultsfromaparticularfaultclass.
Oneobviousdifficultythatneedstobehandled,asobservedbyArora
andKulkarniin[AK98a],isthefactthatdetectorshandlingdifferentfault
classesmayinterfereeitherwitheachotherorwiththeprogram.Intuitively,
thismeansthateverycomputationofagivenprogramcomponent,inthe
presenceofotherprogramcomponents,isstillinitsspecification.Forex-
ample,oneconditionthatneedstobeverifiedisthatthenewcomponents
introduced(suchasdetectors)shouldnotinterferewiththebehaviorofthe
originalprogram.Thus,non-interferencebetweendifferentprogramcompo-
nentsshouldbeverified.
However,givenourfocusonefficiencypropertiesofthefail-safefault-
tolerantprograms,suchasperfectdetection,andminimaldetectionlatency,
theaboveverificationconditionsmaynotsuffice.Toseethis,considerthe

83

following:Thoughadditionoftwodetectorsmaynotcauseinterference,one
maycausetheresultingfail-safefault-tolerantprogramtoloseoneofits
efficiencyproperties,eitherperfectdetectionorminimaldetectionlatency
orboth.Thus,theverificationconditionsneedtobeextendedtodealwith
thoseefficiencyproperties.
Therefore,thedifficultytobehandledwhenaddingdetectorstoapro-
gramp(resultinginprogramp)foranewfaultclassFn,inadditionto
ascertainingnon-interferenceacrossdifferentprogramcomponents,istoen-
surethatthenewdetectorsincludedforthenewfaultclassFn(i)pstill
hasefficiencypropertieswithrespecttootherfaultclasses,and(ii)phas
efficiencypropertieswithrespecttoFn,i.e.,theresultingfail-safemultitol-
erantprogramphasefficiencypropertiestoallfaultclassesconsidered.In
otherwords,whennewdetectorcomponentsareaddedtoagivenprogram
foranewfaultclass,theverificationconditionsare(i)thereisnointerference
amongthedifferentprogramcomponents,(ii)thenewcomponentspreserve
andextendtheefficiencypropertiesoftheoriginalprogramwithrespectto
otherfaultclasses.Thus,non-interferencepropertiesshouldincludeboth
behavioralandefficiencyaspects.
Therearetwopossibleapproachesforthedesignofmultitolerance:
1.Thefirstapproachconsidersonefaultmodelatatime,and
2.Thesecondapproachconsidersallfaultmodelsatthesametime.
Forthefirstapproach,wepresenttwoalgorithms,onethatautomatically
yieldsfail-safemultitolerantprograms,withperfectdetectiontoallfault
classesconsidered,andanotherthatyieldsfail-safemultitolerantprograms
withperfectdetection,andminimaldetectionlatencytoallfaultclasses.
Bywayofcontrast,AroraandKulkarniarguedin[AK98a]thatprograms
designedusingthisappraochcanhavecomplexity(insomesense)whichis
efficientforsome,butnotall,faultclasses.
Forthesecondapproach,wepresenttwoalgorithmsthatconsiderallthe
faultclassesatthesametime.Thefirstalgorithmyieldsafail-safemultitol-
erantprogramwithperfectdetectiontoallfaultclassesconsidered,whilethe

84

secondalgorithmyieldsfail-safemultitolerantprogramswithperfectdetec-
tion,andminimaldetectionlatencytoallfaultclassesconsidered.Wealso
showthattheresultingfail-safemultitolerantprogramsobtainedfromthe
correspondingalgorithms,e.g.,thosethataddfail-safefaulttolerancewith
perfectdetection,fromeachdesignapproachareidentical.Forexample,the
fail-safemultitolerantprogramobtainedfromthealgorithmthataddsmul-
titolerancewithperfectdetectiontoaprogramtoeveryfaultclassaccording
tothefirstapproachisidenticaltothefail-safemultitolerantprogramob-
tainedfromthealgorithmthataddsmultitolerancewithperfectdetectionto
everyfaultclassaccordingtothesecondapproach.
Thus,thecontributionsinthischapterare:

1.Wepresentnon-interferenceconditions(behavioralandefficiency)to
beverifiedduringthedesignofmultitolerance.

2.Wepresentanautomatedapproachfordesignofefficientfail-safemul-
titolerantprogramsbyconsideringonefaultclassatatime.

3.Wepresentanautomatedapproachfordesigningefficientfail-safemul-
titolerantprogramsbyconsideringallfaultclassesatthesametime.

4.Wealsoshowthattheprogramsobtainedbycorrespondingalgorithms
ofeitherapproachesareidentical,andthattheyhavesomeefficiency
propertiesforallfaultclasses.

Thischapterisstructuredasfollows:InSection6.2,wepresentthenon-
interferenceconditionsthathavetobeverifiedduringtheadditionofmulti-
tolerance.InSection6.3,wepresentthestepwiseapproach(onefaultclassat
atime)forautomaticadditionofmultitolerance,andprovidetwoalgorithms
thatyieldsfail-safemultitolerantprograms.Wepresenttwootheralgorithms
thathandleallthefaultclassesatthesametimeinSection6.4.Wediscuss
andsummarizetheresultspresentedinthischapterinSection6.5.

85

6.2IssuesinMultitoleranceDesign
Inthissection,wepresentanddiscussthenon-interferenceissuesinvolvedin
thedesignofefficientmultitolerance,i.e.,fail-safefaulttolerancetomultiple
faultclasseswithperfectdetection,andminimaldetectionlatencytoallfault
.classesFirst,wedefineafail-safemultitolerantprogram:

Definition50(Fail-SafeMultitolerantProgram)Givenaprogramp
withspecificationS,andsafetyspecificationSS,andnfaultclassesF1...Fn.
Aprogrampissaidtobefail-safemultitoleranttofaultclassesF1...Fniff
pisfail-safeFi-tolerantforeach1≤i≤n.

Asmentionedintheintroductionofthischapter,therearetwopos-
sibleapproachesforthedesignofmultitolerantprogram.Theissuesand
discussionspresentedinthissectionmostlyapplytoastepwiseapproach
thatconsidersonefaultclassatatime,insomefixedorderF1...Fnin
whichafault-intolerantprogrampistransformedintoafail-safemultitol-
erantprogramtofaultclassesF1...Fn.Ingeneral,inthefirststep,the
fault-intolerantprogrampisaugmentedwithdetectorsthatwillmakeitfail-
safeF1-tolerant,i.e.,theresultingprogramp1isfail-safeF1-tolerant.Then,
inthesecondstep,theresultingprogramp1isaugmentedwithdetectorsthat
willmakeitfail-safeF2-tolerant,whilepreservingitsfail-safeF1-tolerance.
Thesameisrepeated,untilthenthstep,wheretheprogramisaugmented
withdetectorsthatwillprovidefail-safeFn-tolerance,whilepreservingfail-
safefault-tolerancetoF1...Fn−1.However,becauseofourfocusonperfect
detection,andminimaldetectionlatency,thesestepsneedtobeextendedto
dealwiththoseefficiencyrequirements.
Thestepsareextendedasfollows,below:
Inthefirststep,whenthefault-intolerantprogrampisaugmentedwith
detectorsthatwillmakeitafail-safeF1-tolerantprogramp1,p1shouldhave
perfectdetectionand/orminimaldetectionlatencytoF1.Thefollowing
non-interferenceconditionsneedtobeverified:

86

1.IntheabsenceofF1,thedetectorcomponentsaddedtopdonotinter-
ferewithp,i.e.,eachcomputationofpisintheproblemspecification
evenifitexecutesconcurrentlywiththenewdetectorcomponents.

2.InthepresenceoffaultsF1,eachcomputationofthedetectorcompo-
nentsisinthecomponents’specificationeveniftheyexecuteconcur-
.pwithtlyren

3.InthepresenceoffaultsF1,thedetectorcomponentsaddedtopprovide
perfectdetectionand/orminimaldetectionlatencytoF1.

Note:Inthissection,wheneveritisclearfromthecontext,wewilluse
theterm“fail-safefault-tolerantprogram(fail-safefaulttolerance)”tomean
“fail-safefault-tolerantprogram(fail-safefaulttolerance)withperfectdetec-
tion,and/orminimaldetectionlatency”.
Inthesecondstep,whenthefail-safeF1-tolerantprogramp1isaugmented
withdetectorsthatwillmakep1fail-safefault-toleranttoF2(i.e.,transform
itintoaprogramp2),thefollowingnon-interferenceconditionsneedtobe
satisfied:

1.IntheabsenceofF1andF2,thenewdetectorsforfail-safefaulttoler-
ancetoF2donotinterferewithp1,i.e.,eachcomputationofp1satisfies
theproblemspecificationevenifp1executesconcurrentlywiththenew
.sdetector

2.InthepresenceofF1,thenewdetectorsforfail-safefaulttoleranceto
F2donotinterferewiththefail-safefaulttolerancetoF1ofp1,i.e.,
everycomputationofp1isinthefail-safefaulttolerancespecification
toF1evenifp1executesconcurrentlywiththenewcomponents.

3.InthepresenceofF1,thenewdetectorsforfail-safefaulttoleranceto
F2donotinterferewiththeperfectdetectionand/orminimaldetection
latencytoF1ofp1.

4.InthepresenceofF2,p1doesnotinterferewiththenewdetectorsthat
providefail-safefault-tolerancetoF2.

87

5.InthepresenceofF2,p1doesnotinterferewiththenewdetectorcom-
ponentsprovidingperfectdetection,and/orminimaldetectionlatency
Fto2Intheithstep,whenthefail-safeFi−1-tolerantprogrampi−1isaugmented
withdetectorsthatwilltransformitintoafail-safeFi-tolerantprogrampi
(i.e,piisfail-safefault-toleranttofaultclassesF1...Fi),thefollowingnon-
interferenceconditionsneedtobesatisfied:
1.IntheabsenceoffaultsF1...Fi,thenewdetectorsforfail-safefault
tolerancetoFidonotinterferewithpi−1,i.e.,eachcomputationofpi−1
satisfiestheproblemspecificationevenifpi−1executesconcurrently
withthenewdetectorcomponentsforfail-safefaulttolerancetoFi.
2.InthepresenceofF1,thenewdetectorsforfail-safefaulttoleranceto
Fidonotinterferewiththefail-safefault-tolerancetoF1ofpi−1,i.e.,
everycomputationofpi−1isinthefail-safefaulttolerancespecification
toF1evenifpi−1executesconcurrentlywiththenewcomponents.
3.InthepresenceofF1,thenewdetectorsforfail-safefaulttolerancetoF2
donotinterferewiththeperfectdetection,and/orminimaldetection
latencytoF1ofpi−1.
...4.5.InthepresenceoffaultsFi,pi−1doesnotinterferewiththenewde-
tectorcomponentsthatprovidefail-safefaulttolerancetoFi,i.e.,each
computationofthedetectorcomponentsforfail-safefaulttoleranceto
Fiisinthecomponents’specification.
6.InthepresenceofFi,pi−1doesnotinterferewiththenewdetector
componentsthatprovideperfectdetection,and/orminimaldetection
latencytoFi.
Automatedproceduresthataddfail-safemultitolerancetoapreviously
fault-intolerantprogramneedtoguaranteethattheseconditionsaremetby
design.

88

Inthenextsection,weconsideradesignapproachforadditionofmultitol-
erancethathandlesonegivenfaultclassatatime,andwethenpresenttwo
algorithmsthatautomaticallyyieldfail-safemultitolerantprogramswithdif-
feringefficiencies.Thefirstalgorithm,presentedinthefirstpart,yieldsfail-
safemultitolerantprogramswithperfectdetectiontofaultclassesF1...Fn,
whilethesecondalgorithm,presentedinthesecondpart,yieldsfail-safemul-
titolerantprogramswithperfectdetection,andminimaldetectionlatencyto
faultclassesF1...Fn,byconsideringonefaultclassatatime.
Beforepresentingthealgorithmsthatautomaticallyaddmultitolerance,
weadoptastep-by-stepderivationofthealgorithm,andeachstepofthe
algorithmisshowntoguaranteethatthenon-interferenceconditionsstated
aremetbydesign.

6.3One-at-a-timeDesignofMultitolerance
Inderivingbothalgorithms,wefocusonthecaseoftwofaultclasses,and
theapproachcanbeeasilygeneralizedtonfaultclasses.

6.3.1MultitolerantProgramsWithPerfectDetection
Givenafault-intolerantprogrampwithsafetyspecificationSS,andnfault
classesF1...Fnwhichhavetobetolerated,theideaistotransformpinto
aprogrampnthatisfail-safefault-toleranttoF1...Fnwithperfectdetec-
tionforeachfaultclass.Todothis,wefirstconsiderfaultclassF1,then
F2untilfaultclassFnishandled.Inthissection,wheneveritisclearfrom
thecontext,wewillusetheterm“fail-safefault-tolerantprogram(fail-safe
faulttolerance)”tomean“fail-safefault-tolerantprogram(fail-safefaulttol-
erance)withperfectdetection”.
Beforeexplainingandintroducingourautomatedapproachforaddition
offail-safemultitolerance,wepresentaresultuponwhichourapproachis
based.Intuitively,theresultstatesthat,startingwithaprogrampithatis
fail-safefault-toleranttofaultclassesF1...Fiwithperfectdetectiontoeach
ofthesefaultclasses,composingpiwithaperfectdetectorforfaultclassFi+1
suchthattheresultingprogrampi+1isfail-safefault-toleranttoFi+1with

89

perfectdetection,thenpi+1alsopreservestheefficiencypropertiesofpiwith
respecttoF1...Fi.Saidotherwise,composingaprogrampiasabovewith
perfectdetectorsforanewfaultclassFi+1satisfytheverificationconditions
presentedinSection6.2.
Lemma8(Perfectdetectorsandmultitolerance)Givenafault-
intolerantprogrampwithsafetyspecificationSS.Givenaprogrampi−1
whichisfail-safemultitolerantforSSwithperfectdetectiontofaultclasses
F1...Fi−1.Givenalsoaprogrampiobtainedfrompi−1bycomposingcritical
actionsofpi−1withperfectdetectors,suchthatpiisfail-safefault-tolerantto
faultclassFiwithperfectdetection.Then,piisalsofail-safemultitolerant
withperfectdetectiontofaultclassesF1...Fi−1.
Proof.Assume:(i)Afault-intolerantprogramp0=pwithsafetyspec-
ificationSS,(ii)Programpi−1isfail-safemultitolerantforSSwithperfect
detectiontofaultclassesF1...Fi−1,(iii)anewfaultclassFiwhichneeds
tobetolerated,(iv)programpithatisfail-safefault-tolerantforSSwith
perfectdetectiontoFiobtainedbycomposingsomecriticalactionsofpi−1
withperfectdetectorsforFi.
Prove:Programpiisfail-safemultitolerantforSSwithperfectdetection
tofaultclassesF1...Fi−1.
1.Fromassumption,pi−1=p0\B1i−1,whereB1i−1⊆ssandssistheset
nsitions.trabadof2.Fromassumption,pi=pi−1[]cPDi,where[]cmeanscomposingcritical
actions,andPDimeaningperfectdetectorsthatwilltoleratefaultclass
.Fi3.From3,pi=pi−1\Bi,whereBi={(s,t):(s,t)isinducedbyacritical
action∧(s,t)isreachableinpresenceofFi∧(s,t)∈ss}.
4.From3,sincenoSS-inconsistenttransitionisaddedtopi−1,pihas
completedetectiontofaultclassesF1...Fi−1.
5.From3,sincenoSS-consistenttransitionisremovedfrompi−1,pihas
accuratedetectiontofaultclassesF1...Fi−1.

90

6.From4,and5,pihasperfectdetectiontofaultclassesF1...Fi−1.

Theabovelemmashowsthatbycomposingcriticalactionswithperfect
detectorsthatmakesagivenprogramfail-safefault-toleranttoanewfault
class,fail-safefaulttolerancewithperfectdetectiontopreviousfaultclassesis
preserved,i.e.,thereisnointerferencebetweenthenewdetectorcomponents
withdetectorcomponentsforpreviousfaultclasseseitheratthebehavioral
levelorattheefficiencylevel.Thisresultallowsustoreusealgorithmadd-
perfect-fail-safe(seeChapter4),sinceadd-perfect-fail-safegeneratesperfect
detectorsforagivenfaultclass.

Step1inMultitoleranceDesign

Tosynthesizeaprogramthatisfail-safefault-toleranttoF1,starting
fromafault-intolerantprogramp,withperfectdetectiontoF1,wefirstneed
tocomputethesetss1ofbadtransitionsforpreachableinthepresenceof
faultsF1(reachablebyusingtransitionsinδpF1)(fromLemmas2,3).Wethen
removethosetransitionsfromprogramp,toobtainaprogramp1,whichis
fail-safeF1-tolerant,withperfectdetectionforF1,asshowninFig6.1,where
ssisthesetofbadtransitionsthatthesafetyspecificationSSofprejects.
p1:=add-perfect-fail-safe(p,F1,ss)

Figure6.1:Thefirststepinthedesignofmultitolerantprogramswithperfect
detection.

Atthispoint,weneedtoverifythenon-interferenceconditionsforthe
firststepofthetransformation.
First,byconstruction,p1hasperfectdetectiontoF1.
Second,weneedtoprovethat“IntheabsenceofF1,thedetectorcompo-
nentsaddedtopdonotinterferewithp,i.e.,eachcomputationofpisinthe
problemspecificationevenifitexecutesconcurrentlywiththenewdetector
”.onentsompc

91

Proof.Byconstruction(usingalgorithmadd-efficient-fail-safe,thede-
tectorcomponentsforfail-safefaulttolerancetoF1donotinterferewithp
Third,weneedtoprovethat“InthepresenceoffaultsF1,eachcom-
putationofthedetectorcomponentsisinthecomponents’specificationeven
iftheyexecuteconcurrentlywithp,i.e.,pdoesnotinterferewiththenew
detectorcomponents.”.
Proof.Byconstruction,pdoesnotinterferewiththedetectorcomponents
forfail-safefaulttolerancetoF1.

Step2inMultitoleranceDesign

Inthesecondstepofthemultitoleranceadditionprocedure,weconsider
faultclassF2,andwetransformprogramp1(whichisfail-safefault-tolerant
toF1)intoaprogramp2thatisfail-safefault-toleranttoF2,whilepreserving
theexistingfail-safefaulttolerancetoF1.Specifically,wecomputetheset
ss2ofbadtransitionsthatarereachableinpresenceoffaultsF2,andwe
removethosetransitionsfromprogramp1toobtainprogramp2,whichis
fail-safefault-tolerant,withperfectdetectiontobothF1,andF2,asshown
.6.2Figinp2:=add-perfect-fail-safe(p1,F2,ss)

Figure6.2:Thesecondstepinthedesignofmultitolerantprogramswith
detection.cterfep

Byconstruction,p2hasperfectdetectiontoF2,i.e.,p1doesnotinterfere
withthenewdetectorcomponentsthatprovideperfectdetectiontofault
.Fclass2Toverifytheothernon-interferenceproperties,wefirstneedtoprove
that“IntheabsenceofF1andF2,thedetectorcomponentsaddedtop1for
fail-safefaulttolerancetoF2donotinterferewithp1,i.e.,eachcomputation
ofp1isintheproblemspecificationevenifitexecutesconcurrentlywiththe
newdetectorcomponents”.

92

Proof.Byconstruction,thenewdetectorcomponentsforfail-safefault
tolerancetoF2donotinterferewithp1.
Wenowprovethesecondpartofthenon-interferenceconditions,whichis
“InthepresenceofF1,thenewdetectorsforfail-safefaulttolerancetoF2do
notinterferewiththefail-safeF1-toleranceofp1,i.e.,everycomputationof
p1isinthefail-safeF1-tolerancespecificationevenifp1executesconcurrently
withthenewcomponents.”
Proof.Weprovethisbycontradiction.Wefirstassumethatthereexists
acomputationinpresenceofF1thatviolatessafety,andshowthatsucha
computationcannotexist,i.e.,acontradiction.
1.Givenp2=add-efficient-fail-safe(p1,F2,ss)
2.AssumethatthereisacomputationαinpresenceofF1thatviolates
ysafet3.Fromstep3andProposition2,αcontainsabadtransitionτthatis
reachableinpresenceofF1.
4.Byconstructionofp1,τ∈δp1
5.Byconstructionofp2,transitionτisnotaddedtoδp1.
6.Fromsteps3,4,and5,wehaveacontradiction.

Wenowprovethat“InthepresenceofF1,thedetectorcomponentsfor
fail-safefaulttolerancetoF2donotinterferewiththeperfectdetectionofp1
”.Fto1f.oPro1.Fromthefactthatthenewdetectorcomponentsdonotinterferewith
thefail-safeF1-toleranceofp1inthepresenceofF1,wededucethat
thedetectorcomponentsforF1arecomplete.
2.SinceonlySS-inconsistenttransitionsareremoved,thedetectorcom-
ponentsforF1areaccurate.

93

3.Fromsteps1,and3,thedetectorcomponentsforF1areperfect.

Theproofofthelastpart,whichis“InthepresenceofF2,p1doesnot
interferewiththenewdetectorsthatprovidefail-safefaulttolerancetoF2.”
Proof.Byconstruction,p1doesnotinterferewiththenewdetectorcom-
ponentsforfail-safefaulttolerancetoF2.

StepKinMultitoleranceDesign

Inthekthstep(3≤k≤n),weconsiderfaultclassFk,andwetransforma
fail-safeF1...Fk−1-tolerantprogrampk−1,i.e.,pk−1isfail-safefault-tolerant
tofaultclassesF1...Fk−1,intoprogrampkthatisfail-safefault-tolerant
toFk,whilealsopreservingthefail-safefaulttolerancetoF1...Fk−1i.e.,
programpkisfail-safeF1...Fk-tolerant.Todothis,wecomputethesetssk
ofbadtransitionsthatarereachableinpresenceoffaultsFk,andweremove
thosetransitionsfromprogrampk−1toobtainprogrampkwhichhavefail-
safefaulttolerancewithperfectdetectiontofaultclassesF1...Fk.Thekth
stepisshowninFig6.3.
pk:=add-perfect-fail-safe(pk−1,Fk,ss)

Figure6.3:Thekthstepinthedesignofmultitolerantprogramswithperfect
detection.Withthisstep,weneedtoverifythatnon-interferenceconditionsare
teed.ranguaFirst,pkhasperfectdetectiontoFkbyconstruction,i.e.,programpk−1
doesnotinterferewiththeperfectdetectionofthenewdetectorcomponents
forfaultclassF2.
Toverifytheothernon-interferenceproperties,weprovethat“Inthe
absenceofF1...Fk,thedetectorcomponentsaddedtopk−1forfail-safefault
tolerancetoFkdonotinterferewithpk−1,i.e.,eachcomputationofpk−1
isintheproblemspecificationevenifitexecutesconcurrentlywiththenew
detectorcomponents”.

94

Proof.Byconstruction,thenewdetectorcomponentsforfail-safefault
tolerancetoFkdonotinterferewithpk−1.
Wenowprovetheithpart(2≤i<k)ofthenon-interferenceconditions,
whichis“InthepresenceofFi,thenewdetectorsforfail-safefaulttolerance
toFkdonotinterferewiththefail-safefaulttolerancetoFiofpk−1,i.e.,
everycomputationofpk−1isinthefail-safefaulttolerancespecificationtoFi
evenifpk−1executesconcurrentlywiththenewdetectorcomponents.”
Proof.Weprovethisbycontradiction.Wefirstassumethatthereexists
acomputationinpresenceofFithatviolatessafety,andshowthatsucha
computationcannotexist,i.e.,acontradiction.
1.Givenpk=add-efficient-fail-safe(pk−1,Fk,ss)
2.AssumethatthereisacomputationαinpresenceofFithatviolates
ysafet3.Fromstep3andProposition.2,αcontainsabadtransitionτthatis
reachableinpresenceofFi.
4.Byconstructionofpk−1,τ∈δpk−1
5.Byconstructionofpk,τisnotaddedtoδpk−1
6.Fromsteps3,4and5,wehaveacontradiction.

Wenowprovethat“InthepresenceofFi,thenewdetectorcomponents
forfail-safefaulttolerancetoFkdonotinterferewiththeperfectdetection
ofpk−1toFi.”
f.oPro1.Fromthefactthatthenewdetectorcomponentsdonotinterferewith
thefail-safeFi-toleranceofpi−1inthepresenceofFi,wededucethat
thedetectorcomponentsforFiinpi−1arecomplete.
2.SinceonlySS-inconsistenttransitionsareremoved,thedetectorcom-
ponentsforFiinpi−1areaccurate.

95

3.Fromsteps1,and3,thedetectorcomponentsforFiareperfect.

Theproofofthelastpart,whichis“InthepresenceofFk,pk−1doesnot
interferewiththenewdetectorsthatprovidefail-safefaulttolerancetoFk.”
Proof.Byconstruction,pk−1doesnotinterferewiththenewdetector
componentsforfail-safefaulttolerancetoFk.
Observethat,ingeneral,because(i)thenewdetectorcomponentsthat
providefail-safefaulttolerancetoFkdonotinterferewiththefail-safefault
toleranceofpk−1toallfaultclassesFi(1≤i<k),and(ii)onlybadtran-
sitionsareremovedfrompk−1,theperfectdetectiontoallfaultclassesFiis
ed.preservIngeneral,thealgorithmforautomaticsynthesisoffail-safemultitolerant
programswithperfectdetectiontoallfaultclassesisshowninFig.6.4
add-perfect-fail-safe-multitolerance(p,[F1...Fn],ss:setoftransi-
:)nstio

{i:=1;p0:=p
while(i≤n)do{
pi:=add-perfect-fail-safe(pi−1,Fi,ss);
i:=i+1;}od
})preturn(n

Figure6.4:Thealgorithmaddsfail-safefaulttolerancetonfaultclasses,
withperfectdetectiontoeveryfaultclass

Theorem11(Multitolerancewithperfectdetection)Givenafault-
intolerantprogrampwithsafetyspecificationSS,andnfaultclasses
F1...Fn.Algorithmadd-perfect-fail-safe-multitolerance(p,[F1...Fn],ss)re-
turnsaprogramthatisfail-safefault-toleranttoF1...Fn,withperfectde-
tectiontoallthefaultclasses.

Inthissection,wehavepresentedastepwiseapproachfortheautomatic
designofmultitolerance.Wehaveprovedthateverystepofthealgorithm

96

guaranteesthatthereisnointerferencebetweenthenewdetectorcomponents
andthoseexistingfail-safefaulttolerancemechanismsforotherfaultclasses,
aswellasnointerferencewiththeirperfectdetectiontothosefaultclasses.
Inthenexttwosections,wewillpresentexamplestoshowtheworkingof
rithm.algothe

6.3.2ASimpleExample
One-at-a-TimeAdditionofPerfectFail-SafeFaultTolerance

Inthissection,wepresentasmallexampletoillustratehowalgorithm
add-perfect-fail-safe-multitoleranceworks.Fig6.5showsthefault-intolerant
programinpresenceoffaultsF1.Inthisexample,transitions(10,11)and
(20,21)arebadtransitions.

Figure6.5:Fault-intolerantprograminthepresenceofF1–firstiterationof
rithmalgothe

Duringthefirstiterationthroughthealgorithm,bothbadtransitionsare
removedsincebotharereachableinpresenceofF1(throughthecalltoadd-
perfect-fail-safe).Theresultingfail-safefault-tolerantprogramwithperfect
detectiontoF1isshowninFig.6.6.Denoteitbyp1.
Then,fortheseconditerationthroughadd-perfect-fail-safe-
multitolerance,weneedtoaddperfectfail-safefaulttolerancetoF2
top1,whilepreservingthefail-safefaulttolerancewithperfectdetectionof

97

Figure6.6:Resultingfail-safefault-tolerantprogramp1toF1

p1toF1.First,weconsiderp1inpresenceofF2,asshowninFig.6.7

Figure6.7:Resultingfail-safefault-tolerantprogramp1inpresenceofF2

InthepresenceofF2,nobadtransitionisreachable,sinceallofthem
hasbeenremovedduringthepreviouspass.So,theprogramisalsofail-safe
fault-toleranttoF2,whilemaintainingthefail-safefault-tolerancetoF1.The
resultingprogram(p2)isshowninFig.6.8.ObservethatinpresenceofF1
orF2,p2willneverviolatethesafetyspecification.

6.3.3TokenRing
ThetokenringwasdescribedinChapter4.Processes0...Narearranged
inaring.Processk,0≤k<Npassesthetokentoprocessk+1,whereas

98

Figure6.8:Resultingfail-safemultitolerantprogramp2toF1andF2with
perfectdetectiontobothfaultclasses.

processNpassesthetokentoprocess0.Eachprocesskhasabinaryvariable,
t.k,andaprocessk,k=Nholdsthetokenifft.k=t.(k+1),andprocess
Nholdsthetokenifft.N=t.0.
Thefault-intolerantprogramforthetokenringisasfollows(+2ismodulo-
:ion)addit2

ITR1::k=0∧t.k=t.(k−1)→t.k:=t.(k−1)
ITR2::k=0∧t.k=t.N+21→t.k:=t.N+21

Fail-SafeFaultTolerancetoFaultClassF1:First,weconsiderafault
classwherefaultactionscancorruptthestateofasingleprocessk,which
canbeanyprocess.

Faultaction:Thefaultclassthatweconsideris

F::|{k:t.k=⊥}|=0→t.k:=⊥

Runningalgorithmadd-perfect-fail-safe-multitolerancewillresultinthe
followingprogramafterthefirstiteration

99

1-FSTR1::|{k:t.k=⊥}|=1∧t.(k−1)=⊥∧k=0∧t.k=t.(k−1)→t.k:=t.(k−1)
1-FSTR2::|{k:t.k=⊥}|=1∧t.N=⊥∧k=0∧t.k=t.N+21→t.k:=t.N+21

Theorem12(Fail-safeTR)Program1-FSTRisfail-safefault-tolerantto
faultsthatcorruptthestateofasingleprocessk,whichcanbeanyprocess.

Fail-SafeFaultTolerancetoFaultClassF2:Second,weconsidera
faultclasswherefaultactionscancorruptthestateofanytwoprocessesk
.land

Faultaction:Thefaultclassthatweconsideris

F::|{k:t.k=⊥}|=1∧t.k=⊥→t.k:=⊥

Theseconditerationofalgorithmadd-perfect-fail-safe-multitolerancewill
resultinthefollowingprogram:

2-FSTR1::|{k:t.k=⊥}|≤2∧t.(k−1)=⊥∧k=0∧t.k=t.(k−1)→t.k:=t.(k−1)
2-FSTR2::|{k:t.k=⊥}|≤2∧t.N=⊥∧k=0∧t.k=t.N+21→t.k:=t.N+21

Theorem13(Fail-safeTR)Program2-FSTRisfail-safefault-tolerantto
faultsthatcorruptthestateofatmosttwoprocesseskandl,whichcanbe
ess.coprany

Fail-SafeFaultTolerancetoFaultClassFN+1:Finally,weconsidera
faultclasswherefaultactionscancorruptthestateofn(3≤n≤N+1)
s.cessepro

010

Faultaction:Thefaultactionthatweconsideris

F::|{k:t.k=⊥}|=n−1∧t.k=⊥→t.k:=⊥

Thenthiterationofalgorithmadd-perfect-fail-safe-multitolerancewillre-
sultinthefollowingprogram:

n-FSTR1::|{k:t.k=⊥}|≤n∧t.(k−1)=⊥∧k=0∧t.k=t.(k−1)→t.k:=t.(k−1)
n-FSTR2::|{k:t.k=⊥}|≤n∧t.N=⊥∧k=0∧t.k=t.N+21→t.k:=t.N+21
Theorem14(Fail-safeTR)Programn-FSTRisfail-safefault-tolerantto
faultsthatcancorruptthestateofanynumberofprocesses,uptoallnpro-
esses.cFromprogramn-FSTR,weknowthatwhenn=N+1,|{k:t.k=⊥}|≤n
isalways“True”,soprogramn-FSTRsimplifiesto:

MFSTR1::t.(k−1)=⊥∧k=0∧t.k=t.(k−1)→t.k:=t.(k−1)
MFSTR2::t.N=⊥∧k=0∧t.k=t.N+21→t.k:=t.N+21
ProgramMFSTRisidenticaltothefail-safefault-toleranttokenring
programpresentedbyAroraandKulkarniin[AK98a].However,ourinter-
mediateprogramsaredifferent.Thisisbecause,in[AK98a],certainbad
transitionsthatareunreachableinthepresenceofafaultclassFiwereal-
readyremoved.Thus,thoughtheoverallmultitolerantprogramiscorrect,
theintermediateprogramsadoptedamoredefensiveapproach,byremov-
ingmoretransitionsthatarenecessary.Aswayofcontrast,ourapproach
removesonlybadtransitionsthatarereachableinthepresenceoffaults.
Inthissection,wehaveconsideredtheautomateddesignofmultitolerance
withperfectdetectiontoallfaultclasses,byconsideringonefaultclassata
time.Inthenextsection,weconsidertheautomateddesignofmultitolerance
withperfectdetection,andminimaldetectionlatencytoallfaultclassesby
consideringonefaultclassatatime.

110

6.3.4MultitolerantProgramsWithPerfectDetection
andMinimalDetectionLatency
Intheprevioussection,wepresentedanalgorithm(togetherwithnecessary
proofsofnon-interference)thatyieldsfail-safemultitolerantprogramsto
nfaultclasses,withperfectdetectiontoeveryfaultclass.Thealgorithm
considersthefaultclassesinagiventotalorder.
Inthissection,analgorithmisdeveloped(alongwithrelevantproof
ofnon-interference)thatyieldsfail-safemultitolerantprogramstonfault
classes,withperfectdetection,andminimaldetectionlatencytoeveryfault
class.Again,thefaultclassesareconsideredinagiventotalorder.In-
tuitively,theapproachbuildspartlyuponalgorithmadd-efficient-fail-safe,
whereby,foreachfaultclass,thesetofearliestSS-inconsistenttransitionsis
computed,andthesetransitionsarethenremovedfromthegivenprogram.
Givenafault-intolerantprogrampwithsafetyspecificationSS,andn
faultclassesF1...Fn,theideaistotransformpintoaprogrampnthatisfail-
safefault-toleranttoF1...Fnwithperfectdetectionandminimaldetection
latencytoeachfaultclass,byfirstconsideringfaultclassF1,thenF2untilFn
isconsidered.Specifically,givenafaultclassFi,andaprogrampi−1thatis
fail-safefault-tolerantwithperfectdetection,andminimaldetectionlatency
tofaultclassesF1...Fi−1,pi−1istransformedintoaprogrampiwhichis
fail-safefault-tolerantwithperfectdetection,andminimaldetectionlatency
toFi,whilepreservingthefail-safefaulttolerancewithperfectdetectionand
minimaldetectionlatencyofpi−1toF1...Fi.
Apartfromhavingtoverifynon-interferencebetweenaprogrampi−1
andthenewdetectorcomponentsforF2,wealsoneedtoverifythatthe
perfectdetection,andminimaldetectionlatencyofprogrampi−1tofault
classesF1...Fi−1arenotinterferedwithwhenaddingfail-safefaulttoler-
ancewithperfectdetection,andminimallatencytoF2.Thus,thesetof
non-interferenceconditions,presentedinSection6.2,isextendedwiththose
conditionsthatguaranteethatnointerferenceexistsbetweenthenewde-
tectorcomponentsforFi,andtheminimaldetectionlatencyofpi−1tofault
classesF1...Fi−1.

210

Inthissection,weshowthestepwiseadditionoffail-safefaulttolerance
withperfectdetection,andminimaldetectionlatencytotwofaultclassesF1
andF2.Theprocedurecanbeeasilygeneralizedtonfaultclasses.
Note:Inthissection,wheneveritisclearfromthecontext,wewilluse
theterm“fail-safefault-tolerantprogram(fail-safefaulttolerance)”tomean
“fail-safefault-tolerantprogram(fail-safefaulttolerance)withperfectdetec-
tion,andminimaldetectionlatency”.

Step1inDesignofEfficientMultitolerance

Givenare:(i)Afault-intolerantprogrampwithsafetyspecificationSS,
and(ii)faultclassesF1...Fntobetolerated.Totransformpintoaprogram
p1thatisfail-safefault-toleranttoF1,seteit1ofearliestSS-inconsistenttran-
sitionsforpinpresenceoffaultsF1iscomputed,andthissetoftransitions
isthenremovedfromp.Theprogramp1obtainedisfail-safefault-tolerant,
withperfectdetection,andminimaldetectionlatencytoF1.Thisfirststep
isshowninFig6.9.

eit1:=get-eit(p,F1,ss)
p1:=p\eit1

Figure6.9:Thefirststepinthedesignofmultitolerantprogramswithperfect
detectionandminimallatency.

Proposition6Programp1isfail-safefault-tolerant,withperfectdetection,
andminimaldetectionlatencytoF1.

Proof.TheproofisbasedonLemmas5,6and7,whichensurethatp1
satisfiesitssafetyspecificationinpresenceofF1,andhaveperfectdetection,
andminimallatencytoF1.
Wenowneedtoshowthatthisconstructionofp1satisfiesthenon-
interferencepropertiesdefinedinSection6.2.
First,weneedtoprovethat“IntheabsenceofF1,thedetectorcompo-
nentsaddedtopdonotinterferewithp,i.e.,eachcomputationofpisinthe

310

problemspecificationevenifitexecutesconcurrentlywiththenewdetector
”.onentsompcf.oPro1.FromLemma7,p1andphavethesamebehaviorintheabsenceof
faults.2.Fromstep1,eachcomputationofpisintheproblemspecificationeven
ifpexecutesconcurrentlywiththedetectorcomponentsforfail-safe
faulttolerancetoF1.
3.Fromstep3,thedetectorcomponentsforfail-safefaulttolerancetoF1
donotinterferewithp.

Secondly,weneedtoprovethat“InthepresenceoffaultsF1,eachcom-
putationofthedetectorcomponentsisinthecomponents’specificationeven
iftheyexecuteconcurrentlywithp,i.e.,pdoesnotinterferewiththenew
detectorcomponents.”.
f.oPro1.FromProp6,p1isfail-safeF1-tolerant.
2.Fromstep1,eachcomputationofthedetectorcomponentsforfail-
safefaulttolerancetoF1isintheirspecificationeveniftheyexecute
concurrentlywithp
3.Fromstep3,pdoesnotinterferewiththedetectorcomponentsfor
fail-safeF1-tolerance.

Infact,thesetwonon-interferenceconditionsareguaranteedbyconstruc-
tionofp1sincethesynthesismethodisidenticaltoalgorithmadd-efficient-
fail-safe(p,F1,ss).Itisalsoguaranteedthatp1hasperfectdetection,and
minimumdetectionlatencytoF1.

410

Step2inDesignofEfficientMultitolerance

Next,faultclassF2isconsiderd,andprogramp1(whichisfail-safefault-
toleranttoF1)istransformedintoaprogramp2thatisfail-safefault-tolerant
toF2,whilepreservingthefail-safefaulttolerancetoF1,i.e.,program
p2isfail-safeF1,F2-tolerant.Toachievethis,theseteit2ofearliestSS-
inconsistenttransitionsforpinpresenceoffaultsF2iscomputed,andthese
transitionsareremovedfromprogramp1toobtainprogramp2.Program
p2,designedassuch,isfail-safefault-tolerant,withperfectdetection,and
minimaldetectionlatencytobothF1,andF2.Thedesignofp2isshownin
0.6.1Fig

eit2:=get-eit(p,F2,ss)
p2:=p1\eit2

Figure6.10:Thesecondstepinthedesignofmultitolerantprogramswith
perfectdetectionandminimallatency

Proposition7(Fail-safefaulttoleranceofp2toF2)Givenafault-
intolerantprogrampwithsafetyspecificationSS,twofaultclassesF1and
F2,andaprogramp1whichisfail-safefault-toleranttoF1i.e.,fail-safe
F1-tolerant.Then,p2=p1\get-eit(p,F2,ss)(i)isfail-safefault-tolerant
toF2,(ii)hasperfectdetectioninpresenceofF2,and(iii)hasminimal
detectionlatencyinpresenceofF2.
Observethattheseteit2isthesetofearliestSS-inconsistenttransitions
forpinpresenceofF2,buttoobtainprogramp2,theseteit2needstobe
removedfromprogramp1.
Toprovethecorrectnessofsuchastep,weneedtoprovethefollowing:
1.Programp2isfail-safefault-toleranttoF2withperfectdetection,and
minimaldetectionlatency,i.e.,weproveProposition7.
2.Weneedtofail-safefault-tolerancetoF1inpresenceofF1ispreserved,
asexplainedinSection6.2.

510

3.Weneedtoprovethattheperfectdetection,andminimaldetection
latencyofp1toF1ispreserved
First,weprovethatp2isfail-safefault-toleranttoF2.
f.oPro1.Given:p2=p1\get-eit(p,F2,ss)
2.Therearetwocasestoconsider:
(i)∃τ∈get-eit(p,F2,ss)andτ∈δp1
(ii)∃τ∈get-eit(p,F2,ss)andτ∈δp1
3.(i)Fromstep3(i),τ∈δp2sinceτ∈get-eit(p,F2,ss)andp2=p1\get-
eit(p,F2,ss)
(ii)Fromstep3(ii),τ∈δp2sinceτ∈δp1,andp2=p1\get-eit(p,F2,ss)
4.Fromstep3,∀τ∈get-eit(p,F2,ss),τ∈δp2.
5.Fromstep4andfromDefs.37and44,andconstructionofget-eit,bad
transitionsinssreachableinpresenceofF2cannolongerbereached.
6.Fromstep5,p2isfail-safefault-toleranttoF2.
WenowprovethesecondpartofProposition7:p2hasperfectdetection
.Fto2f.oPro1.Given:p2=p1\get-eit(p,F2,ss)
2.FromProp.7(i),p2isfail-safefault-toleranttoF2
3.Fromstep3,nocomputationofp2inpresenceofF2willviolateSS.
4.Fromstep3,thenewdetectorcomponentsforfail-safefaulttolerance
toF2arecomplete.
5.Fromstep3andDef.44,allτ∈get-eit(p,F2,ss)areSS-inconsistent
pfor

610

6.Fromstep5,thenewdetectorcomponentsforfail-safefaulttolerance
toF2areaccurate.
7.Fromsteps4,and6,thenewdetectorcomponentsforfail-safefault
tolerancetoF2areperfect.

Wenowprovethethirdpartof7:p2hasminimaldetectionlatencyto
.F2f.oPro1.Given:p2=p1\get-eit(p,F2,ss)
2.FromProp.7(i),∀τ∈get-eit(p,F2,ss),τ∈δp2
3.Fromstep3,p2hasdetectionlatency0,i.e.,minimaldetectionlatency,
.Fto2Wehaveprovedthatthiswayofdesigningfail-safefaulttolerancetoF2
iscorrect,andthatp2hasperfectdetection,andminimaldetectionlatency
.Fto2However,wehaveyettoshowthattheconstructionpreservesthefail-safe
fault-tolerancetoF1,i.e.,weneedtoverifythattherearenointerference.
Toachievethis,wefirstneedtoprovethat“IntheabsenceofF1andF2,
thedetectorcomponentsaddedtop1forfail-safefaulttolerancetoF2donot
interferewithp1,i.e.,eachcomputationofp1isintheproblemspecification
evenifitexecutesconcurrentlywiththenewdetectorcomponents”.
f.oPro1.FromProp.7(ii),p2hasperfectdetectiontoF2.
2.Fromstep1,thenewdetectorcomponentsforfail-safefaulttolerance
toF2areperfectinp1.
3.Fromstep3andLemma.1,thenewdetectorcomponentsforfail-safe
faulttolerancetoF2donotinterferewithp1.

710

Wenowprovethesecondpartofthenon-interferenceconditions,whichis
“InthepresenceofF1,thenewdetectorsforfail-safefaulttolerancetoF2do
notinterferewiththefail-safeF1-toleranceofp1,i.e.,everycomputationof
p1isinthefail-safeF1-tolerancespecificationevenifp1executesconcurrently
withthenewcomponents.”
Proof.Weprovethisbycontradiction.Wefirstassumethatthereexists
acomputationinpresenceofF1thatviolatessafety,andshowthatsucha
computationcannotexist,i.e.,acontradiction.
1.Givenp2=p1\get-eit(p,F2,ss)
2.AssumethatthereisacomputationαinpresenceofF1thatviolates
ysafet3.Fromstep3andProposition.2,αcontainsabadtransitionτ.
4.Fromstep3andbyconstructionofp1,τisnotreachableinpresence
.Fof15.Fromstep4,andbyconstructionofp2,nonewtransitionisintroduced,
henceτisstillunreachable.
6.Fromsteps3,4and5,wehaveacontradiction.

Theproofofthethirdpart,whichis“InthepresenceofF2,p1doesnot
interferewiththenewdetectorsthatprovidefail-safefaulttolerancetoF2.”
Proof.ByProposition7(i),p1doesnotinterferewiththenewdetector
components.Wehave
provedthatthenewaddedcomponentstop1addsfail-safefaulttolerance
withperfectdetection,andminimaldetectionlatencytoF2,andthatthis
additionpreservesthefail-safefaulttolerancetoF1.Thus,wenowneedto
provethat,inpresenceofF1,theperfectdetection,andminimaldetection
latencyofp1toF1ispreserved.
Proof.WeprovethatperfectdetectiontoF1ispreservedinpresenceof
.F1

810

1.Given:p2=p1\get-eit(p,F2,ss)
2.Fromthefirstdesignstep,p1hasperfectdetectiontoF1
3.Byconstruction,thenewdetectorcomponentsforF2donotinterfere
withfail-safefaulttoleranceofp1toF1,hencecompletenessofdetector
componentsforF1ispreserved.
4.FromDef.38,everytransitionsτ∈get-eit(p,F2,ss)isSS-inconsistent
.pfor5.Fromstep4,andbyconstruction,noSS-consistenttransitionisre-
moved,henceaccuracyofthedetectorcomponentsforF1ispreserved.
6.Fromsteps3,and5,perfectdetectionispreserved.

Proof.WenowprovethatminimaldetectionlatencytoF1ispreserved
inpresenceofF1.
1.Given:p2=p1\get-eit(p,F2,ss)
2.Fromstep1,notransitionisadded.
3.Fromstep3,seteit1isstill“removed”
4.Fromstep3,minimaldetectionlatencytoF1ispreserved.

Wehave,atthispoint,provedthecorrectnessofthetransformationstep
ofprogramp1intoprogramp2.Theprocedureofaddingfail-safemultitoler-
ancecanbeeasilygeneralizedfornfaultclasses.

910

Thealgorithmtodesignfail-safemultitolerancetonfaultclasses,with
perfectdetection,andminimaldetectionlatencyisshowninFig.6.11.

add-efficient-fail-safe-multitolerance(p,[F1...Fn],ss:setoftransi-
:)nstio

{i:=1;p0:=p
while(i≤n)do{
eiti:=get-eit(p,Fi,ss);
pi:=pi−1\eiti;
i:=i+1;}od
})preturn(n

Figure6.11:Algorithmadd-efficient-fail-safe-multitoleranceaddsfail-safe
faulttolerancetonfaultclasses,withperfectdetection,andminimalde-
tectionlatencytoeveryfaultclass

Theorem15(SynthesisofEfficientMultitolerance)Givenafault-
intolerantprogrampwithsafetyspecificationSS,andnfaultclasses
F1...Fn.Algorithmadd-efficient-fail-safe-multitoleranceaddsfail-safefault
tolerancetoF1...Fntop,withperfectdetection,andminimaldetectionla-
tencytoallfaultclasses.

6.3.5ASimpleExample
One-at-a-TimeAdditionofFail-SafeFaultTolerancewithPerfect
DetectionandMinimalDetectionLatency

Inthissection,wewillpresentasmallexampletoillustratetheworkings
ofalgorithmadd-efficient-fail-safe-multitolerance.Forcontinuity,wereuse
thesameexampleasbefore.Thefault-intolerantprogramisidenticaltothe
programofFig6.5,andisdepictedinthepresenceofF1.Recallthat,for
thisprogram,transitions(10,11)and(20,21)arebadtransitions.
Duringthefirstiterationthroughalgorithmadd-efficient-fail-safe-
multitolerance,thecalltoget-eitcausestransitions(7,8),(18,19)and(19,20)

110

tobetaggedasearliestSS-inconsistenttransitions.Thesetransitionsare
thenremovedfromthefault-intolerantprogram.Theresultingfail-safefault-
tolerantprogramp1hasperfectdetection,andminimaldetectionlatencyto
F1.Programp1inpresenceofF1isshowninFig.6.12.

Figure6.12:Resultingfail-safefault-tolerantprogramwithperfectdetection,
andminimaldetectionlatencytoF1

Then,weconsiderprogramp1inpresenceofF2,asshowninFig.6.13

Figure6.13:Programp1inpresenceofF2

Intheseconditerationthroughadd-efficient-fail-safe-multitolerance,we
needtoaddfail-safefaulttolerancewithperfectdetection,andminimal
detectionlatencytoF2top1,whilepreservingthefail-safefaulttolerance
withperfectdetection,andminimaldetectionlatencyofp1toF1.First,
111

weconsiderp1inpresenceofF2,asshowninFig.6.13.Thecalltoadd-
efficient-fail-safe-multitolerancecausestransitions(9,10),(17,18),(20,21)to
beconsideredasearliestSS-inconsistenttransitions.
Observethatthecalltoadd-efficient-fail-safe-multitoleranceinthesec-
onditerationreferstothefault-intolerantprogramp,insteadofp1.This
issobecausehadthecallreferredtop1,transition(17,18)wouldnothave
beenincluded.Giventhattransition(19,20)hasbeenremovedinthefirst
iteration,ifp1isreferredto,thennopathfromafaulttransitionofF2to
abadtransition,usingonlyonlyprogramtransitions,wouldbeobserved,
i.e.nopathfromtransition(17,18)tobadtransitionusingonlyprogram
transitionswouldbeobserved.So,transition(17,18)wouldnothavebeen
includedasanearliestSS-inconsistenttransition.
Theresultingfail-safemultitolerantprogramp2tofaultclassesF1and
F2withperfectdetection,andminimaldetectionlatencytobothisshown
.6.14Fig.in

Figure6.14:Resultingfail-safefault-tolerantprogramp2inpresenceofF2

Inthissection,wehaveconsideredtheapproachwherefaultclassesare
consideredinsomefixedtotalorder,andpresentedtwoalgorithmsthatau-
tomatestheadditionofmultitolerance.Anotherpossibledesignapproach
formultitoleranceconsidersallfaultclassesatthesametime.
Inthenextsection,wepresenttwoalgorithmsthataddmultitolerance,
whileconsideringallfaultclassesatthesametime.

211

6.4All-at-a-timeDesignofMultitolerance

Intheprevioussection(Section6.3),wepresentedtwoalgorithmsthatsyn-
thesizefail-safemultitolerantprogramstonfaultclasses,byconsideringone
faultclassatatime.Thefirstalgorithmensuresthattheresultingfail-safe
multitolerantprogramhasperfectdetectiontoallfaultclasses,whilethesec-
ondalgorithmensuresthatthemultitolerantprogramhasperfectdetection,
andminimaldetectionlatencytoallfaultclasses.
Inthissection,weconsideranotherdesignapproachwhereallthefault
classesareconsideredatthesametime,andwepresenttwoalgorithmsbased
onthisdesignapproachthatachievethesamegoalsasthealgorithmsof
Section6.3.Thefirstalgorithmyieldsfail-safemultitolerantprogramsto
nfaultclassesF1...Fnwithperfectdetection,byconsideringallthefault
classesatthesametime,whilethesecondalgorithm,thatagainhandlesall
faultclassesatthesametime,yieldsfail-safemultitolerantprogramstofault
classesF1...Fnwithperfectdetection,andminimallatency.Wealsoshow
thatfail-safemultitolerantprogramsobtainedfromcorrespondingalgorithms
ofeitherdesignapproachareidentical.Wefurtherexploitthisrelationto
provepropertiesofthefail-safemultitolerantprogramsobtainedusingthe
algorithmspresentedinthissection.
Designofmultitolerancewhileconsideringallfaultclassesatthesame
timestillrequirestheverificationofnon-interferencebetweenthedifferent
programcomponents.Sinceallfaultclassesareconsideredatthesametime,
addingfaulttolerancetoonefaultclassentailsverificationthatthenew
detectorcomponentsdonotinterferewiththefail-safefaulttolerancetoall
otherfaultclasses.Thisproblemistackledbyshowingthatthefail-safe
multitolerantprogramobtainedusingtheall-at-atimealgorithmisidentical
tothefail-safemultitolerantprogramobtainedbyusingthecorresponding
.orithmalgone-at-a-time

6.4.1MultitolerancewithPerfectDetection
Inthissection,wepresentanalgorithmthataddsfail-safemultitolerance
withperfectdetectiontofaultclassesF1...Fnbyconsideringfaultclasses

311

time.sametheatThealgorithm,add-perfect-fail-safe-multitolerance-all,isshownin
5.6.1Fig.add-perfect-fail-safe-multitolerance-all(p,[F1...Fn],ss:setoftran-
sitions):

egincob{||in=1ssri:=get-ssr(p,Fi,ss);coend
nssr:=i=1ssri
return(pn:=p\ssr)}

Figure6.15:Algorithmadd-perfect-fail-safe-multitolerance-alladdsfail-safe
faulttolerancetonfaultclasses,withperfectdetectiontoeveryfaultclass
byconsideringallfaultclassesatthesametime.

Theorem16Givenafault-intolerantprogrampwithsafetyspecifica-
tionSS,andnfaultclassesF1...Fn.Algorithmadd-perfect-fail-safe-
multitolerance-alladdsfail-safefaulttolerancetoF1...Fn,withperfectde-
tectiontoallfaultclasses,whileconsideringallfaultclassesatthesame
time.

Toprovethis,wemakethefollowingobservation.

Proposition8Givenafault-intolerantprogrampwithsafetyspecifica-
tionSS,andnfaultclassesF1...Fn.Giventwoprogramspn:==add-
perfect-fail-safe-multitolerance(p,[F1...Fn]),ss,andpn:=add-perfect-fail-
safe-multitolerance-all(p,[F1...Fn],ss).Then,pn=pn.

Toprovetheaboveproposition,weneedtoshowthefollowing:

1.Everytransitionremovedinpnisalsoremovedinpn.
2.Everytransitionremovedinpnisalsoremovedinpn.

Proof.Weconsideranygiventransitionτthatisremovedinpn.

411

1.τisremovedinpn
2.Fromstep1,∃i:1≤i≤n:τ∈get-ssr(pi−1,Fi,ss)
3.Fromstep3∃acomputationαofpi−1inpresenceofFis.tτoccursin
α4.Fromstep3αisalsoacomputationofpinpresenceofFiandτoccurs
αin5.Fromstep4,andbyconstructionofpn,τisremovedinpn
Wenowprovethesecondpart:
Proof.Weprovethisbycontradiction,i.e.,weassumethereisatransition
τthatisremovedinpnbutnotinpn,andshowacontradiction.
1.∃i:1≤i≤n:τ∈get-ssr(p,Fi,ss)
2.Fromstep1,thereexistsacomptutationαofpinpresenceofFis.tτ
.αinccurso3.Sinceτ∈δpn,∀i:1≤i≤n,τisnotreachablebypi−1inpresenceof
Fi4.Byconstructionofpi−1,i>1,eitherτisalsounreachablebypin
presenceofFi,orτisalreadyremovedfrompn
5.Fromassumption,steps3,and4,wehaveacontradiction.

Thus,wehaveprovedthatalgorithmsadd-perfect-fail-safe-multitolerance
andadd-perfect-fail-safe-multitolerance-allyieldidenticalfail-safemultitol-
erantprogramswithperfectdetection.Wepresentasimpleexampleinthe
nextsectiontoillustratetheworkingofthealgorithm.

511

6.4.2ASimpleExample
All-at-a-TimeAdditionofFail-SafeFaultTolerancewithPerfect
onectitDe

Again,weusethesameexampleasbeforetoillustratetheworkingof
algorithmadd-perfect-fail-safe-multitolerance-all.Thefault-intolerantpro-
graminpresenceofF1andF2isshowninFigs.6.16and6.17respectively.

Figure6.16:Fault-intolerantprograminpresenceofF1

Figure6.17:Fault-intolerantprograminpresenceofF2

InthepresenceofF1,badtransitions(10,11),(20,21)arereachable,while
inthepresenceofF2,thesamesetofbadtransitionsisreachable(callto
get-ssr).Thesetransitionsarethenremovedfromtheprogramtoyielda
fail-safefault-tolerantprogramwithperfectdetectiontobothF1andF2,as
showninFig.6.18.

611

Figure6.18:Resultingfail-safemultitolerantprogramp2toF1andF2with
perfectdetectiontobothfaultclasses.

ObservethattheresultingprograminFig.6.18isidenticaltotheprogram
showninFig.6.8(obtainedusingtheapproachthatconsidersfaultclasses
time).aatoneInthenextsection,wepresentanalgorithmthataddsfail-safefault
tolerancetonfaultclassesF1...Fn,withperfectdetection,andminimal
detectionlatencytoallfaultclasses,whileconsideringallfaultclassesatthe
time.same

6.4.3MultitolerancewithPerfectDetectionandmin-
imaldetectionlatency

Inthissection,wepresentanalgorithmthataddsfail-safemultitolerance
tofaultclassesF1...Fnwithperfectfaultdetection,andminimaldetection
latency,whileconsideringallfaultclassesatthesametime.
Thealgorithm,add-efficient-fail-safe-multitolerance-all,isshownin
9.6.1Fig.

Theorem17Givenafault-intolerantprogrampwithsafetyspecifica-
tionSS,andnfaultclassesF1...Fn.Algorithmadd-efficient-fail-safe-
multitolerance-alladdsfail-safefaulttolerancetoF1...Fn,withperfectde-
tection,andminimumdetectionlatencytoallfaultclasses,whileconsidering
allfaultclassesatthesametime.

711

add-efficient-fail-safe-multitolerance-all(p,[F1...Fn],ss:setof
nsitions):tra

egincob{||in=1eiti:=get-eit(p,Fi,ss);coend
eit:=in=1eiti
return(pn:=p\eit)}

Figure6.19:Algorithmadd-efficient-fail-safe-multitolerance-alladdsfail-safe
faulttolerancetonfaultclasses,withperfectdetection,andminimaldetec-
tionlatencytoeveryfaultclassbyconsideringallfaultclassesatthesame
time.

Toprovethis,wemakethefollowingobservation.

Proposition9Givenafault-intolerantprogrampwithsafetyspecification
SS,andnfaultclassesF1...Fn.Giventwoprogramspn:==add-efficient-
fail-safe-multitolerance(p,[F1...Fn]),ss,andpn:=add-efficient-fail-safe-
multitolerance-all(p,[F1...Fn],ss).Then,pn=pn.

Toprovetheaboveproposition,weneedtoshowthefollowing:

1.Everytransitionremovedinpnisalsoremovedinpn.

2.Everytransitionremovedinpnisalsoremovedinpn.

Proof.Theproofistrivial,byconstruction.

1.Inbothconstructions,theseteitiiscomputedinthesameway,and
ed.vremothen

2.Fromstep1,everytransitionremovedinpnisalsoremovedinpn

3.Fromstep1,everytransitionremovedinpnisalsoremovedinpn

811

6.4.4ASimpleExample
One-at-a-TimeAdditionofFail-SafeFaultTolerancewithPerfect
DetectionandMinimalDetectionLatency

Asbefore,wereusethesameexampletoillustratetheworkingofalgo-
rithmadd-efficient-fail-safe-multitolerance-all.Thefault-intolerantprogram
inpresenceofF1andF2isshowninFigs.6.20and6.21respectively.

Figure6.20:Fault-intolerantprograminpresenceofF1

Figure6.21:Fault-intolerantprograminpresenceofF2

InthepresenceofF1,transitions(7,8),(18,19),(1,20)consideredear-
liestSS-inconsistenttransitions,whileinthepresenceofF2,transitions
(9,10),(17,18),(20,21)areearliestSS-inconsistenttransitions.Thesetran-
sitionsarethenremovedfromtheprogramtoyieldafail-safefault-tolerant
programwithperfectdetectiontobothF1andF2,asshowninFig.6.22.

119

Figure6.22:Resultingfail-safemultitolerantprogramp2toF1andF2with
perfectdetectionandminimaldetectionlatencytobothfaultclasseswhen
consideringallfaultclassesatthesametime.

ObservethattheresultingprograminFig.6.22isidenticaltotheprogram
showninFig.6.14(obtainedusingtheapproachthatconsidersfaultclasses
time).aatone

6.5ChapterSummary

Inthischapter,wehavepresentedfourdifferentalgorithmsthatyieldsfail-
safemultitolerantprograms,withvariousefficiencyproperties,suchasper-
fectdetection,andminimaldetectionlatencyforallfaultclasses,usingdif-
ferentdesignapproaches.
Wehaveconsideredtwopossibleapproachesforthedesignofmultitoler-
ance,namely(i)onethatconsidersonefaultclassatatime,and(ii)another
thatconsidersallfaultclassesatthesametime.Wefirstconsideredthe
approachwhichaddsmultitolerancebyconsideringonefaultclassatatime,
andwepresentedtwoalgorithms,namelyadd-perfect-fail-safe-multitolerance,
andadd-efficient-fail-safe-multitolerance,whichaddfail-safemultitolerance
toapreviouslyfault-intolerantprogram,withvariousoptimalproperties.We
explainedthat,duringtheadditionofmultitolerance,somenon-interference
conditionsbetweendifferentprogramcomponentsneedtobeverified.How-
ever,weextendedtheproofobligationstoincludenon-interferencewiththe

012

efficiencypropertiesoftheprogram.Whenfail-safemultitolerancewithper-
fectdetectionisadded,thenon-interferenceconditionsaresimilartothose
proposedbyAroraandKulkarniin[AK98a],andareguaranteedbytheuseof
algorithmadd-perfect-fail-safe-multitolerance.Forprovisionoffail-safemul-
titolerancewithperfectdetection,andminimaldetectionlatency,weverified
non-interferencebetweenvariousprogramcomponents,aswellasverified
thatthoseefficiencypropertiesarenotcompromised.
Wethenconsideredtheapproachwhereallthefaultclassesarehandled
atthesametime.Weprovidedtwoalgorithms,namelyadd-perfect-fail-safe-
multitolerance-all,andadd-efficient-fail-safe-multitolerance-allthataddfail-
safemultitolerancewithperfectdetection,andperfectdetectionandminimal
detectionlatencyrespectivelytoaninitiallyfault-intolerantprogram.We
showthatthecorrespondingfail-safemultitolerantprogramsareidenticalto
thoseobtainedusingtheone-at-a-timedesignapproach.Thismeansthat
allnon-interferenceconditionsaresatisfied,aswellasoptimalproperties
ed.preservThealgorithmsbasedontheone-at-a-timeapproachcanbeusedtoadd
faulttolerancetonewfaultclasses.Specifically,assumeafail-safefault-
tolerantprogramFntonfaultclassesF1...Fn.Iffail-safefaulttoleranceto
faultclassFn+1needstobeadded,thosealgorithmscanbeused,without
havingtorecomputethefail-safefaulttolerancetoalltheotherfaultclasses.
Ontheotherhand,thealgorithmsbasedontheall-at-a-timeapproachcanbe
usedwhenafaultclassisre-defined,orremoved.Also,thisalsomeansthat
forfail-safemultitolerance,efficiencypropertiessuchasperfectdetectioncan
bedesignedforallfaultclasses.

112

212

Chapter

7

nionclusCo

and

eurutF

orkW

Inthisthesis,wehavepresentedaframeworkforthedesignofefficientfail-

safefaulttolerance.Suchanapproachisboundtoraiseseveralquestions.

WefirstaddresssomeoftheissuesraisedinSection7.1.InSection7.2,we

summarizethecontributionsmadeinthisthesis,andwediscusstheirimpact

7.3.nctioSein

InSection7.4,weoutlinesomepossiblefutureavenues.

312

onscussiDi17.

Inthissection,weaddresssomeoftheissuesourapproachhasraised.

AroraandKulkarni[AK98c]alsogiveaformaldefinitionofde-
tectors.Isn’teverydetectoraccordingtoAroraandKulkarnia
perfectdetector?No.AroraandKulkarni[AK98c,AK98a]defineade-
tectortobeacomponentwhichrelatestwopredicateswitheachother:a
detectionpredicateX(describingthepresumed“bad”statelikethecrash
ofaprocess),andawitnesspredicateZ(indicatingthatthisstateholds).
Thesafenessconditionofthedetector[AK98a]mandatesthatZ⇒X,i.e.,
thewitnessisneverwrong,whiletheprogressandstabilityconditionsofthe
detector[AK98a]mandatesthatifXistrueforlongenough,Zwilleven-
tuallywitnessthisfactanditwilldothisuntilXisfalsifiedagain.Ifthe
detectionpredicatecanbeevaluatedatomicallybytheprocesses,thenthe
detectionpredicatecanbeequivalenttothewitnesspredicate.However,a
detectordoesnotnecessarilyguaranteethatthedetectionpredicatehasany
meaningfulconnectiontothecorrectnessspecification.Soevenifthewitness
predicateisequivalenttothedetectionpredicate,thereisnoguaranteethat
itdetectsbadorinconsistentstateswithrespecttoasafetyspecification.
However,AroraandKulkarni[AK98a]provethatforanysafetyspecifica-
tionandeveryactionthereexistsadetectionpredicatesuchthatexecuting
theactionwhenthepredicateholdsmaintainsthespecification.Theyalso
indicatethataweakestpredicatemayexist.However,theydonotexplain
howsuchpredicates(detectors)canbeobtained.Ourtheorygivesaguideline
howtofindthisweakestdetectionpredicated.Assumingthatd≡X≡Z,
everydetectorinthesenseofAroraandKulkarniisperfect.Further,inthe
senseofAroraandKulkarni,theweakestpredicateexistsforcriticalactions,
whileinourcase,wemakenosuchdistinction.

412

HowdoperfectdetectorsinourworkcomparetoChandraand
Toueg’sperfectfailuredetectors?Thereisacloserelationshipbetween
ourterminologyandthatofthefailuredetectortheoryofChandraandToueg
[CT96].Kulkarni[Kul99]arguesthatthesefailuredetectorscanberegarded
asaninstanceofdetectorsinthesenseofthispaper.Theaccuracyproperty
ofChandraandTouegalsolimitsthenumberofmistakesadetectorcan
make.ThecompletenesspropertyofChandraandTouegalsoreferstothe
abilityofadetectortodetectallfaults.

Inthispaper,wehaveassumedboundedprograms,i.e.,programs
withfinitestatespace.Whatistheimpactiftheprogramisun-
bounded?Ifunboundedprogramsareconsidered,themethodhastodeal
withaninfinitestatespaceandintheworstcaselosescompleteness,i.e.,it
maynotterminate.However,themethodremainssound.Thisisanalogous
tothesituationintheareaofmodelcheckingwherethefailuretoinvali-
datethespecificationonanyfinitesubsetofthestatespacesaysnothing
aboutsatisfactionofthespecificationonaninfinitestatespace.Because
ourmethodistransition-based,unfortunately,itdoesnotallowtoreason
directlyonthelevelofguardedcommandprograms,whichcanberegarded
asafiniterepresentationofaninfinitetransitionsystem.

Inthispaper,weprovidedanalgorithmforautomatingdesign
offail-safefaulttolerance.Isitefficient?Canthetheorybe
usedasastand-alone?Therearetwomaincontributionsofthepresent
work:Firstly,thetransformationalgorithmautomatestheadditionoffault-
toleranceandisefficientinthesensethatithaspolynomialtimecomplexity
inthe“size”ofthespecification(thenumberofbadtransitions)andthe
“size”oftheprogram(thenumberofreachabletransitions).Iftheprogram
isgivenasaguardedcommandprogram,itmustfirstbetranslatedintoa
statemachine.
Secondly,weprovidedatheorywhichallowedthederivationofthetrans-
formationalgorithmandisusedtoproveitscorrectness.Thetheorycan
beregardedasarefinementofthedetectortheoryofAroraandKulkarni

512

[AK98c]andbetterexplainstheworkingprinciplesofdetectors,e.g.,allows
anaturalwaytoformulateandexplainaccuracyandcompletenessproper-
ties.Forexample,Levesonetal.[LCKS90]observedthattheefficiencyof
adetectorisdependentonitslocation,i.e.,whichactionthedetectormoni-
tors.Ourtheorycontributestothisbyprovingthatitissufficienttomonitor
criticalactionswithperfectdetectorsforfail-safefaulttolerance.Thissaves
theprogrammerofhavingtotrydifferentdetectorsatdifferentlocationsto
addfail-safefaulttolerance.

KulkarniandArora[KA00]presentedanalgorithmwhichalso
solvesthetransformationproblemdefinedinSection4.3.Isn’t
thisalgorithmthesameasthealgorithmpresentedinthispaper?
No.ThealgorithmbyKulkarniandArora[KA00]alsoworksonthestate-
transitionrepresentationofthepogrambutdoesmoreworkthanabsolutely
necessary:byaddingdetectors,italsoremovesnon-reachabletransitions
fromthetransitionrelation.Sowhiletheeffectofthetransformationisthe
same,theformoftheaddeddetectorsisdifferent.Theabilitytoformulate
thisdifferenceisoneofthecontributionsofourtheory.

Inthispaper,wemadeuseofbadtransitions.Howarethose
transitionsobtained?Isthegenerationprocesscomputationally
expensive?Ifthesafetyspecificationisgivenasastateinvariant,i.e.,a
predicateφonsystemstates(withoutusinghistoryvariables),thenitisrela-
tivelyeasytocomputethesetofbadtransitions.Forthis,itisjustnecessary
toinspectallpossibletransitions(s,s)andcheckwhetherssatisfiesφands
satisfies¬φ.Thisisfeasibleifthesetoftransitionsisbounded.Inpractice,
mostsafetyspecificationsarestateinvariants.
Noteverysafetyspecificationcanberepresentedasapredicateonsystem
states,evenifitisfusion-closed.Asanexample,considerasystemconsisting
ofthreestatess1,s2,s3andthecorrectnessspecificationSS={s1∙s2∙s3}.
Thereisno“badstate”inthisprogram,butthereisabadtransition(s1,s3).
Wearenotawareofanymethodtoefficientlycalculatethesetransitions
fromanabstractrepresentationofthespecification(e.g.,atemporallogic

612

ula).form

InthedefinitionofSS-inconsistency,thereexistsasequenceofpro-
gramtransitionsaftertheoccurrenceoffaultsthateventuallylead
toviolationofsafety.Whatistheimpactofsucharequirement?
InthedefinitionofSS-inconsistency,werequirethatthereexistsasequence
ofprogramtransitionsthateventuallyleadtoviolationofsafety.Thereason
behindthisrequirementisbasedonthefactthatfaultcannotdirectlyvio-
latesafety,whichcanthenonlybeviolatedbya(bad)programtransition.
Thus,whensafetyisviolated,atleastoneprogramtransitionisexecuted
(whichisthebadtransitionitself).However,dependingonthefaultmodel,
therecanthenbeasequenceofprogramtransitionsthatultimatelyleads
tothebadtransitionbeingexecuted.Also,thereasonforconsideringonly
whenthereexistsasequenceofprogramtransitionsthatultimatelyleadto
safetyviolationisthatonecanprevent(bad)programtransitionsfromoc-
curring,howeverthisisnotpossibleforfaulttransitions.Theimpactofsuch
arequirementisthatitallowsthedefinitionoftheearliestSS-inconsistent
transitionthatunderpinsfastdetection.Ifsucharequirementis“removed”,
thenassumingthatafaulttransitionisan“earliestinconsistenttransition”,
onecannotpreventitfromoccuring.

Canthealgorithmadd-perfect-fail-safebeusedtosynthesizefail-
safefault-tolerantprogramswithperfectdetection,andoptimal
detectionlatency?Wehaveshownthroughexampleshowtheuseofal-
gorithmsadd-perfect-fail-safe,andadd-efficient-fail-safeyieldthesamere-
sultsfordistributedalgorithms.However,forotherclassesofprograms,the
resultswillbedifferent.
But,thereisasenseinwhichalgorithmadd-perfect-fail-safeisequivalent
toalgorithmadd-efficient-fail-safe.SinceallSS-inconsistenttransitionscan
possiblyleadtosafetyviolation,thenifwetreateachsuchSS-inconsistent
transitionasbad,thenthesetssofbadtransitionsisextendedtoinclude
thesetoftransitionsthatareSS-inconsistentforp.Then,sinceallearliest
inconsistenttransitionsareSS-inconsistent,theyarealsoincludedinsetss.

712

Runningalgorithmadd-perfect-fail-safethusremovestheearliestinconsistent
transitions,whichthengivesthesameresultasthatwhenrunningalgorithm
add-efficient-fail-safe.However,theseteitofearliestinconsistenttransi-
tionsstillneedstobedetermined.Thatis,algorithmadd-efficient-fail-safe
cancalladd-perfect-fail-safeforgenerationofefficientfail-safefault-tolerant
.msgrapro

Isourassumptionofsuchafaultmodelasassumedinthisthesis
valid?Whatittheimpactofchoosingafaultmodelwherefaults
candirectlyviolatesafety?Inthisthesis,wehaveassumedfaultmodels
thatcanbetolerated.Specifically,wehavediscardedfaultmodelswhere
faultscandirectlyleadtoviolationofsafety.Wecananalyzetheimpactof
suchanassumptionfortwogeneralcases.
Inthecaseofdistributedalgorithms,itisseldomthecasethatfaults
canleaddirectlytoviolationofsafety.Toseethis,consider,forexample,
amutualexclusionprotocol.Whenoneprocessisexecutinginitscritical
section,evenifafaulthappens,thefaultcannotjustcauseanotherprocessto
startaccessingitscriticalsection.Thefaultcanhowevercausetheprocess
toenteritscriticalsection,by“enabling”atransitionwhichwouldhave
otherwisebeendisabled.KulkarniandEbnenasirtermedsuchspecifications
asfault-safespecifications[KE02].
Forthecaseofembeddedapplications,suchafaultmodelisstillvalid.
Forexample,theoutputregistercanbereplicatedinsuchawaythatthe
probabilityofmorethanamajorityofregistersbeingcorruptedisalways0.
However,thisensuresthatsafetyisneverviolatedbyfaults.
Ifweallowfaultstodirectlyviolatesafety,thenouralgorithmscanbe
extendedtodealwithsuchacase.Whendesigningfail-safefaulttolerance,
weneedtotakestepstopreventtheprogramfromreachingthosestates
fromwherefaultscandirectlyviolatesafety.Ifsuchfaultscanoccurfrom
anystate,thenweneedtopreventtheprogramfromreachinganystate,i.e.,
thereisnofail-safefault-tolerantprogram.

128

7.2SummaryofResearchContributions

Inthissection,wepresentbriefsummariesofthemaincontributionsmade
inthisthesis.Theaimwastodevelopaframeworkthatcanallowsystematic
developmentofefficient(fail-safe)fault-tolerantprograms,whereefficiency
wascharacterizedbysuchcommonly-usedmetricsasdetectioncoverageand
.ncyelatdetection

7.2.1PerfectDetection
InChapter4,wedevelopedatheoryofdetectors,andidentifiedaclass
ofdetectors,calledperfectdetectors,thatarecrucialinthedesignoffail-
safefault-tolerantprograms.Thetheoryisbelievedtocapturetheworking
principlesofdetectorsbetterthanbefore.Weshowed,amongothers,that
composingcriticalactionsofaprogramwithperfectdetectorsensuresfail-
safefaulttoleranceinpresenceoffaults,i.e.,composingcriticalactionsofa
programwithperfectdetectorsissufficienttoensurefail-safefaulttolerance.
Wealsoshowedthat,intheabsenceoffaults,livenessisnotcompromised.
Inpracticalterms,thismeansthat,wheneveranerrorisflagged,thereisa
“harmful”errorinthesystem,i.e.,itisnotafalsealarm.Wehavepresented
examplestoshowtheviabilityofourapproach.
AsindicatedbyLevesonet.alin[LCKS90],thedesignof“effective”detec-
torsisproblematic,andtheeffectivenessisheavilyreliantontheexperience
ofthesoftwaredesigners/programmers.Thoughtheauthorsof[LCKS90]
didnotexplicitlyindicatewhattheymeantby“effectiveness”ofdetectors,
wehaveshownthat“effectiveness”iscapturedbythecompleteness,andac-
curacypropertiesofdetectors.Tolessentheimpactofsuchrequirementsas
experienceofprogrammersonthedesignofeffectivedetectors,weprovided
analgorithmthatyieldsafail-safefault-tolerantprogramwithperfectdetec-
tion,bycomposingthecriticalactionsofthecorrespondingfault-intolerant
programwithperfectdetectors.Thisisachievedbyremovingthosebad
transitionsthatarereachableinpresenceoffaults.
Ingeneral,tovalidatethefaulttolerancemechanismsincorporatedina
program,faultinjectionexperiments[IT96,AAA+90]areusuallyconducted.

912

Inparticular,theyareusedtoquantifythecoverageofthemechanisms,
suchasin[Hil00],wherethecoverageistheratioofthenumberoffaults
detectedtothenumberoffaultsinjected.However,bydesign,thefail-safe
fault-tolerantprogramsobtainedfromthealgorithmhas“perfect”coverage,
sincethedetectorsareperfect.
Wehavealsoshownthattheautomaticsynthesisoffail-safefault-tolerant
programshaspolynomialtimecomplexityinthesizeofthestatespaceof
thefault-intolerantprogram.

7.2.2FastDetection
InChapter5,wedevelopedatheoryoffastdetectors,andidentifiedaclass
ofdetectors,calledfastdetectors,thatensuresminimaldetectionlatency,as
wellasperfectdetection.Theideabehindfastdetectionistopreventerrors
frompropagatingandcorruptingtheentirestateoftheprogram.However,
whendesigningfastdetectors,oneproblemcanbethatthesefastdetectors
arenotperfect.Wehavethereforeidentifiedtheclassoffastdetectorsthat
ensuresbothperfectdetection,andminimaldetectionlatency.
Asbefore,designofeffectivedetectorsisproblematic.Also,whendesign-
ingfault-tolerantsystems,whenfaultinjectionexperimentsareconducted
todeterminetheeffectivenessofthefaulttolerancemechanisms,detection
latencyofthesemechanismsisusuallyevaluated,andistakentobethemin-
imumtimebetweentheonset(injection)ofafaultanditsdetection.Fault
injectionexperimentscanbeacomputationallyexpensiveprocesstoevaluate
detectionlatency.Thus,totackletheproblemofdesigningperfectdetectors
whileensuringminimaldetectionlatency,weprovidedanalgorithmthat
achievesthat.Byconstruction,thedetectionlatencyofthefail-safefault-
tolerantprogramtothefaultclassis0(minimal).

7.2.3DesignofOne-at-a-timeMultitolerance
Buildinguponthedesignofperfectandfastdetectors,weaimedatgen-
eralizingtheresultstodealwithmultiplefaultclasses.InChapter6,we
addressedtheproblemofaddingefficientfail-safemultitolerance,i.e.,the

013

abilityofaprogramtotoleratemultipleclassesoffaults.Wearguedthat,in
adistributedenvironment,thenatureandtypesoffaultsaffectingasystem
isvaried,andthusthedesignoffault-tolerantsystemsneedstobecognizant
ofsuchdiversity.Therearetwopossiblewaysofdesigningmultitolerance,
andinChapter6,wepresentedalgorithmsthatbuilduponeachapproach.
Thefirstapproachdealswithadditionofmultitoleranceinastepwise
fashion,thatisaddingfail-safefaulttolerancetoagivenfaultclassoneata
time.Weexplainedthatduringtheadditionofmultitolerancetoaninitially
fault-intolerantprogram,theprogramisextendedwithdetectorcomponents
thathandlefaultsfromeachfaultclass.Consequently,therecanbeinterfer-
encebetweendifferentprogramcomponents,forexamplebetweendetector
componentsfordifferentfaultclasses,orbetweentheprogramandthedetec-
torcomponentsforsomefaultclass.Thisinterferenceneedstobehandled,
and“removed”,sinceitmaypreventthedifferentprogramcomponentsfrom
satisfyingtheirproblemspecification.Someofthenon-interferencecondi-
tionswerefirstpresentedbyAroraandKulkarniin[AK98a].However,given
ourfocusonefficientfail-safefaulttolerance,weexplainedthatduringad-
ditionofmultitolerance,thedetectorcomponentsfordifferentfaultclasses
shouldnotinterferewiththeoptimalproperties(perfectdetection,andmini-
maldetectionlatency)ofthefail-safemultitolerantprogramtodifferentfault
classes.Therefore,wehaveextendedthesetofnon-interferenceconditions
toincludethoserelatingtoprogramproperties.Therefore,anyalgorithm
thatautomaticallyaddsmultitolerancetoaprogramneedstoreflectthose
non-interferencerequirements.
Wedevelopedtwoalgorithmsthatautomaticallyaddmultitoleranceto
afault-intolerantprogram.Thefirstalgorithmtransformsafault-intolerant
programintoafail-safemultitolerantprogram,withperfectdetectiontoall
faultclassesconsidered.Thesecondalgorithmtransformsafault-intolerant
programintoafail-safefault-tolerantprogramwithperfectdetection,and
minimaldetectionlatencytoeveryfaultclassconsidered.Eachalgorithm
wasincrementallydeveloped,andeachdesignstagewasprovedtohandle
thenon-interferenceconditions(betweenprogramcomponents,andprogram
).iesertppro

113

7.2.4DesignofAll-at-a-timeMultitolerance

InChapter6,wealsofocusedonthesecondapproachfordesigningmulti-
tolerance.Inthisapproach,allthefaultclassesareconsideredatthesame
time.Wedevelopedtwoalgorithmsthataddfail-safemultitolerancetoa
fault-intolerantprogram,byconsideringallfaultclassesatthesametime.
Thefirstalgorithmyieldsafail-safemultitolerantprogramwithperfect
detectiontoallfaultclasses,whilethesecondalgorithmyieldsafail-safe
multitolerantprogramwithperfectdetection,andminimallatencytoallfault
classes.Wehaveshownthattheprogramsyieldedusingthesealgorithmsare
identicaltothoseyieldedbythecorrespondingalgorithmthatconsidersone
faultclassatatime.

pactIm37.

Wenowdiscussbrieflytheimpactourcontributionshaveondesignoffail-safe
faulttolerance.Ourtheoryprovidesageneralyetpowerfulbasisforunder-
standingtheworkingprinciplesofdetectors.Specifically,wehavebeenable,
throughourdefinednotionofperfectdetectors,toexplaindesigndecisions
inthedesignoffault-tolerantprograms.Forexample,onlycriticalactionsof
programswerecomposedwithdetectors.Howthesedetectorsweredesigned
orwhatpropertiesshouldtheypossessweremostlyintuitive,orbasedon
experience.Ourcontributionhasshownthat,forfail-safefaulttolerance,
thedetectorsneedtobeperfect,andthatitissufficienttocomposecritical
actionswithsuchdetectors.
Wehavealsoshownthat,inordertohaveminimaldetectionlatency,the
detectorsofnon-criticalactionsarenon-trivial,i.e.,theyarealsoperfect.By
wayofcontrast,AroraandKulkarniobservedin[AK98b]that,“accordingto
theirexperience”,detectorsofnon-criticalactionsaretrivial,i.e.,true.Thus,
weconcludethattherearetimeswhentheirapproachcanyieldminimalde-
tectionlatency,butmaybenotalways.AroraandKulkarnialsoobserved
in[AK98b]thatthedetectorsofcriticalactionsarenon-trivial,whileLeve-
sonet.alobservedin[LCKS90]thatdesignofeffectivedetectorsisdifficult.
Thoughtheauthorsneverexplicitlyclarifiedthemeaningof“non-trivial”,or

213

“effective”,ourtheoryhasenabledustodeterminethepropertiesthatun-
derpinthenotionsof“non-triviality”and“effectiveness”,i.e.,theproperties
ofcompleteness,andaccuracy.
Further,wehaveprovidedageneralizationofourapproachbylooking
atthedesignofmultitolerantprograms.Wehaveprovidedalgorithmsthat
canautomaticallyaddefficientfail-safemultitolerance,andthatforcertain
classesoffaulttolerance,andefficiencyproperties,theimpactoftheorder
inwhichfaultclassesarehandledcanbeminimized.

7.4FutureWork

Ourworkonautomatedsynthesisoffail-safefaulttolerancehasopenedup
severalnewavenuesforfutureresearch.Someofthemareoutlinedbelow.
Oneofthemainassumptionsunderpinningourworkhasbeenthatspec-
ificationarefusionclosed.Fusionclosureguaranteesthatthehistoryofthe
computationis“available”inthecurrentstateofthesystem,i.e.,byjust
lookingatthecurrentstate,onecandeterminewhetherthenextstepisa
badoneornot.Aspecificationthatisnotfusionclosedcanbemadefusion
closedbyaddinghistoryvariables.However,addinganothervariableleads
toanexponentialincreaseinsizeofthestatespaceoftheprogram.Thissug-
geststwopossibleavenues:(i)Isthereawayofconvertingnonfusionclosed
specificationsintoafusionclosedspecificationthatminimizesthenumberof
statesadded?.(ii)Doesthereexistaclassofnonfusionclosedspecifications
whichhavesufficiently“nice”featuressuchthattheabsencefusionclosure
isirrelevant?.
Inthisthesis,wehavelookedattwopropertiesoffail-safefault-tolerant
programs,namelyperfectdetection,anddetectionlatency.However,there
areotherpropertiesthatcanbeinvestigated.Onesuchpropertyisavail-
ability.Infact,inthecourseofthiswork,wehaveobservedthatoneneeds
toadoptapessimisticlookofacomputationinordertohavefastdetection.
However,foravailability,oneneedstoadoptamoreoptimisticoutlook.How
canavailabilitybemodeled,anditsimpactondesigndecisionswillbeinves-
ated.tig

313

413

Bibliography

[AAA+90]J.Arlat,M.Aguera,L.Amat,Y.Crouzet,J.C.Fabre,J.C.Laprie,
E.Martins,andD.Powell.“FaultInjectionforDependabilityValida-
tion:AMethodologyandSomeApplications”.IEEETransactionson
SoftwareEngineering,16(2):166–182,1990.

[AD97]YehudaAfekandShlomiDolev.Localstabilizer.InProceedingsofthe
16thAnnualACMSymposiumonPrinciplesofDistributedComputing
1997.,7)9(PODC

[ADK01]A.Arora,M.Demirbas,andS.Kulkarni.“GrayboxStabilization”.In
ProceedingsoftheInternationalConferenceonDependableSystems
2001.,etworksNand

[AG94]AnishAroraandMohamedG.Gouda.Distributedreset.IEEETrans-
actionsonComputers,43(9):1026–1038,September1994.

[AK95]AnishAroraandSandeepS.Kulkarni.Designingmaskingfault-
tolerancevianonmaskingfault-tolerance.InProceedingsofthe14th
IEEESymposiumonReliableDistributedSystems(SRDS95),pages
1995.174–185,

[AK98a]AnishAroraandSandeepS.Kulkarni.Componentbaseddesignof
multitolerantsystems.IEEETransactionsonSoftwareEngineering,
1998.uaryJan24(1):63–78,

[AK98b]AnishAroraandSandeepS.Kulkarni.Designingmaskingfaulttoler-
ancevianonmaskingfaulttolerance.IEEETransactionsonSoftware
Engineering,24(6),June1998.

[AK98c]AnishAroraandSandeepS.Kulkarni.Detectorsandcorrectors:Athe-
oryoffault-tolerancecomponents.InProceedingsofthe18thIEEEIn-

513

ternationalConferenceonDistributedComputingSystems(ICDCS98),
1998.yMa[AL91]Mart´ınAbadiandLeslieLamport.Theexistenceofrefinementmap-
pings.TheoreticalComputerScience,82(2):253–284,May1991.
[APSV91]BaruchAwerbuch,BoazPatt-Shamir,andGeorgeVarghese.Self-
stabilizationbylocalcheckingandcorrection.InFOCS91Proceedings
ofthe31stAnnualIEEESymposiumonFoundationsofComputerSci-
ence,pages268–277,1991.
[AS85]BowenAlpernandFredB.Schneider.Definingliveness.Information
ProcessingLetters,21:181–185,1985.
[Avi85]A.Avizienis.“TheN-VersionApproachtoFault-TolerantSoftware”.
IEEETransactionsonSoftwareEngineering,39(4):1491–1501,1985.
[BDDT98]JoffroyBeauquier,SylvieDelae¨t,ShlomiDolev,andS´ebastienTixeuil.
Transientfaultdetectors.InProceedingsofthe12thInternationalSym-
posiumonDIStributedComputing(DISC’98),number1499inLecture
NotesinComputerScience,pages62–74,Andros,Greece,September
1998.Springer-Verlag.
[CDPV01]A.Cournier,A.K.Datta,F.Petit,andV.Villain.“Snap-stabilizing
PIFAlgorithminArbitraryRootedNetworks”.InProceedingsofthe
InternationalConferenceonDistributedComputingSystems,pages91
2001.98,–[CM88]K.ManiChandyandJayadevMisra.ParallelProgramDesign:A
Foundation.Addison-Wesley,Reading,MA,Reading,Mass.,1988.
[Cri91]FlaviuCristian.Understandingfault-tolerantdistributedsystems.
CommunicationsoftheACM,34(2):56–78,February1991.
[CT96]TusharDeepakChandraandSamToueg.Unreliablefailuredetectors
forreliabledistributedsystems.JournaloftheACM,43(2):225–267,
1996.hcMar[CW96]E.M.ClarkeandJ.M.Wing.“FormalMethods:StateoftheArtand
FutureDirections”.ACMComputingSurveys,28(4):626–643,1996.

613

j74][Di

93][DIM

97][Dol

00][Dol

W95][D

[Gae99a]

[Gae99b]

91][GMm93][Gu

[GV00]

[GV01]

EdsgerW.Dijkstra.Selfstabilizingsystemsinspiteofdistributed
control.CommunicationsoftheACM,17(11):643–644,1974.

ShlomiDolev,AmosIsraeli,andShlomoMoran.Self-stabilization
ofdynamicsystemsassumingonlyread/writeatomicity.Distributed
Computing,7:3–16,1993.

ShlomiDolev.Self-stabilizingroutingandrelatedprotocols.Journal
ofParallelandDistributedComputing,42(2):122–127,1997.

ShlomiDolev.Self-Stabilization.MITPress,2000.

ShlomiDolevandJenniferL.Welch.Self-stabilizingclocksynchroniza-
tioninthepresenceofByzantinefaults.InProceedingsoftheSecond
WorkshoponSelf-StabilizingSystems,pages9.1–9.12,1995.

FelixC.Gaertner.Fundamentalsoffault-tolerantdistributedcomput-
inginasynchronousenvironments.ACMComputingSurveys,31(1):1–
1999.hcMar26,

FelixC.Gaertner.Transformationalapproachestothespecification
andverificationoffault-tolerantsystems:Formalbackgroundandclas-
sification.JournalofUniversalComputerScience(J.UCS),5(10):668–
692,October1999.SpecialIssueonDependabilityEvaluationand
t.ensmesAssMohamedG.GoudaandNicholasJ.Multari.Stabilizingcommunica-
tionprotocols.IEEETransactionsonComputers,40(4):448–458,April
1991.H.PeterGumm.AnotherglanceattheAlpern-Schneidercharacter-
izationofsafetyandlivenessinconcurrentexecutions.Information
ProcessingLetters,47(6):291–294,1993.

FelixC.G¨artnerandHagenV¨olzer.Redundancyinspaceinfault-
tolerantsystems.TechnicalReportTUD-BS-2000-06,Departmentof
ComputerScience,DarmstadtUniversityofTechnology,Darmstadt,
Germany,July2000.

FelixC.G¨artnerandHagenV¨olzer.Definingredundancyinfault-
tolerantcomputing.InBriefAnnouncementatthe15thInternational

713

00][Hil01][HJS

02][HJS

[IT96]02][JHCS

01][JHS02a][JHS

02b][JHS

SymposiumonDIStributedComputing(DISC2001),Lisbon,Portugal,
2001.reOctobM.Hiller.“ExecutableAssertionsforDetectingDataErrorsinEm-
beddedControlSystems”.InProceedingsInternationalConferenceDe-
pendableSystemsandNetworks,pages24–33,2000.
M.Hiller,A.Jhumka,andN.Suri.“AnApproachforAnalyzingthe
PropagationofDataErrorsinSoftware”.InProceedingsoftheInter-
nationalConferenceonDependableSystemsandNetworks,pages161
2001.170,–M.Hiller,A.Jhumka,andN.Suri.Ontheplacementofsoftwaremech-
anismsfordetectionofdataerrors.InProceedingsoftheInternational
ConferenceonDependableSystemsandNetworks,pages135–144,
2002.R.K.IyerandD.Tang.ExperimentalAnalysisofComputerSystem
DesignDependability,chapter5inFault-TolerantComputerSystem
Design.PrenticeHall,1996.
A.Jhumka,M.Hiller,V.Claesson,andN.Suri.OnSystematicDesign
ofGloballyConsistentExecutableAssertionsinEmbeddedSoftware.In
ProceedingsLCTES/SCOPES,pages74–83,2002.

A.Jhumka,M.Hiller,andN.Suri.“AssessingInter-ModularError
PropagationinDistributedSoftware”.InProceedingsofthe20thSym-
posiumonReliableDistributedSystems,pages152–161,2001.
A.Jhumka,M.Hiller,andN.Suri.“AnApproachtoSpecifyand
TestComponent-BasedDependableSoftware”.InProceedingsofthe
7thInternationalSymposiumonHighAssuranceSystemsEngineering,
2002.A.Jhumka,M.Hiller,andN.Suri.“Component-BasedSynthesisofDe-
pendableEmbeddedSoftware”.InProceedingsofthe7thInternational
SymposiumonFormalTechniquesinReal-TimeandFault-Tolerant
Systems,pages111–128.LectureNotesinComputerScience(LNCS),
2002.

813

03][JHS

[KA97a]][KA97b

[KA98][KA00]

02][KE99][KRS

l99][Ku

A.Jhumka,M.Hiller,andN.Suri.“AFrameworkfortheDesign
andValidationofEfficientFail-SafeFault-TolerantPrograms.”.InTo
Appear,ProceedingsSoftwareandCompilersforEmbeddedSystems
2003.,ES)P(SCOS.KulkarniandA.Arora.“Once-and-forallManagementProtocol
(OFMP)”.InProceedingsofthe5thInternationalConferenceonNet-
workProtocols,1997.
SandeepS.KulkarniandAnishArora.Compositionaldesignofmul-
titolerantrepetitiveByzantineagreement.InProceedingsofthe18th
InternationalConferenceontheFoundationsofSoftwareTechnology
andTheoreticalComputerScience,Kharagpur,India,pages169–183,
1997.S.KulkarniandA.Arora.“MultitoleranceinDistributedReset”.
ChicagoJournalofTheoreticalComputerScience,1998(4),1998.
SandeepS.KulkarniandAnishArora.Automatingtheaddition
offault-tolerance.InMathaiJoseph,editor,FormalTechniquesin
Real-TimeandFault-TolerantSystems,6thInternationalSymposium
(FTRTFT2000)Proceedings,number1926inLectureNotesinCom-
puterScience,pages82–93,Pune,India,September2000.Springer-
erlag.VS.KulkarniandA.Ebnenasir.“ComplexityofAddingFail-SafeFault
Tolerance”.InProceedingsInternationalConferenceonDistributed
ComputingSystems,2002.
SandeepS.Kulkarni,JohnRushby,andNatarajanShankar.Acase-
studyincomponent-basedmechanicalverificationoffault-tolerantpro-
grams.InAnishArora,editor,Proceedingsofthe19thIEEEInter-
nationalConferenceonDistributedComputingSystemsWorkshopon
Self-StabilizingSystems,pages33–40,Austin,TX,USA,June1999.
IEEEComputerSocietyPress.
SandeepS.Kulkarni.ComponentBasedDesignofFault-Tolerance.
PhDthesis,DepartmentofComputerandInformationScience,The
OhioStateUniversity,1999.

913

[LA90]P.A.LeeandT.Anderson.“FaultTolerance-PrinciplesandPrac-
tice”.volume3ofDependableComputingandFault-TolerantSystems.
SpringerVerlag,1990.
[Lam77]LeslieLamport.Provingthecorrectnessofmultiprocessprograms.
IEEETransactionsonSoftwareEngineering,3(2):125–143,March
1977.[Lap92]J.C.Laprie.“Dependability:BasicConceptsandTerminology”.In
DependableComputingandFault-TolerantSystemsseries,volume5.
Springer-Verlag,1992.
[LCKS90]N.Leveson,S.S.Cha,J.C.Knight,andT.J.Shimeall.TheUse
ofSelf-ChecksandVotinginSoftwareErrorDetection:AnEmpirical
Study.IEEETransactionsonSoftwareEngineering,16(4):432–443,
1990.[Liu91]ZhimingLiu.Fault-tolerantprogrammingbytransformations.PhD
thesis,UniversityofWarwick,DepartmentofComputerScience,1991.
[LJ92]ZhimingLiuandMathaiJoseph.Transformationofprogramsforfault-
tolerance.FormalAspectsofComputing,4(5):442–469,1992.
[LJ93]ZhimingLiuandMathaiJoseph.Specificationandverificationofre-
coveryinasynchronouscommunicatingsystems.InJanVytopil,editor,
FormalTechniquesinReal-timeandFault-tolerantSystems,chapter6,
pages137–165.Kluwer,1993.
[LJ94]ZhimingLiuandMathaiJoseph.Stepwisedevelopmentoffault-
tolerantreactivesystems.InFormaltechniquesinreal-timeandfault-
tolerantsystems,number863inLectureNotesinComputerScience,
pages529–546.Springer-Verlag,1994.
[LJ95]ZhimingLiuandMathaiJoseph.Aformalframeworkforfault-tolerant
programs.InC.M.MitchellandV.Stavridou,editors,Mathematicsof
DependableComputing,pages131–148.OxfordUniversityPress,1995.
[MAM84]A.Mahmood,D.M.Andrews,andE.J.McCluskey.“ExecutableAs-
sertionsandFlightSoftware”.InProceedingsofthe6thAIAA/IEEE
DigitalAvionicsSystemsConference(DASC-6),pages346–351,1984.

014

75][Ran

[Ros95]ai78][S

S98][S

S99a][S

]S99b[S

[YB94]

B.Randell.“SystemStructureforSoftwareFaultTolerance”.IEEE
TransactionsonSoftwareEngineering,1(2):220–232,1975.

D.S.Rosenblum.“APracticalApproachtoProgrammingwithAs-
sertions”.IEEETransactionsonSoftwareEngineering,21(1):19–33,
1995.S.H.Saib.“ExecutableAssertions:AnAidtoReliableSoftware”.
InProceedingsof11thAsilomarConferenceonCircuits,Systemsand
Computers,pages277–281,1978.

N.SuriandP.Sinha.“OntheUseofFormalMethodsforValidation”.
InProceedingsofthe28thInternationalSymposiumonFault-Tolerant
Computing,pages390–401,1998.

P.SinhaandN.Suri.“IdentificationofTestCasesUsingaFormal
FIApproach”.InProceedingsofthe29thInternationalSymposiumon
FaultTolerantComputing,1999.

P.SinhaandN.Suri.“OntheUseofFormalTechniquesforAnalyzing
DependableReal-TimeProtocols”.InProceedingsRealTimeSystems
Symposium,pages126–135,1999.

H.YinandJ.M.Bieman.“ImprovingSoftwareReliabilityWithAsser-
tionInsertion”.InProceedingsoftheInternationalTestConference,
1994.839,–831pages

114

214

•Name:ArshadJhumka

•DateofBirth:14.04.1974

•Nationality:Mauritian

CV

PData:onalers

SchoolEducation

•Jan.1985–Dec.1989:RoyalCollege,Port-Louis,Mauritius(SC)

•Jan.1990–Dec.1991:RoyalCollege,Port-Louis,Mauritius(HSC)

UniversityEducation

•Oct.1992–Jun.1995:UniversityofCambridge,England(BAComputer
)Science

•1999:MAComputerScience,UniversityofCambridge

•Feb.2000–Nov.2002:ChalmersUniversityofTechnology,Gothenburg,
edenSw

•Nov.2002–Nov.2003:TU-Darmstadt,Germany(PhDComputerSci-
)ence

314