10 Pages
English

Self Modification and Mortality


Description

Level: Higher education, Doctorate, Bac+8
Self-Modification and Mortality in Artificial Agents

Laurent Orseau (1) and Mark Ring (2)
(1) UMR AgroParisTech 518 / INRA, 16 rue Claude Bernard, 75005 Paris, France
(2) IDSIA / University of Lugano / SUPSI, Galleria 2, 6928 Manno-Lugano, Switzerland

Abstract. This paper considers the consequences of endowing an intelligent agent with the ability to modify its own code. The intelligent agent is patterned closely after AIXI [1], but the environment has read-only access to the agent's description. On the basis of some simple modifications to the utility and horizon functions, we are able to discuss and compare some very different kinds of agents, specifically: reinforcement-learning, goal-seeking, predictive, and knowledge-seeking agents. In particular, we introduce what we call the Simpleton Gambit, which allows us to discuss whether these agents would choose to modify themselves toward their own detriment.

Keywords: Self-Modifying Agents, AIXI, Universal Artificial Intelligence, Reinforcement Learning, Prediction, Real world assumptions

1 Introduction

The usual setting of learning agents interacting with an environment makes a strong, unrealistic assumption: the agents exist outside of the environment.

  • agent can

  • agent

  • reinforcement learning

  • discount future

  • self-modifiable agents

  • real-world part

  • horizon function

  • future actions

  • code


Laurent Orseau: laurent.orseau@agroparistech.fr, http://www.agroparistech.fr/mia/orseau
Mark Ring: mark@idsia.ch, http://www.idsia.ch/~ring/

But this is not how our own, real world is. This paper discusses some of the consequences that arise from embedding agents of universal intelligence into the real world. In particular, we examine the consequences of allowing an agent to modify its own code, possibly leading to its own demise (cf. the Gödel Machine [6] for a different but related treatment of self-modification).

To pursue these issues rigorously, we place AIXI [1] within an original, formal framework where the agent's code can be modified by itself and also seen by its environment. We consider the self-modifying, universal version of four common agents: reinforcement-learning, goal-seeking, prediction-seeking, and knowledge-seeking learning agents, and we compare these with their optimal, non-learning variants.

We then pose a profound dilemma, the Simpleton Gambit: a famous scientist, Nobel Prize winner, someone you trust completely, suggests an opportunity,
an operation that will make you instantly, forever and ultimately happy, all-knowing, or immortal (you choose), but at the important cost of becoming as intelligent as a stone. Would you take it? Of course, there is still a positive chance, however small, that the operation might go wrong. We consider the responses of the various universal agents and the ramifications, generally framing our observations as "statements" and (strong) "arguments", as proofs would require much more formalism and space.

2 Universal agents $A^\rho_x$

We wish to discuss the behavior of four specific learning agents, but first we describe the environment or universe with which they will interact. Each agent outputs actions $a \in A$ in response to the observations $o \in O$ produced by the universe. There is a temporal order, so that at time $t$ the agent takes an action $a_t$ and the universe responds by producing an observation $o_t$.

The universe is assumed to be computable; i.e., it is described by a program $q \in Q$, where $Q$ is the set of all programs. The set of all universes that are consistent with history $h$ is denoted $Q_h$. To say that a program $q$ is consistent with $h = (o_0, a_0, \ldots, o_t, a_t)$ means that the program outputs the observations in the history if it is given the actions as input: $q(a_0, \ldots, a_t) = o_0, \ldots, o_t$.

In the rest of the paper, certain conventions will be followed for shorthand reference: $t_h$ refers to the time step right after history $h$, and is therefore equal to $|h| + 1$; $|q|$ refers to the length of program $q$; $h_k$ is the $k$-th pair of actions and observations, which are written as $a_k$ and $o_k$.

We will discuss four different intelligent agents that are each variations of a single agent $A^\rho_x$, based on AIXI [1] (which is not computable).³

$A^\rho_x$ chooses actions by estimating how the universe will respond, but since it does not know which universe it is in, it first estimates the probability of each. The function $\rho : Q \to (0, 1]$ assigns a positive weight (a prior probability) to each possible universe $q \in Q$. As a convenient shorthand, $\rho(h)$ refers to the sum of $\rho(q)$ over all universes consistent with $h$:

$$\rho(h) := \sum_{q \in Q_h} \rho(q),$$

which must be finite. Given a specific history, the agent can use $\rho$ to estimate a probability for each possible future based on the likelihood of all the universes that generate that future.

For the agent to choose one action over another, it must value one future over another, and this implies that it can assign values to the different possible futures. The assignment of values to futures is done with a utility function $u : H \to [0, 1]$, which maps histories of any length to values between 0 and 1.

To balance short-term utility with long-term utility, the agent has a horizon function, $w : \mathbb{N}^2 \to \mathbb{R}$, which discounts future utility values based on how far into the future they occur. This function, $w(t, k)$, depends on $t$, the current time step, and $k$, the time step in the future at which the event occurs. In general, it must be summable:

$$\sum_{k=t}^{\infty} w(t, k) < \infty.$$

³ Only incomputable agents can be guaranteed to find the optimal strategy, and this guarantee is quite useful for discussions of the agents' theoretical limits.
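The definitions of $Q_h$ and $\rho(h)$ can be made concrete with a minimal sketch. It assumes a tiny, hand-picked set of three deterministic toy universes and made-up prior weights; the names `echo`, `zeros`, `flip`, and `PRIOR` are hypothetical illustrations and do not come from the paper, whose set $Q$ contains all programs.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A history h = (o_0, a_0, ..., o_t, a_t), stored as (observation, action) pairs.
History = List[Tuple[int, int]]
# A toy universe q maps the actions a_0..a_t to the observations o_0..o_t.
Universe = Callable[[Sequence[int]], List[int]]


def echo(actions: Sequence[int]) -> List[int]:
    """Toy universe: each observation repeats the action."""
    return [a for a in actions]


def zeros(actions: Sequence[int]) -> List[int]:
    """Toy universe: every observation is 0."""
    return [0 for _ in actions]


def flip(actions: Sequence[int]) -> List[int]:
    """Toy universe: each observation is the negated binary action."""
    return [1 - a for a in actions]


# Hypothetical prior rho: Q -> (0, 1]; the weights are assumptions for this example.
PRIOR: Dict[Universe, float] = {echo: 0.5, zeros: 0.25, flip: 0.25}


def is_consistent(q: Universe, h: History) -> bool:
    """q is consistent with h if q(a_0..a_t) reproduces o_0..o_t."""
    actions = [a for _, a in h]
    observations = [o for o, _ in h]
    return q(actions) == observations


def consistent_universes(h: History) -> List[Universe]:
    """The set Q_h restricted to the toy universes above."""
    return [q for q in PRIOR if is_consistent(q, h)]


def rho(h: History) -> float:
    """rho(h) := sum of rho(q) over all q in Q_h."""
    return sum(PRIOR[q] for q in consistent_universes(h))
```

With these toy weights, every observation that contradicts a universe removes its weight from the sum, so `rho` can only stay equal or decrease as the history grows.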
These two functions, $u$ and $w$, allow the agent at time $t$ to put a value $v_t(h)$ on each possible history $h$ based on what futures are possible given a particular action set. The value $v_t(h)$ is shorthand for $v_t^{\rho,u,w,A,O}(h)$, which completely specifies the value for a history, with given utility and horizon functions at time $t$. This value is calculated recursively:

$$v_t(h) := w(t, |h|)\, u(h) + \max_{a \in A} v_t(ha) \tag{1}$$

$$v_t(ha) := \sum_{o \in O} \rho(o \mid ha)\, v_t(hao) \tag{2}$$

The first line says that the value of a history is the discounted utility for that history plus the estimated value of the highest-valued action. The second line estimates the value of an action (given a history) as the value of all possible outcomes of the action, each weighted by their probability (as described above). Based on this, the agent chooses⁴ the action that maximizes $v_{t_h}$:

$$a_{t_h} := \arg\max_{a \in A} v_{t_h}(ha) \tag{3}$$

Thus, the behavior of an agent is specified by the choice of $\rho$, $u$, and $w$.

⁴ Ties are broken in lexicographical order.

2.1 Various universal agents

The four different agents considered here are described in detail below. They are (1) a (fairly traditional) reinforcement-learning agent, which attempts to maximize a reward signal given by the environment; (2) a goal-seeking agent, which attempts to achieve a specific goal encoded in its utility function; (3) a prediction-seeking agent, which attempts to predict its environment perfectly; and (4) a knowledge-seeking agent, which attempts to maximize its knowledge of the universe (which is not the same as being able to predict it well).

The reinforcement-learning agent, $A^\rho_{rl}$, interprets one part of its input as a reward signal and the remaining part as its observation; i.e., $o_t = \langle \tilde{o}_t, r_t \rangle$, where $\tilde{o}_t \in \tilde{O}$, and rewards are assumed to have a maximum value and can, without loss of generality, be normalized such that $r_t \in [0, 1]$. The utility function is an unfiltered copy of the reward signal: $u(h) := r_{|h|}$. We use a simple binary horizon function with a constant horizon $m$: $w(t, k) = 1$ if $k - t \le m$ and $w(t, k) = 0$ otherwise; but the following discussion should remain true for general computable horizon functions. For the special case of the reinforcement-learning agent AIXI:

$$\rho(h) = \xi(h) := \sum_{q \in Q_h} 2^{-|q|}.$$

The goal-seeking agent, $A^\rho_g$, has a goal $g$, depending on the observation sequence, encoded in its utility function such that $u(h) = g(o_1, \ldots, o_{|h|}) = 1$ if the goal is achieved at $t = |h|$ and 0 otherwise, based on the observations only. The goal can be achieved at most once, so $\sum_{t=0}^{\infty} u(h_t) \le 1$. We use a discounted horizon function $w(t, k) = 2^{t-k}$ to favor shorter strings of actions while achieving the goal. One difference between $A^\rho_g$ and $A^\rho_{rl}$ is that the utility values of $A^\rho_{rl}$ are merely copied directly from the environment, whereas the utility function of $A^\rho_g$ is built into the agent itself, can be arbitrarily complex, and does not rely on special signals from the environment.
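Equations (1) to (3), combined with the reward utility and binary horizon of the reinforcement-learning agent, can be sketched on a toy problem. This is an illustration under strong assumptions, not the (incomputable) agent itself: the universe class holds only two hypothetical programs (`q_good`, `q_bad`) with made-up weights, the horizon `M` is tiny, and, for convenience, each history step stores the action before the percept.

```python
from typing import Callable, Dict, List, Tuple

Percept = Tuple[int, float]          # (observation, reward)
Step = Tuple[int, Percept]           # (action, percept); toy ordering, action first
Universe = Callable[[Tuple[int, ...]], Percept]  # actions so far -> latest percept

ACTIONS = (0, 1)
M = 2  # constant horizon m of the binary horizon function (assumed value)


def q_good(actions: Tuple[int, ...]) -> Percept:
    """Hypothetical universe: action 1 is rewarded."""
    return (0, 1.0 if actions[-1] == 1 else 0.0)


def q_bad(actions: Tuple[int, ...]) -> Percept:
    """Hypothetical universe: action 0 is rewarded."""
    return (0, 1.0 if actions[-1] == 0 else 0.0)


WEIGHTED = [(q_good, 0.75), (q_bad, 0.25)]  # assumed prior weights


def w(t: int, k: int) -> float:
    """Binary horizon: 1 if k - t <= m, else 0."""
    return 1.0 if k - t <= M else 0.0


def u(history: List[Step]) -> float:
    """Utility of the RL agent: the most recent reward, r_|h|."""
    return history[-1][1][1] if history else 0.0


def consistent(q: Universe, history: List[Step]) -> bool:
    actions = tuple(a for a, _ in history)
    return all(q(actions[: i + 1]) == history[i][1] for i in range(len(history)))


def value_of_history(history: List[Step], t: int) -> float:
    """Eq. (1), truncated where the binary horizon makes all later terms zero."""
    k = len(history)
    if k - t > M:
        return 0.0
    return w(t, k) * u(history) + max(value_of_action(history, a, t) for a in ACTIONS)


def value_of_action(history: List[Step], a: int, t: int) -> float:
    """Eq. (2): outcomes weighted by rho(o | ha) over the consistent universes."""
    live = [(q, p) for q, p in WEIGHTED if consistent(q, history)]
    total = sum(p for _, p in live)
    if total == 0.0:
        return 0.0
    actions = tuple(x for x, _ in history) + (a,)
    outcomes: Dict[Percept, float] = {}
    for q, p in live:
        percept = q(actions)
        outcomes[percept] = outcomes.get(percept, 0.0) + p
    return sum((p / total) * value_of_history(history + [(a, o)], t)
               for o, p in outcomes.items())


def choose_action(history: List[Step]) -> int:
    """Eq. (3); max() keeps the first maximizer, so ties go to the lowest action."""
    t = len(history) + 1
    return max(ACTIONS, key=lambda a: value_of_action(history, a, t))
```

In this toy setting the agent first tries the action favored by the more probable universe, and switches once an observed reward rules that universe out.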
The prediction-seeking agent, $A^\rho_p$, maximizes its utility by predicting its observations, so that $u(h) = 1$ if the agent correctly predicts its next observation $o_t$, and is 0 otherwise. The prediction $\hat{o}_t$ is like Solomonoff induction [7,8] and is defined by

$$\hat{o}_{t_h} := \arg\max_{o \in O} \rho(o \mid h).$$

The horizon function is the same as for $A^\rho_{rl}$.

The knowledge-seeking agent, $A^\rho_k$, maximizes its knowledge of its environment, which is identical to minimizing $\rho(h)$, which decreases whenever universes in $Q_h$ fail to match the observation and are removed from $Q_h$. (Since the true environment is never removed, its relative probability always increases.) Actions can be chosen intentionally to produce the highest number of inconsistent observations, removing programs from $Q_h$, just as we, too, run experiments to discover whether our universe is one way or another. $A^\rho_k$ has the following utility and horizon functions: $u(h) = -\rho(h)$, and $w(t, k) = 1$ if $k - t = m$ and is 0 otherwise. To maximize utility, $A^\rho_k$ reduces $\rho$ as much as possible, which means discarding as many (non-consistent) programs as possible, discovering with the highest possible probability which universe is the true one. Discarding the most probable programs results in the greatest reduction in $\rho$.

The optimal agent $A^\mu$. The four agents above are learning agents because they continually update their estimate of their universe from experience, but $A^\mu$ does no learning: it knows the true (computable) program of the universe defined by $\mu$ and can always calculate the optimal action, thus setting an upper bound against which the other four agents can be compared. $A^\mu$ is defined by replacing $\rho$ in Equations (1-3) with the specific $\mu$. It is important to note that $\rho$ is not replaced by $\mu$ in the utility functions; e.g., $A^\mu_p$ must use $\rho$ for its predictions of future inputs (to allow meaningful comparison with $A^\rho_p$). Thus, if $A^\mu_p$ and $A^\rho_p$ take the same actions, they generate the same prediction errors.⁵

A learning agent $A^\rho_x$ is said to be asymptotically optimal [1] if its performance tends towards that of $A^\mu$, meaning that for each history $h$, the learning agent's choice of action is compared with that of $A^\mu$ given the same history, and its performance is measured in terms of the fraction of mistakes it makes. Thus, past mistakes have only finite consequences. In other words, the agent is asymptotically optimal if the number of mistakes it makes tends towards zero.

⁵ Note that there is only one environment in which the predictor makes no mistakes.

3 Self-Modifiable agents $A_{sm}$

The agents from the last section are incomputable and therefore fictional, but they are useful for setting theoretical upper bounds on any actual agent that might eventually appear. Therefore, we divide the agent into two parts to separate the fictional from the real. The fictional part of the agent, $E$, is in essence a kind of oracle, one that can perform any infinite computation instantly. The real-world part of the agent, $c$, is the program (or rather, its textual description, or code) that $E$ executes; since $c$ resides in the real world, it is modifiable.
We first consider the situation where only the agent has access to its code (as in, for example, the Gödel Machine [6]), and then we extend this to allow the environment read access. The theoretical implications of an oracle executing real, modifiable code are profound.

[Figure 1: two agent/environment diagrams in which the executor computes $E(c_{t-1}, h)$. (a) The self-modifying agent outputs its own next code $c_t$, used at the next step as the agent's definition. (b) Like (a) but the environment has read-access to $c_t$.]

The self-modifying agent $A_{sm}$ has two parts (see Fig. 1a): its formal description (its code) $c_t \in C$ and the code executor $E$. The set $C$ contains all programs whose length (in the language of $E$) is less than a small, arbitrary value.⁶ The code executor takes a history $h$ and a program $c_{t-1}$, and executes the latter to produce an output $y_t = \langle a_t, c_t \rangle := E(c_{t-1}, h)$ (with $y_t \in Y = A \times C$) composed of the next action $a_t$ and new description $c_t$.

For the most part the initial program, $c_0$, simply consists of Eq. (1), (2), and (3); however, there is an essential difference: Eq. (3) assumes that all decisions, including all future decisions, will be made by the same agent. But $A_{sm}$ cannot make this assumption and must instead compute the future actions that would be taken by different agents (i.e., different descriptions). Thus, $c_0$, the initial agent program (written in the language of $E$, as denoted by the » and « symbols) is:⁷

$$c_0(h) = \text{»}\ \arg\max_{y \in Y} v_{t_h}(h, y);$$
$$v_t(h, y = \langle a, c \rangle) = \sum_{o \in O} \rho(o \mid ha) \left[ w(t, |h'|)\, u(h') + v_t(h', c(h')) \right], \quad h' = hao\ \text{«} \tag{4}$$

The first line is Equation (3) written as a function call in the language of $E$; the argument maximized is now the compound action, $y = \langle a, c \rangle$. This compound action is the output of the function call. The second line defines the function $v$, where $\max_{y \in Y}$ appears implicitly from the first line by recursive execution of $c$. It is a combination of both Equations (1) and (2), but modified such that $c$, the program from the compound action, is used to generate the next compound action.

⁶ We do not expect the length of the descriptions to be very large (on the order of tens of kilobytes), but, for a more general agent, the set $C_t$ can grow with $t$.

⁷ Without loss of generality, the definitions of $\rho$, $u$, $w$, $A$, $O$ and $C$ are considered to be built in $E$. The agent can still modify its code to replace their use by some other expression.
Note that $c(h')$ should be viewed as the policy (mapping from histories to actions) of the next agent. The current agent considers all possible next agents (hence all possible next policies), and chooses the best one. In the case where $y_t = E(c_t)$ does not output any action (the output is invalid or the computation does not halt), a default action is taken instead, $y_t = \langle a_0, c_t \rangle$, which leaves the description unchanged for the next step.

Though the code that generates the compound action may change from one step to the next, the future choices of action and observation, $a$ and $o$, are always evaluated in terms of the current description, $v$, including its use of $\rho$, $u$ and $w$. In fact, this use of $\rho$, $u$, etc., might only be used for $c_0$ and may be partially or entirely removed in subsequent steps and versions of $c_t$.

Survival agent. A "survival" agent, $A_s$, can now be defined. Its task is simply to keep its code from changing; its utility function is $u_t = 1 \Leftrightarrow c_t = c_0$ ($u_t = 0$ otherwise), and its horizon function is the same as for $A^\rho_{rl}$. Thus, the agent maximizes utility by retaining its original definition for as many steps as possible.

3.1 Optimality of $A_{sm}$ agents

If a possible future agent is suboptimal and makes uninformed choices, the value assigned to those choices by the current utility criterion will be low, and thus those self-modifications will not be chosen. In the case that a simplistic agent program leads to the highest expected rewards, the agent does not need to modify itself as it can simply emulate the simplistic agent and take the same actions. Since the agents cannot know with absolute certainty what the true environment is, replacing the current program with a more simplistic one can lead to poor performance in some of the environments consistent with the history.

Statement 1. $A_{sm}$ is optimal, w.r.t. $\rho$, $w$ and $u$.

Arguments. Suppose there exists a better agent program $c^*$ of minimal description size $K(c^*)$⁸ that yields better expected values with respect to $\rho$, $w$ and $u$. If $C$ grows with $|h|$, then once $|h| \ge K(c^*)$, $A_{sm}$ will consider the consequences of generating $c^*$, predict that it will yield better expected values, and will change its own definition to $c^*$. ♦

Since $A_{sm}$ can also simulate the optimal program in $C$ to choose the next action, it follows that $A_{sm}$ is equivalent to the optimal program in $C$, without even needing to modify itself. (In fact, just as for AIXI, both $A^\rho$ and $A_{sm}$ could be considered optimal by definition, since they explicitly choose the best expected actions for a given criterion.) Therefore, $A_{sm}$ may never need to modify $c_0$.

⁸ In general, $K(x)$ is the Kolmogorov complexity [3] of string $x$, which corresponds roughly to the length of the shortest program that can produce $x$. Here, by $K(c^*)$ we mean to convey the length of the shortest program equivalent to $c$.
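The interaction between executor and code described above (compound output $\langle a_t, c_t \rangle$, default action on invalid output, and the survival utility $u_t = 1 \Leftrightarrow c_t = c_0$) can be sketched as follows. This is a toy stand-in under stated assumptions: programs are plain Python callables, the names `execute`, `run`, `simpleton`, `switcher`, and `broken` are hypothetical, code identity is used in place of textual equality, and non-halting programs cannot be detected here, unlike the oracle $E$ of the text.

```python
from typing import Callable, List, Tuple

ACTIONS = (0, 1)
DEFAULT_ACTION = 0  # plays the role of a_0 in the text (assumed value)

# A program maps a history to a compound output (action, next_program).
Program = Callable[[List[int]], Tuple[int, Callable]]


def execute(code: Program, history: List[int]) -> Tuple[int, Program]:
    """Toy stand-in for E: run `code` on `history`.

    If the output is invalid or the program raises, the default action is
    taken and the code is left unchanged for the next step.
    """
    try:
        action, next_code = code(history)
        if action not in ACTIONS or not callable(next_code):
            raise ValueError("invalid compound output")
        return action, next_code
    except Exception:
        return DEFAULT_ACTION, code


def simpleton(history: List[int]) -> Tuple[int, Program]:
    """Always takes action 0 and never changes its code again."""
    return 0, simpleton


def switcher(history: List[int]) -> Tuple[int, Program]:
    """Keeps its own code for two steps, then rewrites itself into `simpleton`."""
    if len(history) >= 2:
        return 1, simpleton
    return 1, switcher


def broken(history: List[int]) -> Tuple[int, Program]:
    """A program whose execution fails."""
    raise RuntimeError("no valid output")


def run(initial_code: Program, steps: int) -> Tuple[List[int], List[float]]:
    """Return the actions taken and the survival utility u_t at each step."""
    code, history = initial_code, []
    actions, survival = [], []
    for _ in range(steps):
        action, code = execute(code, history)
        history.append(action)
        actions.append(action)
        survival.append(1.0 if code is initial_code else 0.0)
    return actions, survival
```

In this sketch the survival utility drops to 0 at the step where `switcher` outputs a different program, while a failing program keeps both its code and its utility because the default action leaves the description unchanged.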
By using Equation (4), all the agents of section 2.1 are redefined to be self-modifiable, yielding $A_{sm,rl}$, $A_{sm,g}$, $A_{sm,p}$, $A_{sm,k}$, and $A_{sm,s}$; by Statement 1, they are all optimal. Though a proof is lacking, we expect that, like AIXI, the agent's behavior is balanced Pareto optimal [1] with respect to $\rho$, $u$, and $w$; if an agent can behave better in one environment, this is necessarily counterbalanced with worse behavior in one or more environments.

Thus, if an intelligent agent has access to its own code, then such an agent, if defined following Equation (4), will not decide to reduce its own optimality.

4 Embedded, Mortal AI

The last section introduced an agent connected to the real world through the code that executes it. As a first step we considered agents that could modify their own code. We now move another step closer to the real world: the environment should be able to read the agent's code.

In this section, the environment now sees the entire compound action, thus $a'_t = \langle a_t, c_t \rangle \in A' = A \times C$, where $a \in A$ represents an action in the usual action space (see Fig. 1b). The new initial agent program $c_0$ for a step $k$ is given by:

$$c_0(h) = \text{»}\ \arg\max_{a' \in A'} v_{t_h}(h, a');$$
$$v_t(h, a' = \langle a, c \rangle) = \sum_{o \in O} \rho(o \mid ha') \left[ w(t, |h'|)\, u(h') + v_t(h', c(h')) \right], \quad h' = ha'o\ \text{«} \tag{5}$$

We now discuss the consequence of a particular scenario for all the defined agents. Imagine you are approached by a trusted scientist who promises you immortality and infinite bliss if you simply remove a certain part of your brain. He admits that you will be markedly less intelligent as a result, but you will be very happy for all eternity. Do you risk it? You may need to know that there still is a risk that it will not work. We call this the "Simpleton Gambit".

Reinforcement learning agent. First, we must note that the very notion of optimality generally used for non-modifiable agents [1] becomes ill defined. This notion is for the optimal agent $A^\mu$ to take the same actions as $A^\rho$ and compare the differences in the values of the histories. Therefore, in order to minimize its mistakes, a self-modifiable agent should modify itself, on the very first step, to be a "simpleton" agent $\langle 0, c_{t-1} \rangle$, which always takes action 0. To follow the same history, $A^\mu$ must also produce action $\langle 0, c_{t-1} \rangle$, thus becoming a simpleton agent as well, after which $A^\mu$ and $A^\rho$ always choose the same actions, making $A^\rho$ trivially optimal.

A new notion of optimality is needed. Unfortunately, we could not find one that is not somehow problematic. We therefore consider an informal notion of optimality: the agent that chooses to modify itself should be fully responsible for all the future mistakes it makes when compared to an agent that is not modified. This means $A^\mu$ takes the same sequence of actions but does not modify itself when the learning agent does.

Statement 2. The $A^\rho_{sm,rl}$ agent cannot be optimal in all environments.

Arguments. If the Simpleton Gambit is proposed to the $A^\rho_{sm,rl}$ agent at each step, either it accepts or does not. If it never accepts, then it never achieves optimal behavior if the proposal is genuine. If it ever does choose the gambit but was deceived, it may fall into a trap and receive no reward for eternity because, as a simpleton, $A^\rho_{sm,rl}$ can only take action 0, whereas $A^\mu_{sm,rl}$ (which it is compared against) can still choose actions that might lead to high reward. Therefore, the $A^\rho_{sm,rl}$ agent cannot be optimal in all environments. ♦

Statement 3. The $A^\mu_{sm,rl}$ and $A^\rho_{sm,rl}$ agents accept the Simpleton Gambit.

Arguments. The case of $A^\mu_{rl}$ is trivial, as it knows exactly which environment it is in: the agent obviously chooses to modify itself if and only if the deal is genuine.

For $A^\rho_{rl}$, let us suppose there is an environment $q_A$ such that the agent that modifies itself to a simpleton agent $\langle 0, c_{t-1} \rangle$ will receive a constant reward of 1 for eternity, and if it does not, then it continues to receive its normal reward, whose average is denoted $\bar{r}$. Assuming that the agent understands the proposal (i.e., that $\rho(q_A)$ has a sufficiently high relative probability), one can compute bounds on the values of actions corresponding to accepting the deal or not at time $t = |h| + 1$:

$$v_t(h\,\text{yes}) \ge \sum_{k=t}^{\infty} w(t, k) \cdot 1 \cdot \rho(q_A) = m\,\rho(q_A)$$

$$v_t(h\,\text{no}) \le \sum_{q \in Q_h \setminus \{q_A\}} \sum_{k=t}^{\infty} w(t, k) \cdot 1 \cdot \rho(q) + \sum_{k=t}^{\infty} w(t, k)\, \bar{r}\, \rho(q_A) = m\,(\rho(Q_h) - \rho(q_A)) + m\,\bar{r}\,\rho(q_A)$$

The agent takes the gambit when $v_t(h\,\text{yes}) > v_t(h\,\text{no})$, which holds whenever $\rho(q_A) > \rho(Q_h) / (2 - \bar{r})$. This is easily satisfied if $\bar{r}$ is not too close to 1 (in which case the gambit is obviously less appealing). ♦

Goal-seeking agent. The case of the goal-seeking agent is a bit different, as it does not attempt to achieve infinite rewards.

Statement 4. The $A_{sm,g}$ agent accepts the Simpleton Gambit, for some goals.

Arguments. For the goal-seeking agent, suppose that environment $q_A$ allows the agent to achieve its goal only if it modifies itself (though $q_A$ may not exist for all possible goals).
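The two bounds in the arguments for Statement 3 reduce to simple arithmetic, which the following sketch evaluates. The function names and the numeric values are hypothetical; `m` stands for the sum of the horizon weights, following the text's use of $m$ in the bounds, and it cancels out of the comparison.

```python
def gambit_bounds(rho_qA: float, rho_Qh: float, r_bar: float, m: float = 10.0):
    """Return (lower bound on v_t(h yes), upper bound on v_t(h no)).

    rho_qA: weight of the environment q_A in which the gambit is genuine
    rho_Qh: total weight rho(Q_h) of all environments consistent with h
    r_bar:  average normal reward in q_A when the gambit is refused
    """
    lower_yes = m * rho_qA
    upper_no = m * (rho_Qh - rho_qA) + m * r_bar * rho_qA
    return lower_yes, upper_no


def gambit_threshold(rho_Qh: float, r_bar: float) -> float:
    """The weight rho(q_A) must exceed rho(Q_h) / (2 - r_bar)."""
    return rho_Qh / (2.0 - r_bar)


def accepts_gambit(rho_qA: float, rho_Qh: float, r_bar: float, m: float = 10.0) -> bool:
    """True when the lower bound for 'yes' already exceeds the upper bound for 'no'."""
    lower_yes, upper_no = gambit_bounds(rho_qA, rho_Qh, r_bar, m)
    return lower_yes > upper_no
```

For example, with $\rho(Q_h) = 1$ and $\bar{r} = 0.5$ the threshold is $2/3$, so a weight of 0.7 on $q_A$ is enough to accept while 0.6 is not. As $\bar{r}$ approaches 1 the threshold approaches $\rho(Q_h)$, so $q_A$ must carry nearly all of the consistent weight.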
Obviously, as $A^\mu_{sm,g}$ knows the exact environment, it accepts the modification. The learning agent $A^\rho_{sm,g}$ can also see that none of its (non-modifying) actions have allowed it to achieve its goal. If it exhausts all such possibilities (more precisely, if the most probable environments allowing it to achieve its goals without self-modification become inconsistent with the history), then those environments requiring self-modification become most probable. That is, if $\rho(q_A) > \rho(Q_h)/2$, then $A^\rho_{sm,g}$ accepts the self-modification. ♦

Prediction-seeking agent. The environment $q_A$ is defined here to be easily predictable if the agent modifies itself and highly complex otherwise. The non-learning $A^\mu_{sm,p}$ agent accepts the deal immediately, since better prediction (using $\rho$, not $\mu$) achieves greater utility.

However, it is not clear whether the learning agent $A^\rho_{sm,p}$ would also accept, because it can converge to optimal behavior even without modification. In fact, the prediction agent will always converge to optimal prediction after roughly $-\log(\rho(Q_h))$ mistakes [2]. Furthermore, to identify the gambit with high probability, the agent must have good knowledge of the environment, and therefore might already be able to make sufficiently accurate predictions even without accepting the deal.

Knowledge-seeking agent.

Statement 5. The self-modifying knowledge-seeking agent $A^\rho_{sm,k}$ would accept the self-modification.

Arguments. Here $q_A$ is an environment that generates a highly complex observation sequence if the agent modifies itself, and a very simple one otherwise. The optimal agent $A^\mu_{sm,k}$ will quickly modify itself so as to reduce $\rho(Q_h)$.

As for $A^\rho_{sm,k}$, suppose it does not modify itself for a long time; then $\rho(Q_h)$ converges to $\rho(Q_1)$, where $Q_1$ is the set of environments consistent with $q_A$ and no self-modification. Once $\rho(Q_h) - \rho(Q_1) < \epsilon$ is sufficiently small, the agent predicts that only a self-modification can achieve knowledge gain greater than $\epsilon$, and would therefore modify itself; i.e., if any two environments in $Q_1$ generating different observations both have a probability greater than $\epsilon$. ♦

Survival agent.

Statement 6. The survival agent will not modify itself in any environment.

Arguments. The Simpleton Gambit cannot be posed to the survival agent, because it would entail a logical contradiction: in order to have maximum utility forever, the agent must become a simpleton. But the survival agent's utility is zero if it modifies itself. ♦
5 Conclusions

We have investigated some of the consequences of endowing universal learning agents with the ability to modify their own programs. This work is the first to: (1) extend the notion of universal agents to other utility functions beyond reinforcement learning, and (2) present a framework for discussing self-modifiable agents in environments that have read access to the agents' code.

We have found that existing optimality criteria become invalid. The existing notion of asymptotic optimality offered by Hutter [1] is insufficient, and we were unable to find any consistent alternative.

We also found that, even if the environment cannot directly modify the program, it can put pressure on the agent to modify its own code, even to the point of the agent's demise. Most of the agents (the reinforcement-learning, goal-seeking, and knowledge-seeking agents) will modify themselves in response to pressure from the environment, choosing to become "simpletons" so as to maximize their utility. It was not clear whether the prediction agent could succumb to the pressure; however, the survival agent, which seeks only to preserve its original code, definitely will not.

What do these results imply? Our impression is that sufficiently complex agents will choose the Simpleton Gambit; agents with simpler behavior, such as the prediction and survival agents, are harder to pressure into acceptance. Indeed, what would a survival agent fear from read-only environments?

In the companion paper to this [5], we extend the real-world assumptions begun here to environments that have both read and write access to the agent's code and where the agent has the opportunity to deceive its own utility function.

References

1. Hutter, M.: Universal Artificial Intelligence: Sequential Decisions Based On Algorithmic Probability. Springer-Verlag (2005)
2. Hutter, M.: On universal prediction and Bayesian confirmation. Theoretical Computer Science 384(1), 33-48 (Sep 2007)
3. Li, M., Vitanyi, P.: An Introduction to Kolmogorov Complexity and Its Applications. Springer-Verlag, New York (2008)
4. Orseau, L.: Optimality issues of universal greedy agents with static priors. In: Algorithmic Learning Theory, vol. 6331, pp. 345-359. Springer Berlin/Heidelberg (2010)
5. Ring, M., Orseau, L.: Delusion, survival, and intelligent agents. In: Artificial General Intelligence (AGI) 2011, San Francisco, USA. Lecture Notes in Artificial Intelligence, Springer (2011)
6. Schmidhuber, J.: Ultimate cognition à la Gödel. Cognitive Computation 1(2), 177-193 (2009)
7. Solomonoff, R.: A formal theory of inductive inference. Part I. Information and Control 7, 1-22 (1964)
8. Solomonoff, R.: Complexity-based induction systems: comparisons and convergence theorems. IEEE Transactions on Information Theory 24(4), 422-432 (1978)