BenchmarkEstimationforMarkovChainMonteCarloSamplesSubharupGuha,StevenN.MacEachernandMarioPeruggiaguha.3@osu.eduMay2002;revisedNovember2002AbstractWhilestudyingvariousfeaturesoftheposteriordistributionofavector-valuedparameterusinganMCMCsample,asubsampleisoftenallthatisavailableforanalysis.Thegoalofbenchmarkestimationistousethebestavailableinformation,i.e.thefullMCMCsample,toimprovefutureestimatesmadeonthebasisofthesubsample.Wediscussasimpleapproachtodothisandprovideatheoreticalbasisforthemethod.Themethodologyandbenefitsofbenchmarkestimationareillustratedusingawell-knownexamplefromtheliterature.Weobtainasmuchasan80%reductioninMSEwiththetechniquebasedona1-in-10subsampleandshowthatgreaterbenefitsaccruewiththethinnersubsamplesthatareoftenusedinpractice.1IntroductionWhileusinganMCMCsampletoinvestigatetheposteriordistributionofavector-valuedparameterθ,manyfeaturesofinteresthavetherepresentationE[g(θ)]forsomefunctiong(θ).AsubsampleoftheMCMCoutputisoftenallthatisretainedforfurtherinvestigationoftheposteriordistribution.Subsamplingisoftennecessaryincomputationallyintensiveorreal-time,interactiveinvestigationswherespeedisessential.Examplesincludeexpensiveplotprocessingandexaminationofchangesintheprior(sensitivityanalysis),likelihood(robustness)ordata(caseinfluence).Typically,suchstudieswouldincludehundredsorthousandsofchangestothemodel.Anotherreasonforsubsamplingisexpensive1
storagespace.Practicalconstraints,likelimiteddiskspaceavailabletousersofsharedcomputingresources,oftenmakeitinfeasibletostoretheentiresampleofMCMCdrawswhentheparameterhasalargenumberofcomponents.Asubsampleisthenretainedforfutureinvestigationoftheposteriordistribution.Geyer(1992)andMacEachernandBerliner(1994)showthatsubsamplingtheoutputoftheMarkovchaininasystematicfashioncanonlyleadtopoorerestimationofE[g(θ)].ThegoalofbenchmarkestimationistoproduceanumberofestimatesbasedontheentireMCMCsample,andtothenusethesetoimproveotherestimatesmadeonthebasisofthesubsample.Thebenchmarkestimatesmustbequickandeasytocompute.Theymustalsobecompatiblewithquickcomputationsforfurther,more(computationally)expensiveanalysesbasedontheeventualsubsample.Severalmotivatingperspectivesareusefultounderstandandinvestigatevariousaspectsofbench-markestimation.Thepointofviewofcalibrationestimation,developedinthesamplingliteraturetoimprovesurveyestimates(DevilleandSa¨rndal,1992;Vanderhoeft,2001),helpstobringalltheseperspectivestogetherintoaunifiedframework.Incalibrationestimation,aprobabilitysamplefromafinitepopulationisusedtocomputeestimatesofpopulationquantitiesofinterest.The(regressiontype)estimatorsarebuiltasweightedaveragesoftheobservationsinthesample,withtheweightsde-terminedsoastosatisfya(vector-valued)calibrationequationwhichforcestheresultingestimatorstoproduceexactestimatesofknownpopulationfeatures.Usually,theconstraintsimposedbythecalibra-tionequationdonotdetermineauniquesetofweights.Thus,amongthesetsofweightssatisfyingthecalibrationequation,onechoosesthesetthatyieldsweightsthatareascloseaspossible(withrespecttosomedistancemetric)toafixedsetofprespecified(typicallyuniform)weights.TocastMCMCbenchmarkestimationintothetheframeworkofcalibrationestimation,weregardtheMCMCoutputasafinitepopulationanda1-in-ksystematicsubsampleasaprobabilitysampledrawnfromthefinitepopulation.Thissystematicsamplingdesigngiveseachunitinthepopulationaprobability1/kofbeingselected,thoughmanyjointinclusionprobabilitiesare0.Inthissetting,the2
(vector-valued)benchmarkE[h(θ)],forwhichthesubsampleestimateisforcedtomatchthefullsampleestimate,correspondstotheauxiliaryinformationavailablethroughthecalibrationequation.Oncethecalibrationweightshavebeencalculated,theycanthenbeusedtocomputethecalibrationsubsampleestimateofanyfeatureE[g(θ)].AsthefullMCMCsamplesizeincreases,theasymptoticperformanceofthesebenchmarkestimatorsmatchesthatofthecorrespondingcalibrationestimators.Thebenchmarkestimatorsthatweintroduceinthispapercanbeshowntobecalibrationestimatorscorrespondingtoappropriatelychosencalibrationequationsandmetrics.Weinvestigatetwomethodsofcreatingweights:post-stratificationandmaximumentropy.Intheirsimplestform,post-stratificationweightsarederivedbypartitioningtheparameterspaceintoafinitenumberofregionsandbyforcingtheweightedsubsamplefrequenciesofeachregiontomatchthecorrespondingrawfrequenciesfortheentireMCMCsample.Theweightsaretakentobeconstantoverthevariouselementsofthepartitionandtosumtoone.Animprovedversionofpost-stratification,(and,infact,theapproachthatinourexperiencehasgeneratedthemostsuccessfulestimators)beginswitharepresentationofanarbitraryfunctiong(θ)as∞acountablelinearcombinationofbasisfunctionshj(θ):g(θ)=j=1cjhj(θ).Theestimand,E[g(θ)],isPexpressedasthesamelinearcombinationofintegralsofthebasisfunctions,j∞=1cjE[hj(θ)].SplittingPtheinfiniteseriesrepresentationofg(θ)intotwoparts,wehaveafiniteserieswhichmayprovideagoodapproximationtog(θ)andaninfiniteremainderseriesthatfillsouttherepresentationofg(θ).Focusingonthefiniteseries,wedeterminetheweightsbyforcingestimatesofE[hj(θ)]basedonthesubsampletomatchthosebasedonthefullsample.Inaddition,werequiretheweightstobeconstantovertheelementsofasuitablychosenpartitionoftheparameterspaceandtosumtoone.ThisproducesabetterestimateofE[jm=1cjhj(θ)]thanonebasedonthesubsamplealone.TheimprovementcarriesPovertoestimationofE[g(θ)]whenthetailoftheseriesisofminorimportance.Werefertothefinitesetofbasisfunctionsasthe(vector-valued)benchmarkfunction.Inboththebasicandimprovedpost-stratificationapproaches,wespecifyenoughconditionsthat3
(forvirtuallyallMCMCsamplesofreasonablesize)thereisauniquesetofweightsthatwouldsatisfythem.Thus,fromthepointofviewofcalibrationestimation,thechoiceofthedistancemetricbecomesimmaterial,inthesensethatanymetricwouldyieldidenticalweights.Inthisrespect,ourpost-stratificationweightsarisefromadegenerateinstanceofaproblemofcalibrationestimation.Inthecaseofthemaximumentropyweights,however,wedonotspecifyenoughbenchmarkconditionstomaketheweightsunique.Rather,amongthesetsofweightssatisfyinganunder-determinednumberofbenchmarkconditions,weselectthesethavingmaximumentropyandthis,fromthepointofviewofcalibrationestimation,istantamounttochoosingaspecificdistancemetric.Benchmarkestimationyieldsimprovedweightedestimators(havingsmallervariances)basedonMCMCsubsamples.Weinvestigatetheperformanceofestimatorsbasedonweightsdeterminedaccord-ingtopost-stratificationandmaximumentropymethods.Substantialreductionsinthevariabilityoftheestimatesoccurwhenexaminingexpectationsoffunctionsg(θ)thataresimilartoalinearcombi-nationofbenchmarkcomponents.Theoreticalresultssuggestthegainsthatweseeinpractice.ThemethodologyisillustratedonanexamplediscussedbyGeorge,MakovandSmith(1993).WecompareestimationofE[g(θ)]foravarietyoffunctionsg(θ),showingthatthereareoftensubstantialbenefitstotheuseofbenchmarkestimates.TheextentoftheimprovementinestimationofE[g(θ)]forfunctionsthatarenoticeablydifferentfromthebenchmarksisstriking,evenforvaluesofmassmallas3or4.2Asimpleapproachtobenchmarkestimation2.1AnimprovedsubsampleestimatorLetθ∈Θbeavector-valuedparameter.ImaginethatanMCMCsampleisdrawnfromtheposteriordistributionofθ.Callthesequenceofdrawsθ(1),θ(2),...,θ(N).Thedrawsareusedtoestimatesomefeatureoftheposteriordistribution.OftenthesefeaturesofinterestcanberepresentedasE[g(θ)]for4
g(θ(ki)),2()some(possiblyvector-valued)functiong(θ).ThemoststraightforwardestimatorforE[g(θ)]isNEˆ[g(θ)]f=1g(θ(i)),(1)NX1=iwheretherighthandsubscriptdenotesthefullsampleestimator.Ifoneselectsasystematic1-in-ksubsampleofthedata,thenaturalestimatorisn1Eˆ[g(θ)]s=nX1=iwhereN=knandsdenotesthesubsampleestimator.Asmentionedearlier,thisformofsubsamplingalwaysleadstopoorerestimation;theunweightedsubsampleestimator(2)hasalargervariancethanthefullsampleestimator(1).WewishtousetheinformationavailablefromthefullsampletoimprovefutureestimationbasedonthesubsampleforanyfeatureE[g(θ)].Foranappropriatelychosen(andpossiblyvector-valued)functionh(θ),werefertothefeatureE[h(θ)]asthebenchmark.WenowcreateaweightedversionofthesubsampleestimatorofE[g(θ)]asfollows:nEˆ[g(θ)]w=wig(θ(ki)),(3)X1=iwherein=1wi=1.TheweightswiarechosensothattheyforcetheweightedsubsamplebenchmarkPestimatetoequalthefullsampleestimate:Eˆ[h(θ)]w=Eˆ[h(θ)]f.)4(ThusEˆ[h(θ)]wandEˆ[h(θ)]fhavethesamedistributionsprovidedtheweightscanbeconstructed,andallfeaturesoftheirdistributionsconditionalonthiseventarethesame.Foravector-valuedbenchmarkfunction,anylinearcombinationofitscoordinatesresultsinthesameestimateforboththesubsampleandthefullsample,andtheestimatorshavethesamedistribution.Inparticular,thetwoestimatorshavethesamevariance,andwehavepossiblygreatlyincreasedprecisionforoursubsampleestimatorofE[g(θ)].5
Theconnectionbetweenaconditionallyconjugatestructureandlinearposteriorexpectationinex-ponentialfamiliesimpliesthat,formanypopularmodels,quantitiessuchastheconditionalposteriormeanforacaseortheconditionalposteriorpredictivemeanwillbealinearfunctionofhyperparam-eters.Thestructureofthehierarchicalmodelenablesustousebenchmarkfunctionsbasedonthehyperparameterstocreatemoreaccurateestimatesofthesequantities.ThereductioninvariabilitywhenmovingfromEˆ[h(θ)]stoEˆ[h(θ)]walsoappearswhenexaminingexpectationsoffunctionsg(θ)thataresimilartoh(θ).Functionssuchasapredictivevariancewhichdependonfirstandsecondmomentswilltypicallybecloselyrelatedtobenchmarkfunctionsbasedonthehyperparametersandsotheywillbemoreaccuratelyestimatedwithourtechnique.Theweightedsubsample,(wi,θ(ki))fori=1,2,...,n,isnowusedinplaceoftheunweightedsubsample.Theweightsactexactlyastheywouldifarisingfromanimportancesample,andsoweobtainweightedsubsampleestimatesEˆ[g(θ)]wforvariousfeaturesofinterestE[g(θ)]oftheposterior.Techniquesandsoftwaredevelopedforimportancesamplescanbeusedwithoutmodificationfortheseweightedsamples.2.2SomemethodsofobtainingweightsnTheconstraintsthati=1wi=1andthatEˆ[h(θ)]w=Eˆ[h(θ)]fwillnottypicallydeterminethewi.PWithasinglerealbenchmarkfunction,wewouldhaveonlytwolinearconstraintsonthewi.Wesupplementtheconstraintswithaprinciplethatwillyieldauniquesetofweights.Thetwoprinciplesweinvestigatearemotivatedbytheliteraturesonsurveysamplingandinformationtheory.Weightsbypost-stratification.Post-stratificationisastandardtechniqueinsurveysampling,designedtoensurethatasamplematchescertaincharacteristicsofapopulation.Thepopulationcharacteristicsarematchedbycomputingaweightforeachunitinthesample.Largesampleresultsshowthatapost-stratifiedsamplebehavesalmostexactlylikeaproportionallyallocatedstratifiedsample.Thistypeofstratificationreducesthevarianceofestimatesascomparedtoasimplerandom6
sample.Inthissetting,thefullsampleplaystheroleofthepopulationwhilethesubsampleplaystheroleofthesample.Thus,theessenceofthetechniqueistopartitiontheparameterspaceinto(say)mstrata,andtoassignthesameweighttoeachθ(ki)inastratum.Formally,supposethat{Θj}jm=1isafinitepartitionoftheparameterspaceΘ.LetIj(θ)denotetheindicatorofsetΘj,forj=1,...,m.Thenaturalapplicationofthepost-stratificationmethodtakesasthebenchmarkfunctionthevectorofthesemindicatorfunctions.Thatis,h(θ)=(I1(θ),I2(θ),...,Im(θ))0.Weassignthesameweighttoallsubsamplepointsbelongingtoagivenstratum.Specifically,forallisuchthatθ(ki)∈Θj,wesetwi=vj,where,accordingto(4),thevaluesvjaredeterminedbyNn1vjIj(θ(ki))=Ij(θ(i)),wherej=1,...,m.NXXi=1i=1Thepost-stratificationweightsarethenobtainedas:N−1iN=1Ij(θ(i))Pj1=ivj=PnI(θ(ki)),(5)providedeachofthestratacontainsatleastonesubsamplepoint.Asinsurveysampling,withfairlywellchosenstrata,thechancethatanyofthestrataareemptyofsubsamplepointsisnegligible.Theintuitivedescriptionofthepost-stratificationweightvjisastheratiooftheproportionoffullsamplepointsinΘjtothenumberofsubsamplepoints.Werefertothissubsampleestimatorasthebasicpost-stratificationestimator,Eˆ[g(θ)]w,ps.Theperspectiveofabasisexpansionofg(θ)providesamoresophisticateduseofpost-stratification.Insteadofusingabasisformedfromindicatorfunctions(essentiallyaHaarbasis),alternativebasesconsistoffunctionsotherthanindicators.Anattractivebasis,duetoitssuccessthroughoutstatistics,isthepolynomialbasisthatgeneratesTaylorseries.Assigningequalweighttosubsamplepointswithineachgivenpost-stratumyieldsn−mlinearconstraintsontheweights,andforcingtheweightstosumto1providesoneadditionalconstraint.Supplementingthesewithafurtherm−1linearconstraints7
ontheweights(andalsowithmildconditionsontheposteriordistributionandsimulationmethodtoguaranteeuniqueness)definestheweights.Theweightsarequicklyobtainedasasolutiontoasystemofmlinearequations.Thisversionofpost-stratificationhasproventobeextremelyeffectiveinpractice.Asanexampleofthistechnique,supposethatθisap-dimensionalvectorvaluedparameterandthattheparameterspaceΘispartitionedintom=3strata,namelyΘ1,Θ2andΘ3.Post-stratificationassignsthesameweightvjtoallsubsamplepointsbelongingtostratumΘj,wherej=1,2,3.Conditionalontheeventthatnostratumisemptyofsubsamplepoints,thiscorrespondston−3linearlyindependentconstraintsontheweights.Anadditionalconstraintontheweightsisthattheysumto1.Thustwootherindependentlinearconstraintswillensurethattheweightsareunique.Forachoiceoftworealfunctions,sayh1(θ)andh2(θ),ascomponentsofthebenchmarkfunction,thebenchmarkestimatesaregivenbyEˆ[h1(θ)]f=(1/N)iN=1h1(θ(ki)),andEˆ[h2(θ)]f=(1/N)iN=1h2(θ(ki)).ForPPj=1,2,3andl=1,2,letnj=in=1Ij(θ(ki))denotethenumberofsubsamplepointsfallinginstratumPΘj,andletbljdenotethesumofthefunctionhl(θ)evaluatedatthesubsamplepointsbelongingtostratumΘj.Thusnblj=hl(θ(ki))Ij(θ(ki)).X1=iThen,solvingthefollowingsystemoflinearequationsuniquelydeterminestheweights(v1,v2,v3)forthethreestrata,providedthesquarematrixbelowisinvertible:n1n2n3v11b11b12b13v2=Eˆ[h1(θ)]f.b21b22b23v3Eˆ[h2(θ)]fMaximumentropyweights.Informationtheorydescribes,invariousfashions,theamountofinformationindataaboutaparameterordistribution.InaBayesiancontext,itisoftenusedtodescribesubjectiveinformation(playingtheroleofdata)inordertoelicitapriordistribution.Thisisaccomplishedbyspecifyinganumberoffeaturesofthedistribution,typicallyexpectations,asthe“information”abouttheprior.Theprioristhenchosentoreflectthisinformationbutnomore.With8
entropydefinedasthenegativeofinformation,thepriorwhichreflectsexactlythedesiredinformationisthatwhichmaximizesentropyamongthosepriorsmatchingtheconstraints.Inoursetting,weborrowthistechnique,matchingexactlytheinformationinthefullsamplebench-markestimates,butnomore.Letw=(w1,w2,...,wn)bethen-tupleofweightsgivenin(3).LetusdenotebyΩthe(possiblyempty)setofallweightn-tuplesthatsatisfy(4).ThusΩistheset{w|wi≥0∀i,in=1wi=1,in=1wih(θ(ki))=Eˆ[h(θ)]f}.PPDefinition2.1Theentropyofann-tuplewbelongingtosetΩisdefinedasnEn(w)=−wilnwi,X1=isubjecttotheconventionthat0ln(0)equals0.WeobservethatforallwbelongingtoΩ,En(w)≤Enn,n,...,n=ln(n).SinceΩisclosed,¡¡111¢¢thereexistsanelementw∗ofΩsuchthatEn(w∗)=supw∈ΩEn(w).Theseweightsw∗arecalledmaximumentropyweights,andtheyexistwheneverΩisnon-empty.Findingmaximumentropyweightsw∗isthusequivalenttomaximizingEn(w)subjecttothecon-straintswi≥0fori=1,...,n,in=1wi=1,andin=1wih(θ(ki))=Eˆ[h(θ)]f.ForarealbenchmarkPPfunctionh(θ),itcanbeshownthatthemaximumentropyweightsw∗areuniquewhenevertheyexist.Formostsubsamplesofreasonablesizetheyaregivenbywi∗=eλ1+λ2h(θ(ki)),i=1,2,...n;)6(whereλ2∈Rsatisfiestheequationh(θ(ki))−Eˆ[h(θ)]fexpλ2h(θ(ki))−Eˆ[h(θ)]f=0,(7)Xn³´³³´´1=i´³andλ1=−lnPin=1eλ2h(θ(ki)).Thefewsubsampleswheretheweightsfailtoexistwillhaveeitherh(θ(ki))<Eˆ[h(θ)]fforalli,orh(θ(ki))>Eˆ[h(θ)]fforalli.Forallothersubsamples,equation(7)hasauniquerootbecausethelefthandsideoftheequationincreasesmonotonicallyfrom−∞to+∞as9
λ2increases.Therootcanbeobtainedbynumericalmethods,andhencewi∗canbecalculatedveryquicklyusing(6).Whenh(θ)isanindicatorfunctionofasubsetoftheparameterspaceΘ,theanswerobtainedusing(6)matchesthepost-stratificationweightsgivenin(5).Thus,thetwoapproacheswillsometimesyieldthesameresult.3TheoreticalResultsThemotivationforsubsamplingtheoutputofaMarkovchaincarriesovertothestudyofasymptoticpropertiesoftheestimators.Afirstasymptoticismotivatedbythecommonpracticeofusingprelim-inaryrunsoftheMarkovchaintoassessthedependenceandconvergencepropertiesofthechain,andthenselectingasubsamplingratefortherunofthechainusedforestimation.Thispracticeleadsustoconsidertheasymptoticwherekisheldfixedandngrows(CaseA).Thestrongestmotivationforsub-samplingisthatfurtheruseofthesubsampleinvolvesexpensive(slow)processing.Whenthisfurtherprocessingisdoneinrealtime,aswithinvestigationofchangesinthepriordistributionorlikelihood,itisessentialtolimitthenumberofpointsthatarerepeatedlyprocessed.Pursuingthismotivationforsubsampling,anaturalasymptoticholdsthenumberofsubsampledpoints,n,fixedwhilelettingtheintervalbetweensubsampledpoints,k,tendtoinfinity(CaseB).Inthelimit,thesesubsampledpointswilllooklikearandomsamplefromπ.Wenotethatasymptoticsbetweenthesetwo,wherebothnandkgrow(CaseC),arealsonaturalcandidatesfortheoreticalexploration,andareusefulforfillingouttherangeofasymptoticexpressionsusefulforassessingtheaccuracyofthebenchmarkestimators.Tierney(1994)containsacollectionofusefulresultsfortheasymptoticsofestimatorsbasedonoutputfromaMarkovchain.Thetwoessentialtypesofresultareergodictheoremswhichguaranteestrongconvergenceofanempiricalaverage(orfull-sampleestimator)toacorrespondingexpectationunderthelimitingdistribution,andcentrallimittheoremswhichdescribeweakconvergenceofanappropriatelycenteredandscaledfull-sampleestimatortoanormaldistribution.Werelyheavilyon01
hisresultstoshowasymptoticnormalityofoursubsampledestimators.CaseA.Wefirstconsidertheasymptoticwherekisheldfixedandntendsto∞.TheergodictheoreminTierney’spaperappliedtothesub-sampledchain(whichisitselfMarkovian)allowsustoconcludethat,asntendsto∞,Eˆ[g(θ)]stendstoE[g(θ)]almostsurely.Thenexttheoremestablishestheasymptoticnormalityofthebasicpost-stratificationestimatorforexpectationsofboundedfunctionsg(θ)andgeometricallyergodicMarkovchains.Theorem3.1SupposethattheMarkovchainisgeometricallyergodicwithinvariantdistributionπ.Ifthefunctiong(θ)isboundedandnotalinearcombinationofthestrataindicatorsIj(θ),j=1,...,m,thenthereexistsarealnumberσ(g)suchthat,asn→∞,thedistributionof√nEˆ[g(θ)]w,ps−E[g(θ)]´³convergesweaklytoanormaldistributionwithmean0andvarianceσ2(g).Proof.TheresultfollowsfromaverificationoftheconditionsofTheorem4ofTierney’spaperandbyanapplicationofthedeltamethod.SeeAppendixAforamoredetailedproof.UnderthestrongerassumptionofuniformergodicityoftheMarkovchain,anidenticalresultholdsforallfunctionsg(θ)withfiniteposteriorvariance.AproofofthisresultreliesonTheorem5ofTierney’spaper,butisotherwisethesameastheoneabove.Theresultsextendbeyondthebasicpost-stratificationestimator,applyingalsotothemorecom-plexpost-stratificationestimators.Theproofofthefollowingcorollaryformodifiedpost-stratificationestimatorsisalmostidenticaltothatoftheprevioustheorem.Itdiffersonlyinminordetailsspecifictomodifiedpost-stratificationestimators.Non-singularityofthematrixBdefinedbelowisrequiredforlocalcontinuityoftheestimator.Corollary3.2LetEˆ[g(θ)]∗w,psdenotethemodifiedversionofthepost-stratificationestimatorbasedon11