36 Pages
English

# Submitted to the Annals of Statistics arXiv: math PR

Description

Niveau: Supérieur, Doctorat, Bac+8
Submitted to the Annals of Statistics arXiv: math.PR/0000000 NEEDLES AND STRAWS IN A HAYSTACK: POSTERIOR CONCENTRATION FOR POSSIBLY SPARSE SEQUENCES By Ismael Castillo ? and Aad van der Vaart We consider full Bayesian inference in the multivariate normal mean model in the situation that the mean vector is sparse. The prior distribution on the vector of means is constructed hierarchically by first choosing a collection of nonzero means and next a prior on the nonzero values. We consider the posterior distribution in the frequen- tist set-up that the observations are generated according to a fixed mean vector, and are interested in the posterior distribution of the number of nonzero components and the contraction of the posterior distribution to the true mean vector. We find various combinations of priors on the number of nonzero coefficients and on these coeffi- cients that give desirable performance. We also find priors that give suboptimal convergence, for instance Gaussian priors on the nonzero coefficients. We illustrate the results by simulations. 1. Introduction. Suppose that we observe a vector X = (X1, . . . , Xn) in Rn such that (1.1) Xi = ?i + ?i, i = 1, . . . , n, for independent standard normal random variables ?i and an unknown vector of means ? = (?1, . . . , ?n). We are interested in Bayesian inference on ?, in the situation that this vector is possibly sparse.

• square

• inf ?

• gaussian sequence

• such full

• bayesian inference

• pin decrease

• full posterior

• rate over

Subjects

##### Bayesian inference

Informations

SubmittedtotheAnnalsofStatistics
arXiv:
math.PR/0000000

NEEDLESANDSTRAWSINAHAYSTACK:POSTERIOR
CONCENTRATIONFORPOSSIBLYSPARSESEQUENCES
WeconsiderfullBayesianinferenceinthemultivariatenormal
meanmodelinthesituationthatthemeanvectorissparse.Theprior
distributiononthevectorofmeansisconstructedhierarchicallyby
ﬁrstchoosingacollectionofnonzeromeansandnextaprioronthe
nonzerovalues.Weconsidertheposteriordistributioninthefrequen-
tistset-upthattheobservationsaregeneratedaccordingtoaﬁxed
meanvector,andareinterestedintheposteriordistributionofthe
numberofnonzerocomponentsandthecontractionoftheposterior
distributiontothetruemeanvector.Weﬁndvariouscombinations
ofpriorsonthenumberofnonzerocoeﬃcientsandonthesecoeﬃ-
cientsthatgivedesirableperformance.Wealsoﬁndpriorsthatgive
suboptimalconvergence,forinstanceGaussianpriorsonthenonzero
coeﬃcients.Weillustratetheresultsbysimulations.

1.Introduction.
Supposethatweobserveavector
X
=(
X
1
,...,X
n
)
in
R
n
suchthat
(1.1)
X
i
=
θ
i
+
ε
i
,i
=1
,...,n,
forindependentstandardnormalrandomvariables
ε
i
andanunknownvector
ofmeans
θ
=(
θ
1
,...,θ
n
).WeareinterestedinBayesianinferenceon
θ
,in
thesituationthatthisvectorispossibly
sparse
.
Non-Bayesianapproachestothisproblemhaverecentlybeenconsidered
bymanyauthors.Golubev[13]obtainedresultsformodelselectionmethods
andthresholdestimatorsforthemean-squaredrisk.Birge´andMassart[4]
treatedthemodelwithintheirgeneralcontextofmodelselectionbypenal-
izedleastsquares.Abramovichetal.in[1]studiedtheperformanceofthe
FalseDiscoveryRatemethod.TheearlierworkbyDonohoandJohnstone
[10]canbeviewedasstudyingtheproblemwithinan
`
r
context.Manyau-
thors(seee.g.[3],[22],[21]andreferencescitedthere)haveinvestigatedthe
connectiontotheLASSOorsimilarmethods.
MethodswithaBayesianconnectionwerestudiedbyGeorgeandFoster
[12],Zhang[20],JohnstoneandSilverman[16,17],Abramovich,Grinshtein

WorkpartlysupportedbyaPostdoctoralfellowshipfromtheVUUniversityAmster-
Primary62G05,62G20
Keywordsandphrases:
Bayesianestimators,Sparsity,Gaussiansequencemodel,Mix-
turepriors,Asymptotics,Contraction.
1imsart-aosver.2007/12/10file:spa-revised.texdate:November8,2011

2
I.CASTILLOANDA.W.VANDERVAART
andPensky[2],andJiangandZhang[15].Thepapers[12]and[16]con-
sideredanempiricalBayesmethod,consistingofmodellingtheparameters
θ
1
,...,θ
n
sureat0andacontinuousdistribution,determininganappropriatemixing
weightbythemethodof(restricted)marginalmaximumlikelihood,and
ﬁnallyemployingtheposteriormedianormean.Thesecondpaper[2]moti-
vatedpenalties,appliedinapenalizedminimumcontrastscheme,byprior
distributionsontheparameters,andderivedestimatorsforthenumberof
nonzero
θ
i
andthe
θ
i
itself.Theﬁrstisaposteriormode,buttheestimator
for
θ
,called“Bayesiantestimation”,doesnotseemitselfBayesian.(Infact,
theGaussianpriorforthenon-zeroparametersin[2]willbeseentoperform
suboptimallyinourfullyBayesianset-up.)Thepapers[20]and[15]obtain
sharpresultson(nonparametric)empiricalBayesestimators.
Otherrelatedpapersinclude[19],[6],[7],[14],[15],[5].
Apenalizedminimumcontrastestimatorcanoftenbeviewedasthemode
ingly.However,theBayesianapproachyieldsafullposteriordistribution,
whichisarandomprobabilitydistributionontheparameterspace.Ithas
distributionsforanyfunctionsoftheparametervectorofinterest.Itisthis
objectthatwestudyinthispaper.SuchfullBayesianinferencewasrecently
consideredbyScottandBerger[18],whodiscussedvariousaspectsnotcov-
eredinthepresentpaper,butnoconcentrationresults.Oneexampleofour
resultsisthatthebeta-binomialpriorsin[18],combinedwithmoderately
toheavytailedpriorsonthenonzeromeans,yieldoptimalrecovery.
Sparsity
canbedeﬁnedinvariousways.Perhapsthemostnaturaldeﬁni-
tionistheclassof
nearlyblack
vectors,deﬁnedas
`
0
[
p
n
]=
{
θ

R
n
:#(1

i

n
:
θ
i
6
=0)

p
n
}
.
Here
p
n
isagivennumber,whichintheoreticalinvestigationsistypically
assumedtobe
o
(
n
),as
n
→∞
.Sparsitymayalsomeanthatmanymeans
aresmall,butpossiblynotexactlyzero.Deﬁnitionsthatmakethisprecise
use
strong
or
weak
`
s
-balls
,typicallyfor
s

(0
,
2).Thesearedeﬁnedas,with
θ
[1]

θ
[2]
≥∙∙∙≥
θ
[
n
]
thenonincreasingpermutationofthecoordinatesof
θ
=(
θ
1
,...,θ
n
),
nX
n
o
`
s
[
p
n
]=
θ

R
n
:1
|
θ
i
|
s

p
ns
nn1=in
n
1
s

p
n

s
o
m
s
[
p
n
]=
θ

R
:
n
1

m
i
a

x
n
i
|
θ
[
i
]
|≤
n.
imsart-aosver.2007/12/10file:spa-revised.texdate:November8,2011

SPARSITYANDBAYESPOSTERIORMEASURE
3
Becausethenonzerocoeﬃcientsin
`
0
[
p
n
]arenotquantitativelyrestricted,
thereisnoinclusionrelationshipbetweenthisspaceandtheweakandstrong
balls,althoughresultsforthelattercanbeobtainedbyprojectingthem
into
`
0
[
p
n
].Ontheotherhand,forany
s>
0wehavetheinclusion
`
s
[
p
n
]

m
s
[
p
n
].
Theextentofthesparsity,measuredbytheconstant
p
n
,isassumedun-
known.OurBayesianapproachstartsbyputtingaprior
π
n
onthisnumber,
agivenprobabilitymeasureontheset
{
0
,
1
,
2
,...,n
}
.Nextwecomplete
thistoaprioronthesetofallpossiblesequences
θ
=(
θ
1
,...,θ
n
)in
R
n
,
p
from
π
n
choosingarandomsubset
S
⊂{
1
,...,n
}
of
cardinality
p
,andchoosingthecorrespondingcoordinates(
θ
i
:
i

S
)from
g
S
on
R
S
andsettingtheremainingcoordinates(
θ
i
:
i

S
c
)equal
tozero.Giventhisprior,Bayes’ruleyieldstheposteriordistributionof
θ
asusual.Weinvestigatethepropertiesofthisposteriordistribution,inits
dependenceonthepriorsonthedimensionandonthenonzerocoeﬃcients,
inthenonBayesianset-upwhere
X
follows(1.1)with
θ
equaltoaﬁxed,
“true”parameter
θ
0
.
Ifthetrueparametervector
θ
0
belongsto
`
0
[
p
n
],thenitisdesirablethat
theposteriordistributionconcentratesmostofitsmassonnearlyblack
vectors.Onemainresultofthepaperisthatthisisthecaseprovidedthe
priorprobabilities
π
n
{
p
}
decreaseexponentiallyfastwiththedimension
p
.
Thequalityofthereconstructionofthefullvector
θ
canbemeasuredby
variousdistances.AnaturaloneistheEuclideandistance,withsquare
nXk
θ

θ
0
k
2
=(
θ
i

θ
i
0
)
2
.
1=iIftheindicesofthe
p
n
nonzerocoordinatesofavectorinthemodel
`
0
[
p
n
]
wereknowna-priori,thenthevectorcouldbeestimatedwithmeansquare
erroroftheorder
p
n
.In[11]itisshownthat,as
n,p
n
→∞
with
p
n
=
o
(
n
),
2in
ˆ
fsup
P
n,θ
k
θ
ˆ

θ
k
=2
p
n
log(
n/p
n
)1+
o
(1)
.
θθ

`
0
[
p
n
]
Heretheinﬁmumistakenoverallestimators
θ
ˆ=
θ
ˆ(
X
)and
P
n,θ
denotes
takingtheexpectationundertheassumptionthat
X
is
N
n
(
θ,I
)-distributed.
Inotherwords,thesquareminimaxrateover
`
0
[
p
n
]is
p
n
log(
n/p
n
),meaning
logarithmicloss.
bypriormodelling,andisnotnecessarilydirectedatattainingminimax

imsart-aosver.2007/12/10file:spa-revised.texdate:November8,2011

4
I.CASTILLOANDA.W.VANDERVAART
rates.However,fortheoreticalinvestigationitisnaturaltotakethemin-
imaxrateasabenchmark,anditisofparticularinteresttoknowwhich
priorsyieldaposteriordistributionthatconcentratesmostofitsmasson
ballsaround
θ
0
p
n
log(
p
n
/n
),orcloserelativesas
p
n
(log
n
)
r
thatloose(only)alogarithmicfactor.Asecondmainresultof
thepaperisthattheminimaxrateisattainedformanycombinationsof
priors.Itsuﬃcesthatthepriors
π
n
decreaseexponentiallywithdimension,
andgivesuﬃcientweighttothetruelevelofsparsity:forsome
c>
0,
(1.2)
π
n
(
p
n
)
&
exp

cp
n
log(
n/p
n
)
.
Furthermore,thepriorsonthenonzerocoordinatesshouldhavetailsthat
arenotlighterthanLaplace,andsatisfyanumberofothertechnicalprop-
erties.Ifinequality(1.2)failsthentherateofcontr

actionm

aybeslower
thanminimax;weshowthatitisnotslowerthanlog1

n
(
p
n
).(Theword
“contraction”isinlinewithotherliteratureonnonparametricBayesianpro-
cedures;withthepresentchoiceofmetrics(whichgrowwith
n
)therates
actuallyincreasetoinﬁnity.)
Moregenerally,weconsiderreconstructionrelativetothe
`
q
metricfor
0
<q

2,deﬁned(without
q
th-root)by
nX(1.3)
d
q
(
θ,θ
0
)=
|
θ
i

θ
0
i
|
q
.
1=iFor
q<
2this“metric”ismoresensitivetosmallvariationsinthecoordi-
natesthanthesquareEuclideanmetric,whichis
d
2
.(For
q

1thedeﬁnition
givesatruemetric
d
q
;for1
<q

2itdoesnot.)From[11]theminimax
rateover
`
0
[
p
n
]for
d
q
isknowntobeoftheorder
(1.4)
r
n

,q
=
p
n
log
q/
2
(
n/p
n
)
.
Weshowthattheposterior“contraction”rateattainsthisorderundercon-
ditionsasintheprecedingparagraph,andmoregenerallycharacterizethe
rateintermsoflog(1

n
(
p
n
)).
Besidesnearlyblackvectorsweconsiderratesofcontractionif
θ
0
isina
weak
`
s
-ball.Theminimaxrateover
m
s
[
p
n
]relativeto
d
q
isgivenby(see
)]01[sp(1.5)
µ
n

,s,q
=
n
n
log
(
q

s
)
/
2
(
n/p
n
)
.
nThisisshowntobealsotherateofposteriorcontractionunderslightly
strongerconditionsonthepriorsthanbefore:thepriorondimensionmust
imsart-aosver.2007/12/10file:spa-revised.texdate:November8,2011

SPARSITYANDBAYESPOSTERIORMEASURE
5
decreaseslightlyfasterthanexponential.Underthesameconditionswe
alsoshowthattheposteriordistributionhasexponentialconcentration,and
thereforecontractsalsointhestrongersenseof(any,Euclidean)moments.
Asummaryoftheseresultsisthatgoodpriorsforthedimensiondecrease
atexponentialor,perhapsbetter,slightlyfasterrate,andgoodpriorson
thenonzeromeanshavetailsthatareheavierthanLaplace.Wealsoshow
thatpriorswithlightertails,suchastheGaussian,attainsigniﬁcantlylower
contractionratesattrueparametervectors
θ
0
thatarenotclosetothe
origin.
Thestructureofthearticleisasfollows.InSection2westatethemain
concentrationresults.Apracticalalgorithm,simulationsandsomepictures
arepresentedinSection3.Proofsaregatheredattheendofthepaperand
inthesupplementaryAppendix[9].
1.1.
Notation.
Wedenoteby
a

b
and
a

b
theminimumandmaximum
oftworealnumbers
a,b
,andwrite
a
.
b
if
a

Cb
forauniversalconstant
C
.Thenotation
,
means“equalbydeﬁnitionto”.Wecall
support
ofa
vector
θ
=(
θ
1
,...,θ
n
)

R
n
thesetofindicesofnonzerocoordinates,and
denotethisby
S
θ
=
{
i
∈{
1
,...,n
}
:
θ
i
6
=0
}
.Weset
θ
S
=(
θ
i
:
i

S
),and
let
|
S
|
bethecardinalityofaset
S
⊂{
1
,...,n
}
.
2.Mainresults.
ThroughoutthepaperweconsiderapriorΠ
n
on
R
n
constructedinthreesteps:
(P1)A
dimension
p
ischosenaccordingtoapriorprobabilitymeasure
π
n
ontheset
{
0
,
1
,
2
,...,n
}
.
(P2)Given
p
asubset
S

⊂{
1
,...,n
}
ofsize
|
S
|
=
p
ischosenuniformlyat
nrandomfromthe
p
subsetsofsize
p
.
(P3)Given(
p,S
)avector
θ
S
=(
θ
i
:
i

S
)ischosenfromaprobability
distributionwithLebesguedensity
g
S
on
R
p
(if
p

1)andthisis
extendedto
θ

R
n
bysettingtheremainingcoordinates
θ
S
c
equalto
.0Forsimplicityweusethesamedensity
g
S
foreverysetofagivendimension
|
S
|
,andwilldenotethisalsoby
g
|
S
|
.
GiventhepriorΠ
n
,Bayes’ruleyieldsthe
posteriordistribution
B
7→
Π
n
(
B
|
X
),theconditionaldistributionof
θ
given
X
iftheconditionaldistri-
butionof
X
given
θ
istakenequaltothenormaldistribution
N
n
(
θ,I
).The
probabilityΠ
n
(
B
|
X
)ofaBorelset
B

R
n
undertheposteriordistribution

imsart-aosver.2007/12/10file:spa-revised.texdate:November8,2011

6
I.CASTILLOANDA.W.VANDERVAART
canbewritten
X
n
π
(
p
)
XZYY

nn

φ
(
X
i

θ
i
)
φ
(
X
i
)
g
S
(
θ
S
)

S
p
=0
p
|
S
|
=
p
(
θ
S
,
0)

Bi

Si

/S
(2.1)
X
n
π
(
p
)
XZYY
.

nn

φ
(
X
i

θ
i
)
φ
(
X
i
)
g
S
(
θ
S
)

S
p
=0
p
|
S
|
=
pi

Si

/S
Here(
θ
S
,
0)isthevectorin
R
n
θ
i
=0to
θ
S
=(
θ
i
:
i

S
),atthepositionsleftopenby
S
⊂{
1
,...,n
}
(inthecorrect
orderofthecoordinatesandnotattheend,asthenotationsuggests).This
expressionissomewhatunwieldy;weconsidercomputationinSection3.
Theposteriordistributioncanbemarginalizedtoobtaintheposteriordis-
tributionofanarbitrarytransformation
f
(
θ
)bychoosingtheset
B
equal
to
B
=
{
θ
:
f
(
θ
)

B
0
}
.
Theposteriordistributionisarandomprobabilitydistributionon
R
n
,
whichwestudyundertheassumptionthatthevector
X
=(
X
1
,...,X
n
)is
distributedaccordingtoamultivariatenormaldistributionwithmeanvector
θ
0
andcovariancematrixtheidentity.Welet
P
n,θ
0
T
denotetheexpected
valueofafunction
T
=
T
(
X
)underthisdistribution.
Weshallbeinterestedintwoaspectsoftheposteriordistribution:its
dimensionalityanditsabilitytorecoverthemeanvector
θ
.Becausethe
conditionsaresimplerinthecasethatthenonzerocoordinatesareindepen-
dentundertheprior,intheﬁrsttworesultsweassumethatthedensities
g
S
in(P3)areofproductform.Concreteexamplesofpriorsasin(P1)and(P3)
thatsatisfytheconditionsimposedinthetheoremsaregiveninSection2.5.
2.1.
Dimensionality.
Inthecontextof
`
0
[
p
n
]-classes,wesaythatthe
prior
π
n
ondimensionhas
exponentialdecrease
if,forsomeconstants
C>
0
and
D<
1,
(2.2)
π
n
(
p
)

n
(
p

1)
,p>Cp
n
.
Theorem
2.1(Dimension)
.
If
π
n
hasexponentialdecrease
(2.2)
and
g
S
isaproductof
|
S
|
copiesofaunivariatedensitywithmeanzeroandﬁnite
secondmoment,thenthereexists
M
suchthat,as
p
n
,n
→∞
,
sup
P
n,θ
0
Π
n
θ
:
|
S
θ
|
>Mp
n
|
X

0
.
θ
0

`
0
[
p
n
]
massinthe
p
n
-dimensionalsubspacethatsupportsatruemeanvector

imsart-aosver.2007/12/10file:spa-revised.texdate:November8,2011

SPARSITYANDBAYESPOSTERIORMEASURE
7
θ
0

`
0
[
p
n
].Thetheoremshowsthattheposteriordistribution“overshoots”
thisspacebysubspacesofdimensionatmostamultipleof
p
n
.Becausethe
overshootcanhavearandomdirection,thisdoesnotmeanthattheposte-
riordistributionconcentratesoverallonaﬁxed
Mp
n
-dimensionalsubspace.
Thetheoremshowsthatitconcentratesalong
Mp
n
-dimensionalcoordinate
planes,butitssupportwillbefarfromconvex.
Obviouslytheposteriordistributionwillconcentrateonlow-dimensional
π
n
.Bythetheoremexponentialdecreaseissuﬃcient.Thenextstepisto
showthatexponentialdecreaseisnottooharsh:itiscompatiblewithgood
reconstructionofthefullmeanvector
θ
.Thisthenofcourserequiresalower
boundonthepriormassgiventothespacesof“correct”dimension;for
instance(1.2).
2.2.
Recovery.
Goodrecoveryrequiresalsoappropriatepriordensities
g
S
onthenonzerocoordinates.Becausethestatisticalproblemofrecovering
θ
froma
N
p
(
θ,I
)distributedobservationisequivariantin
θ
,wemayhope
thatthelocationofthenonzerocoordinatesof
θ
0
doesnotplayaroleinits
recoveryrate.Thenon-Bayesianproceduresconsideredinforinstance[13]
indeedfulﬁllthisexpectation.However,aBayesianprocedure(withproper
priors)necessarilyfavourscertainregionsoftheparameterspace.Depending
onthechoiceofpriors
g
S
inthe“average”recoveryoftheparameteras
n
→∞
,yieldingsuboptimal
behaviourfortrueparameters
θ
0
thatarefarfromtheorigin.Thisshrinkage
eﬀectcanbepreventedbychoosingpriors
g
S
withsuﬃcientlyheavytails.
Againweﬁrstconsiderthecaseofindependentcoordinates.Inthefol-
lowingtheoremweassumethat
g
S
isaproductof
|
S
|
densitiesoftheform
e
h
,forafunction
h
:
R

R
satisfying
(2.3)

h
(
x
)

h
(
y
)