
# Apprentissage statistique, parcimonie et Langevin Monte Carlo

*(Statistical learning, sparsity and Langevin Monte Carlo)*




## Outline

- Introduction
- Sparsity oracle inequalities (SOI)
- BIC and LASSO
- Dantzig selector and LASSO for linear regression
- Sparse exponential weighting (SEW)
Apprentissage statistique, parcimonie et Langevin Monte Carlo
Alexandre Tsybakov
Laboratoire de Statistique, CREST; Laboratoire de Probabilités et Modèles Aléatoires, Université Paris 6; and CMAP, École Polytechnique

Dijon, November 26, 2009
## Nonparametric regression model (fixed design)
Assume that we observe the pairs $(X_1, Y_1), \dots, (X_n, Y_n) \in \mathbb{R}^d \times \mathbb{R}$, where

$$Y_i = f(X_i) + \xi_i, \qquad i = 1, \dots, n.$$

- The regression function $f : \mathbb{R}^d \to \mathbb{R}$ is unknown.
- The errors $\xi_i$ are independent Gaussian $\mathcal{N}(0, \sigma^2)$ random variables.
- The points $X_i \in \mathbb{R}^d$ are arbitrary fixed (non-random) points.

We want to estimate $f$ based on the data $(X_1, Y_1), \dots, (X_n, Y_n)$.
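This model is easy to simulate numerically. Below is a minimal sketch; the concrete choices ($d = 1$, $f(x) = \sin(2\pi x)$, $n = 100$ equispaced design points, $\sigma = 0.5$) are hypothetical illustrations, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical choices for illustration: d = 1, f(x) = sin(2*pi*x),
# n = 100 fixed design points, noise level sigma = 0.5.
n, sigma = 100, 0.5
X = np.linspace(0.0, 1.0, n)          # arbitrary fixed (non-random) design points
f = lambda x: np.sin(2 * np.pi * x)   # "unknown" regression function (known here only to simulate)
xi = rng.normal(0.0, sigma, size=n)   # independent N(0, sigma^2) errors
Y = f(X) + xi                         # observations Y_i = f(X_i) + xi_i
```

The design is fixed: rerunning the simulation changes only the errors $\xi_i$, never the points $X_i$.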
## Approximating function, dictionary
We assume that there exists a function $f_\theta(x)$ (a known function of $\theta$ and $x$) such that

$$f \approx f_\theta$$

for some $\theta = (\theta_1, \dots, \theta_M)$. Possibly $M \gg n$.
## Example: linear approximation, dictionary
Let $f_1, \dots, f_M$ be a finite *dictionary* of functions, $f_j : \mathbb{R}^d \to \mathbb{R}$. We approximate the regression function $f$ by a linear combination

$$f_\theta(x) = \sum_{j=1}^{M} \theta_j f_j(x)$$

with weights $\theta = (\theta_1, \dots, \theta_M)$. We believe that

$$f(x) \approx \sum_{j=1}^{M} \theta_j f_j(x)$$

for some $\theta = (\theta_1, \dots, \theta_M)$.
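The dictionary construction above translates directly into code. This sketch assumes a small hypothetical trigonometric dictionary $f_1 = 1$, $f_2 = \sin(2\pi x)$, $f_3 = \cos(2\pi x)$:

```python
import numpy as np

# A hypothetical trigonometric dictionary f_1, ..., f_M (M = 3 here).
dictionary = [
    lambda x: np.ones_like(x),
    lambda x: np.sin(2 * np.pi * x),
    lambda x: np.cos(2 * np.pi * x),
]

def f_theta(x, theta):
    """Linear combination f_theta(x) = sum_j theta_j * f_j(x)."""
    return sum(t * fj(x) for t, fj in zip(theta, dictionary))

x = np.linspace(0.0, 1.0, 5)
# With theta = (0, 1, 0), f_theta reduces to the single basis function sin(2*pi*x).
out = f_theta(x, (0.0, 1.0, 0.0))
```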
## Scenarios for linear approximation

- **(LinReg)** Exact equality: there exists $\theta \in \mathbb{R}^M$ such that $f = f_\theta = \sum_{j=1}^{M} \theta_j f_j$ (linear regression, with possibly $M \gg n$ parameters).
- **(NPReg)** $f_1, \dots, f_M$ are the first $M$ functions of a basis (usually orthonormal) and $M \gg n$; there exists $\theta$ such that $\|f - f_\theta\|$ is small: nonparametric estimation of regression.
- **(Agg)** Aggregation of arbitrary estimators: in this case $f_1, \dots, f_M$ are preliminary estimators of $f$ based on a training sample independent of the observations $(X_1, Y_1), \dots, (X_n, Y_n)$.
- Weak learning, additive models, etc.
## Example: nonlinear approximation
We consider the generalized linear (or single-index) model

$$f_\theta(x) = G(\theta^T x),$$

where $G : \mathbb{R} \to \mathbb{R}$ is a known (or unknown) function. Thus, we believe that

$$f(x) \approx G(\theta^T x)$$

for some $\theta = (\theta_1, \dots, \theta_M)$, possibly with $M \gg n$.
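The single-index model is equally simple to sketch. Here the link $G = \tanh$ and the particular vectors are hypothetical choices for illustration:

```python
import numpy as np

# Sketch of the single-index model f_theta(x) = G(theta^T x),
# with a hypothetical link G = tanh.
def f_theta(x, theta, G=np.tanh):
    return G(x @ theta)

theta = np.array([1.0, -2.0, 0.0])   # only 2 of M = 3 coordinates are active
x = np.array([0.5, 0.25, 7.0])       # third coordinate ignored since theta_3 = 0
value = f_theta(x, theta)            # tanh(0.5 - 0.5 + 0) = tanh(0) = 0
```

Note that only the direction of $\theta$ matters when $G$ is unknown, since any rescaling of $\theta$ can be absorbed into $G$.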
## Sparsity of a vector
The number of non-zero coordinates of $\theta$:

$$M(\theta) = \sum_{j=1}^{M} I\{\theta_j \neq 0\}.$$

The value $M(\theta)$ characterizes the *sparsity* of the vector $\theta \in \mathbb{R}^M$: the smaller $M(\theta)$, the "sparser" $\theta$.
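In code, $M(\theta)$ is a one-liner (the example vector is a hypothetical illustration):

```python
import numpy as np

def M_theta(theta):
    """Number of non-zero coordinates: M(theta) = sum_j I{theta_j != 0}."""
    return int(np.count_nonzero(theta))

# M = 5 coordinates, but only 2 are non-zero, so theta is sparse: M(theta) = 2.
theta = np.array([0.0, 3.1, 0.0, -0.2, 0.0])
```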
## Sparsity of the model
Intuitive formulation of the sparsity assumption:

$$f(x) \approx f_\theta(x) \qquad (\text{``$f$ is well approximated by $f_\theta$''})$$

where the vector $\theta = (\theta_1, \dots, \theta_M)$ is sparse:

$$M(\theta) \ll M.$$
## Sparsity and dimension reduction
Let $\hat{\theta}^{OLS}$ be the ordinary least squares (OLS) estimator and let $f_\theta$ be the linear approximation. Elementary result:

$$\mathbb{E}\|f_{\hat{\theta}^{OLS}} - f\|_n^2 \le \|f - f_\theta\|_n^2 + \frac{\sigma^2 \min(M, n)}{n}$$

for any $\theta \in \mathbb{R}^M$, where $\|\cdot\|_n$ is the empirical norm:

$$\|f\|_n = \sqrt{\frac{1}{n} \sum_{i=1}^{n} f^2(X_i)}.$$
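The bound can be checked by Monte Carlo simulation. This sketch uses a hypothetical polynomial dictionary $f_j(x) = x^{j-1}$ and takes $f$ exactly linear in the dictionary, so $\|f - f_\theta\|_n = 0$ and the bound reduces to $\sigma^2 M / n$ (in fact, for full-rank OLS it holds with equality):

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, sigma = 50, 5, 1.0

# Hypothetical setup: dictionary f_j(x) = x^(j-1), f exactly linear in it.
X = np.linspace(0.0, 1.0, n)
F = np.column_stack([X**j for j in range(M)])    # n x M design matrix
theta_star = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
f_vals = F @ theta_star                          # true regression values f(X_i)

def emp_norm_sq(v):
    """Squared empirical norm ||v||_n^2 = (1/n) sum_i v(X_i)^2."""
    return np.mean(v**2)

risks = []
for _ in range(1000):
    Y = f_vals + rng.normal(0.0, sigma, n)
    theta_hat, *_ = np.linalg.lstsq(F, Y, rcond=None)
    risks.append(emp_norm_sq(F @ theta_hat - f_vals))

mc_risk = np.mean(risks)           # Monte Carlo estimate of E||f_hat - f||_n^2
bound = sigma**2 * min(M, n) / n   # here: sigma^2 * M / n = 0.1
```

The Monte Carlo risk concentrates around $\sigma^2 M / n = 0.1$, matching the bound in this exact-linear case.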
## Sparsity and dimension reduction
For any $\theta \in \mathbb{R}^M$, the "oracular" OLS that acts only on the relevant $M(\theta) < n$ coordinates satisfies

$$\mathbb{E}\|f_{\hat{\theta}^{oracle\,OLS}} - f\|_n^2 \le \|f - f_\theta\|_n^2 + \frac{\sigma^2 M(\theta)}{n}.$$

This is only an OLS oracle, not an estimator: the set of relevant coordinates would have to be known.
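The oracle can still be written down in code, provided the support is handed to it. A sketch (design matrix, coefficients, and support are hypothetical; the noiseless case gives a clean sanity check):

```python
import numpy as np

rng = np.random.default_rng(2)

# "Oracular" OLS: regress only on the M(theta) relevant coordinates.
# It needs the true support of theta, so it is an oracle, not an estimator.
def oracle_ols(F, Y, support):
    theta_hat = np.zeros(F.shape[1])
    coef, *_ = np.linalg.lstsq(F[:, support], Y, rcond=None)
    theta_hat[support] = coef
    return theta_hat

F = rng.normal(size=(20, 6))                              # n = 20, M = 6
theta_star = np.array([0.0, 2.0, 0.0, 0.0, -1.0, 0.0])    # M(theta) = 2
Y = F @ theta_star                                        # noiseless observations
theta_hat = oracle_ols(F, Y, support=[1, 4])
```

By restricting to $M(\theta) = 2$ columns, the oracle pays a variance of only $2\sigma^2/n$ instead of $6\sigma^2/n$ for the full OLS.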
## Sparsity oracle inequalities
Do there exist true estimators with similar behavior? Basic idea: choose suitable data-driven weights $\hat{\theta} = (\hat{\theta}_1, \dots, \hat{\theta}_M)$ and estimate $f$ by

$$\hat{f}(x) = f_{\hat{\theta}}(x) = \sum_{j=1}^{M} \hat{\theta}_j f_j(x).$$

What should we do when the approximation is nonlinear (e.g., $G(\theta^T x)$)? Should we also plug in an estimator $\hat{\theta}$?

Can we find $\hat{\theta}$ such that $\tilde{f} = f_{\hat{\theta}}$, or $\tilde{f}$ defined differently, satisfies

$$\mathbb{E}\|\tilde{f} - f\|_n^2 \lesssim \|f - f_\theta\|_n^2 + \frac{\sigma^2 M(\theta)}{n} \qquad \forall\, \theta\,?$$