Apprentissage statistique, parcimonie et Langevin Monte Carlo
54 Pages
English

Apprentissage statistique, parcimonie et Langevin Monte Carlo

-

Downloading requires you to have access to the YouScribe library
Learn all about the services we offer

Description

IntroductionSparsity oracle inequalities(SOI)BIC and LASSODantzig selector and LASSO for linear regressionSparse exponential weighting (SEW)Apprentissage statistique, parcimonie et LangevinMonte CarloAlexandre TsybakovLaboratoire de Statistique, CREST ,Laboratoire de Probabilit´es et Mod`eles Al´eatoires,Universit´e Paris 6etCMAP, Ecole PolytechniqueDijon, le 26 novembre 2009Alexandre TsybakovIntroductionSparsity oracle inequalities(SOI)Model, dictionary, approximationBIC and LASSOSparsityDantzig selector and LASSO for linear regressionSparse exponential weighting (SEW)Nonparametric regression model (fixed design)dAssume that we observe the pairs (X ,Y ),...,(X ,Y )∈R ×R1 1 n nwhereY = f(X )+ξ , i = 1,...,n.i i idRegression function f :R →R is unknown2Errors ξ are independent GaussianN(0,σ ) random variables.idX ∈R are arbitrary fixed (non-random) points.iWe want to estimate f based on the data (X ,Y ),...,(X ,Y ).1 1 n nAlexandre TsybakovIntroductionSparsity oracle inequalities(SOI)Model, dictionary, approximationBIC and LASSOSparsityDantzig selector and LASSO for linear regressionSparse exponential weighting (SEW)Approximating function, dictionaryWe assume that there exists a function f (x) (known as a functionθof θ and x) such thatf ≈ fθfor some θ = (θ ,...,θ ).1 MPossibly M nAlexandre TsybakovIntroductionSparsity oracle inequalities(SOI)Model, dictionary, approximationBIC and LASSOSparsityDantzig selector and LASSO ...

Subjects

Informations

Published by
Reads 72
Language English
Introduction Sparsity oracle inequalities(SOI) BIC and LASSO Dantzig selector and LASSO for linear regression Sparse exponential weighting (SEW)
Apprentissage statistique, parcimonie et Langevin Monte Carlo
Alexandre Tsybakov
Laboratoire de Statistique, CREST , LaboratoiredeProbabilit´esetMod`elesAle´atoires, Universit´eParis6 et CMAP, Ecole Polytechnique
Dijon, le 26 novembre 2009
AlexandreTsybakov
Introduction Sparsity oracle inequalities(SOI)Model, dictionary, approximation DantzigselectorandLASSOforlBiInCeaarnrdegLreAsSsiSoOnSparsity Sparse exponential weighting (SEW)
Nonparametric regression model (fixed design)
Assume that we observe the pairs (X1,Y1), . . . ,(Xn,Yn)Rd×R where
Yi=f(Xi) +ξi,i= 1, . . . ,n.
Regression functionf:RdRis unknown Errorsξiare independent GaussianN(0, σ2) random variables. XiRdare arbitrary fixed (non-random) points.
We want to estimatefbased on the data (X1,Y1), . . . ,(Xn,Yn).
AlexandresTbykavo
Introduction Sparsity oracle inequalities(SOI)Model, dictionary, approximation DantzigselectorandLASSOforlBiInCeaarnrdegLreAsSsiSoOnSparsity Sparse exponential weighting (SEW)
Approximating function, dictionary
We assume that there exists a function fθ(x) (known as a function ofθandx) such that ffθ
for someθ= (θ1, . . . , θM).
lAxenard
PossiblyMn
esTbykavo
Introduction Sparsity oracle inequalities(SOI)Model, dictionary, approximation DantzigselectorandLASSOforlBiInCeaarnrdegLreAsSsiSoOnSparsity Sparse exponential weighting (SEW)
Example: linear approximation, dictionary
Letf1, . . . ,fMbe a finitedictionary of functions,fj:RdR. We approximate the regression functionfby linear combination
M fθ(x) =Xθjfj(x) with weightsθ= (θ1, . . . , θM). j=1
We believe that
M f(x)Xθjfj(x) j=1
for someθ= (θ1, . . . , θM).
AlexandresTbykavo
Introduction SparsityoracleinBeIqCuaalnitdieLs(ASSOSIO)Model, dictionary, approximation Dantzig selector and LASSO for linear regressionSparsity Sparse exponential weighting (SEW) Scenarios for linear approximation (LinReg) exists thereExact equality:θRMsuch that f= fθ=PjM=1θjfj (linear regression, with possiblyMnparameters); (NPReg)f1, . . . ,fMare the firstMfunctions of a basis (usually orthonormal) andMn, there existsθsuch thatffθis small:nonparametric estimation of regression; (Agg)aggregation of arbitrary estimators: in this casef1, . . . ,fM are preliminary estimators offbased on a training sample independent of the observations (X1,Y1), . . . ,(Xn,Yn); Weak learning, additive models etc. Alexandre Tsybakov
Introduction Sparsity oracle inequalities(SOI) BIC and LASSOSMpoadrseilt,ydictionary, approximation Dantzig selector and LASSO for linear regression Sparse exponential weighting (SEW)
Example: nonlinear approximation
We consider the generalized linear (or single-index) model
fθ(x) =G(θTx)
whereG:RRis a known (or unknown) function.
Thus, we believe that
f(x)G(θTx)
for someθ= (θ1, . . . , θM), possibly withMn.
AlexandreTsybakov
Introduction Sparsity oracle inequalities(SOI) BIC and LASSO Dantzig selector and LASSO for linear regression Sparse exponential weighting (SEW)
Sparsity of a vector
Model, dictionary, approximation Sparsity
The number of non-zero coordinates ofθ:
M M(θ) =XI{θj6=0} j=1
The valueM(θ) characterizes thesparsityof vectorθRM: the smallerM(θ), the “sparser”θ.
AlexandresTbykavo
Introduction Sparsity oracle inequalities(SOI) BIC and LASSO Dantzig selector and LASSO for linear regression Sparse exponential weighting (SEW)
Sparsity of the model
Model, dictionary, approximation Sparsity
Intuitive formulation of sparsity assumption:
f(x)fθ(“fis well approximated by fθ”)
where the vectorθ= (θ1, . . . , θM) is sparse:
lAxenardeT
M(θ)M.
ysabokv
Introduction Sparsity oracle inequalities(SOI)Model, dictionary, approximation DantzigselectorandLASSOforlBiInCeaarnrdegLreAsSsiSoOnSparsity Sparse exponential weighting (SEW)
Sparsity and dimension reduction
b LetθOLS Letbe the ordinary least squares (OLS) estimator. fθbe linear result:approximation. Elementary
2σ2min(M,n) EkfθbOLSfkn2≤ kffθkn+n
for anyθRMwherek ∙ knis the empirical norm: v =u1Xnf2(Xi). kfkntni=1
AlexandresTbyakov
Introduction Sparsity oracle inequalities(SOI) BIC and LASSOSMpoardseilt,ydictionary, approximation Dantzig selector and LASSO for linear regression Sparse exponential weighting (SEW)
Sparsity and dimension reduction
For anyθRMthe “oracular” OLS that acts only on the relevant M(θ)<ncoordinates satisfies
EkfθobrOaLcSlefkn2≤ kffθkn2+σ2nM(θ).
This is only an OLS oracle, not an estimator. The set of relevant coordinates should be known.
AlexandreTsybakov
Introduction Sparsity oracle inequalities(SOI) BIC and LASSOImplications of SOI Dantzig selector and LASSO for linear regression Sparse exponential weighting (SEW)
Sparsity oracle inequalities
Do there exist true estimators with similar behavior? Basic idea: b b b Choose some suitable data-driven weightsθ= (θ1, . . . , θM) and estimatefby
M fb(x) = fθb(x) =Xθbjfj(x). j=1
What to do when the approximation is non-lineabr (ex. G(θTx))? Should we also plug in an estimatorθ? b˜ ˜ Can we findθsuch thatf= fθborfdefined in differently satisfies
Ekf˜fk2n.kffθkn2+σ2M(θ),θ? n
AlexandreTsybakov