
Least Squares Optimization with L1-Norm Regularization


Mark Schmidt
CS542B Project Report
December 2005

Abstract

This project surveys and examines optimization approaches proposed for parameter estimation in Least Squares linear regression models with an L1 penalty on the regression coefficients. We first review linear regression and regularization, and both motivate and formalize this problem. We then give a detailed analysis of 8 of the varied approaches that have been proposed for optimizing this objective, 4 focusing on constrained formulations and 4 focusing on the unconstrained formulation. We then briefly survey closely related work on the orthogonal design case, approximate optimization, regularization parameter estimation, other loss functions, active application areas, and properties of L1 regularization. Illustrative implementations of each of these 8 methods are included with this document as a web resource.

1 Introduction

This report focuses on optimizing the Least Squares objective function with an L1 penalty on the parameters. There is currently significant interest in this and related problems in a wide variety of fields, due to the appealing idea of creating accurate predictive models that also have interpretable or parsimonious representations. Rather than focus on applications or properties of this model, the main contribution of this project is an examination of a variety of the approaches that have been proposed for parameter estimation in this model. The bulk of this document focuses on surveying 8 different optimization strategies proposed in the literature for this problem.
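For reference, the unconstrained formulation of the problem discussed here is conventionally written as follows (the report formalizes its own notation later; here $X$ is the design matrix, $y$ the target vector, $w$ the coefficient vector, and $\lambda \geq 0$ the regularization parameter):

$$\min_{w} \;\; \frac{1}{2}\|Xw - y\|_2^2 + \lambda \|w\|_1$$

The $\|w\|_1 = \sum_j |w_j|$ term is non-differentiable wherever any $w_j = 0$, which is precisely the difficulty that the optimization strategies surveyed in this report must address.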


Each will be examined in detail, focusing on their comparative strengths and weaknesses, and how each deals with a function that has non-differentiable points. After examining these methods in detail, we will briefly discuss several closely related issues. These include optimization in the special case of an orthogonal design matrix, approximate optimization for large problems, selecting appropriate values for the regularization parameter, optimizing other loss functions subject to an L1 penalty, several active application areas, and finally several interesting properties of L1 regularization. However, we will first briefly review the linear regression problem, the Least Squares approach, L2 and L1 penalties, and finally formalize the problem being examined. We do this review in order to motivate the interest in this model, and to introduce the notation that will be used consistently throughout the report. We emphasize the latter point because much of the related work was produced in different areas, and the notation used is not consistent across works.

1.1 Linear Regression

In the typical univariate linear regression scenario, the input is n instances of a process described by p+1 variables, and we seek to find a linear combination of a selected p variables (the features) that best predicts the remaining variable (the target). We typically define the design matrix X as a matrix having n rows and p columns, representing the p variables for the n instances. Similarly, we define the target vector y as a column vector of length n containing the corresponding values of the target variable. The problem can then be formulated as finding a 'good' value for the length-p coefficient vector w and the scalar bias term w_0 in the following model (taken over i from 1 to n):

$$y_i = w_0 + \sum_{j=1}^{p} x_{ij} w_j + \epsilon_i \quad (1)$$

Here $\epsilon_i$ denotes the noise (or error) term for instance i, and it is typically assumed that the values of $\epsilon_i$ have zero mean and are both independently and identically distributed. Thus, we typically do not model $\epsilon$ directly. However, we cannot solve for w and w_0 directly given only X and y, because $\epsilon$ is unknown. In addition, the noise term may cause the same or similar values of x_i to yield different target values. We thus define a 'loss' function that assesses how well a given set of parameters w predicts y from X.
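The model of Eq. (1) can be illustrated with a minimal NumPy sketch. The problem sizes, true coefficients, and noise scale below are invented for illustration; the bias w_0 is handled, as is common, by augmenting X with a column of ones so that the ordinary (unpenalized) Least Squares fit recovers w_0 and w jointly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                          # n instances, p features (illustrative sizes)
X = rng.normal(size=(n, p))           # design matrix: n rows, p columns
w_true = np.array([1.5, 0.0, -2.0])   # hypothetical true coefficient vector w
w0_true = 0.5                         # hypothetical bias term w_0
eps = 0.1 * rng.normal(size=n)        # zero-mean, i.i.d. noise terms
y = w0_true + X @ w_true + eps        # the linear model of Eq. (1)

# Augment X with a column of ones so the bias is estimated jointly with w.
Xa = np.hstack([np.ones((n, 1)), X])
coef, *_ = np.linalg.lstsq(Xa, y, rcond=None)
w0_hat, w_hat = coef[0], coef[1:]

# The squared-error loss this fit minimizes:
loss = np.sum((Xa @ coef - y) ** 2)
```

With mild noise, the recovered w0_hat and w_hat land close to the values used to generate the data; the L1-penalized variants surveyed in the report modify only the objective being minimized, not this basic setup.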