KERNEL INDEPENDENT COMPONENT ANALYSIS
Francis R. Bach
Computer Science Division University of California Berkeley, CA 94720, USA fbach@cs.berkeley.edu
Michael I. Jordan
Computer Science Division and Department of Statistics University of California Berkeley, CA 94720, USA jordan@cs.berkeley.edu

ABSTRACT
We present a class of algorithms for independent component analysis (ICA) which use contrast functions based on canonical correlations in a reproducing kernel Hilbert space. On the one hand, we show that our contrast functions are related to mutual information and have desirable mathematical properties as measures of statistical dependence. On the other hand, building on recent developments in kernel methods, we show that these criteria can be computed efficiently. Minimizing these criteria leads to flexible and robust algorithms for ICA. We illustrate with simulations involving a wide variety of source distributions, showing that our algorithms outperform many of the presently known algorithms.

1. INTRODUCTION
Recent research on kernel methods has yielded important new computational tools for solving large-scale, nonparametric classification and regression problems [10]. While some forays have also been made into unsupervised learning, there is still much unexplored terrain in problems involving large collections of mutually interacting variables, problems in which Markovian or general graphical models have excelled. These latter models in fact have several limitations that invite kernel-based initiatives; in particular, they are almost entirely based on strong parametric assumptions, and lack the nonparametric flexibility of the kernel approaches.

Independent component analysis (ICA) [8] is an interesting unsupervised learning problem in which to explore these issues. On the one hand, ICA is heavily based on structural assumptions: viewed as a graphical model it is a directed bipartite graph linking a set of "source nodes" to a set of "observation nodes," in which the lack of edges between the source nodes encodes an assumption of mutual independence. On the other hand, the ICA problem is also strongly nonparametric: the distribution of the source variables is left unspecified. This is difficult to accommodate within the (current) graphical model formalism, in which all nodes must be endowed with a probability distribution. It is here that we will find kernel methods to be useful. We will show how kernel methods can be used to define a "contrast function" that can be used to estimate the parametric part of the ICA model (the source-to-observation edges), despite the absence of a specific distribution on the source nodes. As we will see, compared to current ICA algorithms, the new kernel-based approach is notable for its robustness.

We refer to our new approach to ICA as "KERNELICA." It is important to emphasize at the outset that KERNELICA is not the "kernelization" of an extant ICA algorithm. Rather, it is a new approach to ICA based on novel kernel-based measures of dependence. We introduce two such measures. In Section 3, we define a kernel-based contrast function in terms of the first eigenvalue of a certain generalized eigenvalue problem, and show how this function relates to probabilistic independence. In Section 4.3, we introduce an alternative kernel-based contrast function based on the entire spectrum of the generalized eigenvalue problem, and show how this function can be related to mutual information.
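The first of these two measures can be illustrated with a small numerical sketch (not the authors' implementation): given samples of two variables, center and regularize their Gram matrices and take the top eigenvalue of the resulting generalized eigenvalue problem, which is the first kernel canonical correlation. The Gaussian kernel width `sigma` and regularization constant `kappa` below are illustrative choices, not values from the paper.

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    """Gram matrix of a Gaussian RBF kernel on a 1-D sample."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def first_kernel_canonical_correlation(x, y, sigma=1.0, kappa=0.1):
    """First canonical correlation of x and y in the RKHS, computed as
    the top eigenvalue of a regularized generalized eigenvalue problem
    on centered Gram matrices."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    K1 = H @ rbf_gram(x, sigma) @ H
    K2 = H @ rbf_gram(y, sigma) @ H
    R1 = K1 + n * kappa * np.eye(n)               # regularized operators
    R2 = K2 + n * kappa * np.eye(n)
    # Generalized problem  A v = rho B v,  A symmetric, B positive definite.
    Z = np.zeros((n, n))
    A = np.block([[Z, K1 @ K2], [K2 @ K1, Z]])
    B = np.block([[R1 @ R1, Z], [Z, R2 @ R2]])
    # Reduce to an ordinary symmetric problem via Cholesky: B = L L^T.
    L = np.linalg.cholesky(B)
    M = np.linalg.solve(L, np.linalg.solve(L, A).T).T   # L^{-1} A L^{-T}
    return float(np.linalg.eigvalsh((M + M.T) / 2)[-1])
```

With the regularization in place the statistic lies in [0, 1], and strongly dependent samples score markedly higher than independent ones; Bach and Jordan's actual kernel and regularization choices differ in detail.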
2. BACKGROUND ON ICA
Independent component analysis (ICA) is the problem of recovering a latent random vector x = (x1, . . . , xm)^T from observations of m unknown linear functions of that vector. The components of x are assumed to be mutually independent. Thus, an observation y = (y1, . . . , ym)^T is modeled as y = Ax, where x is a latent random vector with independent components, and where A is an m×m matrix of parameters. Given N independently, identically distributed observations of y, we hope to estimate A and thereby to recover the latent vector x corresponding to any particular y by solving a linear system.

By specifying distributions for the components xi, one obtains a parametric model that can be estimated via maximum likelihood [5]. Working with W = A^{-1} as the parameterization, one readily obtains a gradient or fixed-point algorithm that yields an estimate Ŵ and provides estimates of the latent components via x̂ = Ŵy [8].

In practical applications, however, one does not generally know the distributions of the components xi, and it is preferable to view the ICA model as a semiparametric model in which the distributions of the components of x are left unspecified [6]. Maximizing the likelihood in the semiparametric ICA model is essentially equivalent to minimizing the mutual information between the components of the estimate x̂ = Ŵy [7]. Thus it is natural to view mutual information as a contrast function to be minimized in estimating the ICA model.

Unfortunately, the mutual information for real-valued variables is difficult to approximate and optimize on the basis of a finite sample, and much research on ICA has focused on alternative contrast functions [8, 7, 1]. These have either been derived as expansion-based approximations to the mutual information, or have had a looser relationship to the mutual information, essentially borrowing its key property of being equal to zero if and only if the arguments to the function are independent. In this paper, we define
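The generative model just described can be written out in a few lines (an illustrative sketch, not code from the paper): independent non-Gaussian sources x are mixed by a square matrix A, and the true demixing matrix W = A^{-1} recovers them exactly. The particular source distributions and mixing matrix below are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.RandomState(0)
N, m = 1000, 2

# Independent non-Gaussian sources (uniform and Laplacian, illustrative choices).
x = np.vstack([rng.uniform(-1, 1, N), rng.laplace(0, 1, N)])

# Mixing: each observation is y = A x for a fixed m-by-m parameter matrix A.
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
y = A @ x

# With the true demixing matrix W = A^{-1}, the sources are recovered exactly.
W = np.linalg.inv(A)
x_hat = W @ y
print(np.allclose(x_hat, x))   # True
```

In practice A is unknown, so an ICA algorithm must estimate W from the observations y alone, and can recover the sources only up to permutation and scaling of the rows of W.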