9 Pages
English

Graph partition strategies for generalized mean field inference

Gain access to the library to view online
Learn more

Informations

Published by
Reads 82
Language English
Graph partition strategies for generalized mean eld inference Eric P. Xing Michael I. Jordan Stuart Russell Computer Science Division Computer Science and Statistics Computer Science Division University of California University of California University of California Berkeley, CA 94720 Berkeley, CA 94720 Berkeley, CA 94720 Abstract raise a new question|how are the clusters to be cho- sen? Until now, this issue has generally been left in the hands of the algorithm designer; moreover, the de-An autonomous variational inference algo- signer has been provided with little beyond intuitionrithm for arbitrary graphical models requires for making these choices. For some graphical model ar-the ability to optimize variational approxi- chitectures, there are only a few natural choices, andmations over the space of model parameters these can be explored manually. In general, however,as well as over the choice of tractable fam- we wish to envisage a general piece of software forilies used for the variational approximation. variational inference which can be asked to performIn this paper, we present a novel combina- inference for an arbitrary graph. In this setting, it istion of graph partitioning algorithms with a essential to begin to explore automatic methods forgeneralized mean eld (GMF) inference algo- choosing clusters.rithm. This combination optimizes over dis- joint clustering of variables and performs in- In the setting of structured mean eld algorithms (Jor- ference using those clusters. We provide a dan et al., 1999, Ghahramani and Jordan, 1997) it is formal analysis of the relationship between meaningful to consider disjoint clusters, and in Xing the graph cut and the GMF approximation, et al. (2003) we have proposed a generalized mean eld and explore several graph partition strategies (GMF) algorithm for inference based on this assump- empirically. Our empirical results provide tion, noting that the assumption of disjoint clusters rather clear support for a weighted version leads to a simple set of inference equations that can of MinCut as a useful clustering algorithm be easily implemented. Disjoint clusters have another for GMF inference, which is consistent with virtue as well, which is the subject of the current the implications from the formal analysis. paper|they open the door to a role for graph par- titioning algorithms in choosing clusters for inference. There are several intuitions that support a possi-1 Introduction ble role for graph partitioning algorithms in the au- tonomous choice of clusters for graphical model in-What are the prospects for fully autonomous algo- ference. The rst is that minimum cuts are to berithms for variational inference in graphical mod- preferred, so that as much as possible of the proba-els? Recent years have seen an increasingly sys- bilistic dependence is captured within clusters. It alsotematic treatment of an increasingly exible range seems likely that the values of parameters should mat-of algorithms for variational inference. In particu- ter because they often re ect the ?coupling strength?lar, the cluster v framework has provided of the probabilistic dependences among random vari-a range of algorithms that extend the basic \belief ables. Another intuition is that maximum cuts shouldpropagation" framework (Yedidia et al., 2000). Sim- be preferred, because in this case the mean eld act-ilarly, general clusters of variables are also allowed ing across a large cut may have an expectation that isin recent treatments of structured mean eld algo- highly concentrated, a situation which corresponds torithms (Wiegerinck, 2000). Empirical results have the basic assumption underlying mean eld methods.shown that both kinds of generalization can yield more Again, speci c values of parameters should matter.e ectiv e algorithms. In this paper we provide a preliminary formal analysisWhile these developments provide much needed exi- and a thoroughgoing empirical exploration of these is-bility for the design of e ectiv e algorithms, they also sues. We present a theorem that relates the weight of tion function). Given a disjoint variable partitioning, the graph cut to the quality of the bound of GMF ap- C, the true cluster conditional of each variable cluster proximation, and study random graphs and a variety C given its Markov blanket MB is:i i of settings of parameter values. We compare several p(X jX = x )/C MB MBdi eren t kinds of partitioning algorithms empirically. i i in oX XAs we will show, our results turn out to provide rather exp   (X ) +   (X ;x ) ; D D \C D \MB i iclear support for a clustering algorithm based on min- D C D B i i imal cut, which is consistent with implications drawn (2) from the formal analysis. A more general version of the where B denotes the set of cliques that intersect withGMF algorithm that allows non-factorizable potentials i but are not contained in cluster C , and where theis also provide in this paper and possible extensions i lower-case character x denotes a speci c assignment tomotivated by our formal analysis are discussed. variable X. Without loss of generality, we assume that all the potentials are positively weighted (i.e.,  > 0)2 The generalized mean eld (GMF) and the signs are subsumed in the potential functions. algorithm Given a clique D , let I denote the set of indices of clusters that have non-empty intersection with D .