Department of Computer Science
 Rutgers University





Context-specific Independence Mixture Models for Cluster Analysis of Biological Data

B. Georgi

Ph.D. Thesis, Freie Universität Berlin, Jun 2009.

Clustering is a crucial first step in the exploratory analysis of biological data. This thesis is concerned with the cluster analysis of biological data using mixture models, a powerful and versatile class of statistical models. We develop an extension of conventional mixture models in the form of the context-specific independence (CSI) framework. CSI mixtures are particularly suited to the analysis of biological data because they perform robustly in the presence of noise and uninformative features. This is achieved by adapting the model complexity to the degree of variation observed in a given data set: for each feature, mixture components may share a common distribution where the data do not distinguish them. We present a learning algorithm for CSI mixtures in a Bayesian framework. We apply CSI mixture clustering to data sets of transcription factor binding sites, protein sequences and complex disease phenotype data.
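To make the parameter-sharing idea concrete, here is a minimal sketch of how a CSI mixture over discrete features might be evaluated. It assumes categorical feature distributions and a hand-specified CSI structure; the function names, data layout and toy numbers are hypothetical and not taken from the thesis, and the Bayesian structure-learning algorithm itself is not shown. For each feature, `groups[j][k]` assigns component `k` to a group, and all components in the same group share one distribution for that feature.

```python
import math

# Hypothetical toy CSI mixture over discrete features (illustration only).
# groups[j][k] -> group id of component k for feature j
# params[j][g] -> dict mapping symbol -> probability for group g of feature j
def csi_log_likelihood(x, weights, groups, params):
    """Log-likelihood of one sample x (a list of symbols) under the mixture."""
    comp_logs = []
    for k, pi_k in enumerate(weights):
        lp = math.log(pi_k)
        for j, symbol in enumerate(x):
            g = groups[j][k]  # components in the same group share parameters
            lp += math.log(params[j][g][symbol])
        comp_logs.append(lp)
    # log-sum-exp over components for numerical stability
    m = max(comp_logs)
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))

def num_free_params(groups, params):
    """Free parameters of the feature distributions (mixture weights excluded):
    each distinct group contributes (alphabet size - 1) probabilities."""
    return sum(len(params[j][g]) - 1
               for j in range(len(groups))
               for g in set(groups[j]))

# Toy example: 3 components, 2 binary features.
weights = [0.5, 0.3, 0.2]
groups = [
    [0, 1, 2],  # feature 0 is informative: every component has its own distribution
    [0, 0, 0],  # feature 1 is uninformative: all components share one distribution
]
params = [
    {0: {"A": 0.8, "C": 0.2}, 1: {"A": 0.2, "C": 0.8}, 2: {"A": 0.5, "C": 0.5}},
    {0: {"A": 0.5, "C": 0.5}},
]

print(num_free_params(groups, params))  # 4, versus 6 for a conventional mixture
print(math.exp(csi_log_likelihood(["A", "C"], weights, groups, params)))
```

Because feature 1 contributes the same factor to every component, it cannot change which cluster a sample is assigned to, and it costs one set of parameters instead of three. This is the sense in which a CSI model spends complexity only where the data show variation between clusters.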