Context-Specific Independence Mixture Modelling for Protein Families

B. Georgi, J. Schultz and A. Schliep

In Knowledge Discovery in Databases: PKDD 2007, Springer Berlin / Heidelberg, Volume 4702/2007, 79–90, 2007.

Protein families can be divided into subgroups with functional differences. The analysis of these subgroups and the determination of which residues convey substrate specificity is a central question in the study of these families. We present a clustering procedure based on the context-specific independence mixture framework with a Dirichlet mixture prior for simultaneous inference of subgroups and prediction of specificity-determining residues from multiple sequence alignments of protein families. Application of the method to several well-studied families revealed good clustering performance and ample biological support for the predicted positions. The software we developed to carry out this analysis, PyMix (the Python mixture package), is available from