The methods we develop advance the understanding of gene function through the analysis of complex, heterogeneous experimental data, including imaging data such as in situ gene expression experiments. Computer vision methods combined with statistical models help to identify functional modules in Drosophila development from their spatial and temporal co-expression patterns. Novel tree models elucidate regulatory mechanisms in the development of the lymphoid system.
We advance the theory of classical bioinformatics tools such as Hidden Markov Models and mixture models and apply them to novel data. Our focus is on semi-supervised learning, that is, learning from both labeled and unlabeled data as one way to fuse different data sources, and on flexible, robust models with a minimal number of parameters (CSI) that nevertheless agree well with biological reality.
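As a minimal illustration of the semi-supervised idea, and not a description of our actual models or software, the sketch below fits a one-dimensional Gaussian mixture with EM while keeping the component of every labeled point fixed. All names (`semi_supervised_em`, `normal_pdf`, the `min_var` floor) and the toy data are assumptions made for this example only.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def semi_supervised_em(values, labels, n_components=2, n_iter=50, min_var=1e-3):
    """Fit a 1-D Gaussian mixture from labeled and unlabeled points.

    labels[i] is a component index for a labeled point, or None if unlabeled.
    Returns (weights, means, variances, responsibilities).
    """
    n = len(values)
    lo, hi = min(values), max(values)

    # Initialise each mean from its labeled points, or spread it over the range.
    means = []
    for k in range(n_components):
        members = [x for x, y in zip(values, labels) if y == k]
        if members:
            means.append(sum(members) / len(members))
        else:
            means.append(lo + (k + 0.5) * (hi - lo) / n_components)

    overall_mean = sum(values) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in values) / n, min_var)
    variances = [overall_var] * n_components
    weights = [1.0 / n_components] * n_components
    resp = []

    for _ in range(n_iter):
        # E-step: labeled points keep a fixed one-hot responsibility,
        # unlabeled points receive posterior component probabilities.
        resp = []
        for x, y in zip(values, labels):
            if y is not None:
                resp.append([1.0 if k == y else 0.0 for k in range(n_components)])
                continue
            dens = [weights[k] * normal_pdf(x, means[k], variances[k])
                    for k in range(n_components)]
            total = sum(dens)
            if total == 0.0:
                # Numerical underflow: fall back to a uniform posterior.
                resp.append([1.0 / n_components] * n_components)
            else:
                resp.append([d / total for d in dens])

        # M-step: weighted maximum-likelihood updates over all points.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk == 0.0:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var_k, min_var)  # floor avoids collapsing variances

    return weights, means, variances, resp


# Toy data: two groups, with only one labeled point per group.
toy_values = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.2]
toy_labels = [0, None, None, None, 1, None, None, None]
toy_weights, toy_means, toy_vars, toy_resp = semi_supervised_em(toy_values, toy_labels)
```

In this sketch the labels do two things: they anchor the initial component means, and they pin each component to a known class, so the unlabeled points refine the estimates without the components swapping identities.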
The data produced by next-generation sequencing (NGS) platforms pose challenges for implementing computational pipelines, for devising appropriate analysis methods, and for scaling up statistically advanced approaches such as Bayesian methods. In collaboration with groups at Rutgers, CINJ and CWI, we analyze NGS data and develop efficient algorithms for Bayesian approaches. Through teaching (Bioinformatics for next-generation sequencing, Introduction to Bioinformatics) and the organization of a DIMACS workshop on NGS in 2010, we also help to build and educate a community here at Rutgers.
Computational thinking is becoming a core requirement across disciplines, and teaching computational and algorithmic ideas can benefit greatly from software tools. We develop animation systems for graph algorithms and clustering algorithms; CATBox is a Springer textbook that uses Gato. With our Hidden Markov Model library, learners can concentrate on tackling exciting bioinformatics problems.