**Organisation:** CS course number: 16:198:674:01, CBMB course number: 16:118:617:03. Tuesday & Thursday, 1:40-3:00pm, Hill 262.

**NOTE:** This course is designed at the 500 level for first-year graduate students interested in modern biology applications or, more generally, interested in machine learning or statistical algorithms. No biology background is required.

**Description:** Bioinformatics is primarily concerned with analyzing data from molecular biology using methods from computer science (algorithms and machine learning) and from computational statistics. Its development reflects the ongoing transformation of biology and the rapid advances in experimental techniques, exemplified by the invention of DNA sequencing only 36 years ago, the completion of the human genome not quite a decade ago, and the prospect of personal genome sequences in the very near future. The biological questions we will address range from deciding whether two proteins share a common ancestor, and how to identify such proteins rapidly in large databases, to reconstructing the sequence of genome modifications that leads to cancerous cell growth.

**Topics covered:**

- Sequence comparison: pair-wise sequence alignments
- Multiple sequence alignments
- Models for protein families: profile Hidden Markov Models
- Evolutionary models
- Phylogenetic Trees
- Signals in sequences: Gene regulation
- Gene prediction
- Sequence assembly
- Genome rearrangements
- Sequence comparisons for special cases: high similarity matches using index structures
- Algorithms for next-generation sequencing: *-Seq

Further topics might be added if time permits.

In this course we will introduce the necessary theory, the relevant algorithmic developments, and, through hands-on projects, the practical aspects of solving small bioinformatics problems. We emphasize recent developments in the field and the interplay between algorithm design and statistical modeling, both driven by the biological question at hand. We will introduce or revisit dynamic programming, shortest-path algorithms, trees, string searching using index structures, multinomial distributions, Markov chains, Hidden Markov Models, the maximum-likelihood principle, Bayesian statistics, and Markov Chain Monte Carlo.
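To give a flavor of how dynamic programming applies to pair-wise sequence alignment, here is a minimal sketch of global alignment scoring in the style of Needleman-Wunsch. It is an illustration only, not course material: the function name `global_alignment_score` and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of strings a and b.

    Scoring parameters are illustrative assumptions, not values
    prescribed by the course.
    """
    # prev[j] = best score for aligning the already processed prefix of a with b[:j].
    # Aligning an empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Three ways to extend an alignment: pair two characters,
            # or place a gap in one of the two sequences.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

The sketch keeps only two rows of the dynamic programming table, so it uses memory linear in the length of `b`; recovering the alignment itself, rather than just its score, would require keeping the full table or a traceback structure.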

**Prerequisites:** Elementary algorithms, linear algebra, discrete math and probability theory. Note that all probability theory required will be reviewed. No biology background is required.

**Grading:** The course consists of lectures, graded homework problems, and class projects. Grades depend on active participation in lectures, graded homework, class projects, and written exams.

**Textbook:** The course will use "Bioinformatics" by Andrzej Polanski and Marek Kimmel (Springer 2007) as the primary textbook.

**Course Website:** Further information about the course and course materials will be published on the Sakai website.