*Syllabus*

**Organisation:** CS course number: 16:198:671:01, CBMB course number: 16:118:617:03. Time: Thursdays 3:20-6:20pm. Room: Hill 260.
Instructor: Alexander Schliep. CS students please register for the CS course number and biologists for the CBMB course number.

**NOTE:** This course is designed at the 500 level for first-year graduate students and advanced undergraduate students interested in modern applications of computer science in molecular biology and genetics and, more generally, in machine learning or statistical algorithms (e.g. Hidden Markov Models). No biology background is required. Even though the class has a 600-level number, it may be used to satisfy the Category B requirement.

**Description:** The field of Bioinformatics is primarily concerned with the analysis of data from molecular biology using methods from computer science (algorithms and machine learning) and from computational statistics. Its development reflects the immense, continuing change in biology and the rapid advances in experimental techniques. Examples include the invention of DNA sequencing only 36 years ago, the completion of the human genome not quite a decade ago, and the prospect of personal genome sequences in the very near future. The biological questions we will address range from deciding whether two proteins share a common ancestor, and how to rapidly identify such proteins in large databases, to the assembly of genomes.

**Topics covered:**

- Sequence comparison: pair-wise sequence alignments
- Multiple sequence alignments
- Models for protein families: profile Hidden Markov Models
- Evolutionary models and phylogenetic trees
- Signals in sequences: Gene regulation
- Gene prediction
- Sequence assembly
- Sequence comparisons for special cases: high similarity matches using index structures
- Algorithms for next-generation sequencing: *-Seq

In this course we will introduce the necessary theory and the relevant algorithmic developments. Through hands-on projects, we will also cover the practical aspects of solving small bioinformatics problems. Emphasis is placed on recent developments in the field and on the interplay between algorithmic development and the statistical modeling driven by the biological question at hand. We will introduce or revisit dynamic programming, shortest-path algorithms, trees, string searching using index structures, multinomial distributions, Markov chains, Hidden Markov Models, the Maximum-Likelihood principle, and Bayesian statistics.

**Prerequisites:** Elementary algorithms, linear algebra, discrete math and probability theory. Students are expected to be proficient enough in a programming language to implement matrix multiplication or dynamic programming. A grade of C or better in "Analyzing Numbers in Biology" (16:118:617:02; 01:694:420; 01:750:487:01) is sufficient to fulfill the prerequisites. For CS students, CS206, CS344 and a programming class will suffice. No biology background is required.

**Grading:** The course will consist of lectures by the instructor, graded homework problems, student presentations and class projects. Grades depend on active participation in lectures, graded homework, one or two class projects, a written midterm and a final project. Projects may be done in groups. As homework and projects will be concerned with analyzing biological data, it will be necessary

**Textbook:** There will be no textbook; we will use individual chapters from appropriate texts, class notes and original literature. Suggested reference literature includes:
- Statistical Methods in Bioinformatics, Ewens and Grant, Springer (2nd Ed, 2005). Note that a PDF of the book is available to Rutgers students for free from SpringerLink.
- Biological Sequence Analysis, Durbin et al., Cambridge (1998).
- Algorithms on Strings, Trees and Sequences, Gusfield, Cambridge (1997).

**Course Website:** Further information about the course and course materials will be published on the Sakai website accessible to registered students.

**Possible use of Students as Experimental Subjects in Research:** As Rutgers is a research university, there is a possibility that by enrolling in this class you may be asked to participate in a research study. Participation in any such study will be optional, and at no time will participation in a research study be part of a grade or a requirement for this course. This notification does not imply that by enrolling in this class you have provided consent to be a subject in a research study. Should you be asked to participate in a research study, a consent form will be presented to you describing the study and asking for your signature. Participation in research is always voluntary, and refusing to participate will have no adverse effects on your standing in the course. To learn more about research at Rutgers University and human subjects research, go to http://orsp.rutgers.edu/index.php?q=content/institutional-review-board-irb.