Department of Computer Science
 Rutgers University


Fall 2015

C: Introduction to Discrete Structures I

Organisation: CS course number: 01:198:205, Sections 01, 02, 03

Prerequisites: 01:198:111 and 01:640:152. Credit not given for this course and 14:332:312. Please note that courses for which a student has received a grade of D cannot be used to satisfy prerequisite requirements.

Description: Provides the background in logic, mathematical proof techniques, and combinatorics required in Discrete Structures II and in the design and analysis of algorithms. Basic Set Notation, Propositional Logic, Truth Tables, Boolean Circuits. First-Order Logic, Predicates, Quantifiers. Mathematical Induction: Program Correctness, Trees, Grammars. Relations: Closures of Relations. Orders, Equivalence Relations, Functions, Finite-State Machines.
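As a small illustration of the truth-table material, the following Python sketch (not course material; representing formulas as Python functions is an assumption made for brevity) decides whether a propositional formula is a tautology by evaluating it under every truth assignment:

```python
from itertools import product
from typing import Callable

Formula = Callable[..., bool]


def truth_table(formula: Formula, n_vars: int) -> list[tuple[tuple[bool, ...], bool]]:
    """List every assignment of n_vars Boolean variables with the formula's value."""
    return [(row, formula(*row)) for row in product([False, True], repeat=n_vars)]


def is_tautology(formula: Formula, n_vars: int) -> bool:
    """A tautology evaluates to True under every truth assignment."""
    return all(value for _, value in truth_table(formula, n_vars))


def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q is False only when p is True and q is False."""
    return (not p) or q


def de_morgan(p: bool, q: bool) -> bool:
    # not (p and q)  <->  (not p) or (not q)
    return (not (p and q)) == ((not p) or (not q))


def contrapositive(p: bool, q: bool) -> bool:
    # (p -> q)  <->  (not q -> not p)
    return implies(p, q) == implies(not q, not p)


def converse(p: bool, q: bool) -> bool:
    # (p -> q) -> (q -> p): not a tautology
    return implies(implies(p, q), implies(q, p))


print(is_tautology(de_morgan, 2))       # True
print(is_tautology(contrapositive, 2))  # True
print(is_tautology(converse, 2))        # False
```

Checking all 2^n assignments is exact but exponential in the number of variables n, which is acceptable for the small formulas typical of exercises.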

Expected work: Weekly assignments, 2 midterms, quizzes, active participation using an iClicker (the first-generation model with multiple-choice answers A-E suffices), and a final exam.

Textbook: Adapted version of Rosen: Discrete Math and its Applications (McGraw Hill, most recent edition).

Course Website: More information at http://bioinformatics.rutgers.edu/Teaching/F15DiscreteStructuresI/. There is also a Sakai site for participants.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

 

Spring 2015

C: Introduction to Bioinformatics

Organisation: CS course number: 16:198:671:01, CBMB course number: 16:118:617:03. Time: Thursdays 3:20-6:20pm. Room: Hill 260.

Class starts January 22.

NOTE: This course is designed at the 500 level for first-year graduate students and advanced undergraduate students interested in modern biology applications or, more generally, in machine learning or statistical algorithms (e.g., Hidden Markov Models). No biology background is required. Even though the class has a 600-level number, it may be used to satisfy the Category B requirement.

Description: The field of Bioinformatics is primarily concerned with the analysis of data from molecular biology using methods from computer science (algorithms and machine learning) and from computational statistics. Its development reflects the immense, ongoing change in biology and the rapid advances in experimental techniques, exemplified by the invention of DNA sequencing in the 1970s, the completion of the human genome sequence in 2003, and the prospect of personal genome sequences in the very near future. The biological questions we will answer range from deciding whether two proteins share a common ancestor, and how to rapidly identify such proteins in large databases, to assembling genomes from sequencing data.
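To make the first question concrete: deciding whether two proteins share a common ancestor typically starts from an alignment score. The Python sketch below computes the Needleman-Wunsch global alignment score by dynamic programming. The match, mismatch, and gap values are illustrative assumptions; real protein comparisons usually rely on substitution matrices such as BLOSUM62, affine gap costs, and a statistical assessment of the score, none of which is shown here.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch: best score of a global alignment of a and b (linear gap cost)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefix a[:i] with the prefix b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,                     # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,    # gap in b
                              score[i][j - 1] + gap)    # gap in a
    return score[n][m]


print(global_alignment_score("HEAGAWGHEE", "PAWHEAE"))
```

The table has (n+1)(m+1) cells, so time and memory grow with the product of the sequence lengths; fast database search tools trade some of this exactness for speed.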

Course Website: http://bioinformatics.rutgers.edu/Teaching/S15IntroToBioinformatics

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

 

Fall 2014

C: Introduction to Discrete Structures II

Organisation: CS course number: 01:198:206, Sections 01, 02, HN

Prerequisites: 01:198:205 or 14:332:202; 01:640:152. Please note that courses for which a student has received a grade of D cannot be used to satisfy prerequisite requirements.

Description: Provides the background in combinatorics and probability theory required in the design and analysis of algorithms, in systems analysis, and in other areas of computer science. Counting: Binomial Coefficients, Permutations, Combinations, Partitions. Recurrence Relations and Generating Functions. Discrete Probability: Events and Random Variables; Conditional Probability, Independence; Expectation, Variance, Standard Deviation; Binomial, Poisson and Geometric Distributions; Law of Large Numbers. Some Topics from Graph Theory: Paths, Components, Connectivity, Euler Paths, Hamiltonian Paths, Planar Graphs, Trees.
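For the counting and probability topics, a minimal Python sketch (illustrative, not taken from the course) computes binomial coefficients with Pascal's recurrence and uses them for the Binomial distribution, checking the expectation np and variance np(1-p) with exact fractions:

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def binom(n: int, k: int) -> int:
    """C(n, k) via Pascal's recurrence C(n, k) = C(n-1, k-1) + C(n-1, k)."""
    if k < 0 or k > n:
        return 0
    if k == 0 or k == n:
        return 1
    return binom(n - 1, k - 1) + binom(n - 1, k)


def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """P(X = k) for X ~ Binomial(n, p), i.e. C(n, k) p^k (1 - p)^(n - k)."""
    return binom(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, Fraction(1, 2)
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(binom(10, 3))    # 120
print(sum(pmf))        # 1
print(mean, variance)  # 5 5/2
```

Memoizing the recurrence (here with `functools.lru_cache`) avoids the exponential blow-up of the naive recursion, a small instance of turning a recurrence relation into an efficient algorithm.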

Expected work: Weekly assignments, 2 midterms, quizzes, active participation using an iClicker (the first-generation model with multiple-choice answers A-E suffices), and a final exam.

Textbook: Sheldon Ross: A First Course in Probability (Prentice Hall, most recent edition). Earlier editions should work too.

Course Website: More information at http://bioinformatics.rutgers.edu/Teaching/F14DiscreteStructuresII/. There is also a Sakai site for participants.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

 

Spring 2014

C: Introduction to Bioinformatics

Organisation: CS course number: 16:198:671:01, CBMB course number: 16:118:617:03. Time: Thursdays 3:20-6:20pm. Room: Hill 260.

NOTE: This course is designed at the 500 level for first-year graduate students and advanced undergraduate students interested in modern biology applications or, more generally, in machine learning or statistical algorithms (e.g., Hidden Markov Models). No biology background is required. Even though the class has a 600-level number, it may be used to satisfy the Category B requirement.

Description: The field of Bioinformatics is primarily concerned with the analysis of data from molecular biology using methods from computer science (algorithms and machine learning) and from computational statistics. Its development reflects the immense, ongoing change in biology and the rapid advances in experimental techniques, exemplified by the invention of DNA sequencing in the 1970s, the completion of the human genome sequence in 2003, and the prospect of personal genome sequences in the very near future. The biological questions we will answer range from deciding whether two proteins share a common ancestor, and how to rapidly identify such proteins in large databases, to assembling genomes from sequencing data.

Course Website: http://bioinformatics.rutgers.edu/Teaching/S14IntroToBioinformatics

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

 

Fall 2013

C: Introduction to Discrete Structures II

Organisation: CS course number: 01:198:206, Sections 01, 02, HN

Prerequisites: 01:198:205 or 14:332:202; 01:640:152. Please note that courses for which a student has received a grade of D cannot be used to satisfy prerequisite requirements.

Description: Provides the background in combinatorics and probability theory required in the design and analysis of algorithms, in systems analysis, and in other areas of computer science. Counting: Binomial Coefficients, Permutations, Combinations, Partitions. Recurrence Relations and Generating Functions. Discrete Probability: Events and Random Variables; Conditional Probability, Independence; Expectation, Variance, Standard Deviation; Binomial, Poisson and Geometric Distributions; Law of Large Numbers. Some Topics from Graph Theory: Paths, Components, Connectivity, Euler Paths, Hamiltonian Paths, Planar Graphs, Trees.
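For the Euler-path topic, here is a hedged Python sketch of Euler's classical criterion: a (multi)graph whose edges all lie in one connected component has an Euler path exactly when it has zero or two vertices of odd degree. The edge-list representation and the function name are assumptions for illustration.

```python
from collections import defaultdict


def has_euler_path(edges: list[tuple[str, str]]) -> bool:
    """Euler's criterion for an undirected (multi)graph given as an edge list."""
    if not edges:
        return True
    degree: dict[str, int] = defaultdict(int)
    adj: dict[str, set[str]] = defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj[u].add(v)
        adj[v].add(u)
    # Depth-first search from an endpoint of the first edge.
    start = edges[0][0]
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if len(seen) != len(degree):
        return False  # the edges are split across several components
    odd = sum(1 for d in degree.values() if d % 2 == 1)
    return odd in (0, 2)


# Koenigsberg bridges: four land masses A-D joined by seven bridges.
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]
print(has_euler_path(konigsberg))                # False
print(has_euler_path([("a", "b"), ("b", "c")]))  # True
```

In the Koenigsberg graph all four vertices have odd degree (5, 3, 3, 3), so no walk crosses every bridge exactly once.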

Expected work: Weekly assignments, 2 midterms, quizzes, active participation using an iClicker (the first-generation model with multiple-choice answers A-E suffices), and a final exam.

Textbook: Sheldon Ross: A First Course in Probability (Prentice Hall, 9th edition). Earlier editions should work too.

Course Website: More information at http://bioinformatics.rutgers.edu/Teaching/F13DiscreteStructuresII/. There is also a Sakai site for participants.

Contact: Alexander Schliep (schliep@cs.rutgers.edu). Teaching assistants: John Wiedenhoeft.

 

Spring 2013

C: Introduction to Bioinformatics

Organisation: CS course number: 16:198:674:01, CBMB course number: 16:118:617:03. Time: Thursdays 3:20-6:20pm. Room: Hill 264.

NOTE: This course is designed at the 500 level for first-year graduate students and advanced undergraduate students interested in modern biology applications or, more generally, interested in machine learning or statistical algorithms (e.g. Hidden Markov Models). No biology background is required. Even though the class has a 674-number it may be used to satisfy the Category B requirement.

Description: The field of Bioinformatics is primarily concerned with the analysis of data from molecular biology using methods from computer science (algorithms and machine learning) and from computational statistics. Its development reflects the immense, ongoing change in biology and the rapid advances in experimental techniques, exemplified by the invention of DNA sequencing in the 1970s, the completion of the human genome sequence in 2003, and the prospect of personal genome sequences in the very near future. The biological questions we will answer range from deciding whether two proteins share a common ancestor, and how to rapidly identify such proteins in large databases, to assembling genomes from sequencing data.

Course Website: http://bioinformatics.rutgers.edu/Teaching/S13IntroToBioinformatics

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

 

Spring 2012

C: Introduction to Bioinformatics

Organisation: CS course number: 16:198:674:01, CBMB course number: 16:118:617:03. Time: Thursdays 3:20-6:30pm. Room: Hill 264.

NOTE: This course is designed at the 500 level for first-year graduate students and advanced undergraduate students interested in modern biology applications or, more generally, interested in machine learning or statistical algorithms (e.g. Hidden Markov Models). No biology background is required. Even though the class has a 674-number it may be used to satisfy the Category B requirement.

Description: The field of Bioinformatics is primarily concerned with the analysis of data from molecular biology using methods from computer science (algorithms and machine learning) and from computational statistics. Its development reflects the immense, ongoing change in biology and the rapid advances in experimental techniques, exemplified by the invention of DNA sequencing in the 1970s, the completion of the human genome sequence in 2003, and the prospect of personal genome sequences in the very near future. The biological questions we will answer range from deciding whether two proteins share a common ancestor, and how to rapidly identify such proteins in large databases, to assembling genomes from sequencing data.

Course Website: http://bioinformatics.rutgers.edu/Teaching/S12IntroToBioinformatics

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

 

Fall 2011

C: Introduction to Discrete Structures II

Organisation: CS course number: 01:198:206, Sections 01, 02, HN

Prerequisites: 01:198:205 or 14:332:202; 01:640:152. Please note that courses for which a student has received a grade of D cannot be used to satisfy prerequisite requirements.

Description: Provides the background in combinatorics and probability theory required in the design and analysis of algorithms, in systems analysis, and in other areas of computer science. Counting: Binomial Coefficients, Permutations, Combinations, Partitions. Recurrence Relations and Generating Functions. Discrete Probability: Events and Random Variables; Conditional Probability, Independence; Expectation, Variance, Standard Deviation; Binomial, Poisson and Geometric Distributions; Law of Large Numbers. Some Topics from Graph Theory: Paths, Components, Connectivity, Euler Paths, Hamiltonian Paths, Planar Graphs, Trees.

Expected work: Weekly assignments, 1 or 2 tests, and a final exam.

Textbook: Sheldon Ross: A First Course in Probability (Prentice Hall, 8th edition). Earlier editions should work too.

Course Website: More information at http://bioinformatics.rutgers.edu/Teaching/F11DiscreteStructuresII/. There is also a Sakai site for participants.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

 

Fall 2010

C: Introduction to Bioinformatics

Organisation: CS course number: 16:198:674:01, CBMB course number: 16:118:617:03. Time: Tuesdays and Thursdays 1:40-3:00pm. Room: Hill 262.

NOTE: This course is designed at the 500 level for first-year graduate students interested in modern biology applications or, more generally, interested in machine learning or statistical algorithms. No biology background is required.

Description: The field of Bioinformatics is primarily concerned with the analysis of data from molecular biology using methods from computer science (algorithms and machine learning) and from computational statistics. Its development reflects the immense, ongoing change in biology and the rapid advances in experimental techniques, exemplified by the invention of DNA sequencing in the 1970s, the completion of the human genome sequence in 2003, and the prospect of personal genome sequences in the very near future. The biological questions we will answer range from deciding whether two proteins share a common ancestor, and how to rapidly identify such proteins in large databases, to reconstructing the sequence of genome modifications that leads to the cancerous growth of cells.

Course Website: See http://bioinformatics.rutgers.edu/Teaching/F10IntroToBioinformatics

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

 

Spring 2010

S: Light seminar: Bioinformatics for next-generation sequencing

Organisation: Course number: 16:198:500:07, Tuesdays 12-1 pm in Room Hill 260

Description: Computational tools, drawing on computer science, mathematics, and statistics to help solve fundamental problems, have become central to the modern development of molecular biology.

This seminar is intended as an introduction to the field, with an emphasis on recent developments.

Grading: Pass/fail based on attendance and presentation.

Note: Taught jointly with Kevin Chen (Genetics/BioMaPS). See the Sakai website for the schedule and further information.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

 

Fall 2009

C: Hidden Markov Models in Biology

Organisation: Course number: 16:198:672. Time: Thursdays 3:20-6:20pm. Room: Hill 260 (BioMaPS seminar room).

Description: Hidden Markov Models (HMMs) are an important class of stochastic models that were first applied and popularized in automatic speech recognition by Rutgers' distinguished faculty member Lawrence Rabiner and his colleagues at Bell Labs.

In recent years they have found widespread use in the analysis of data from molecular biology. They constitute the state of the art in searching for remote protein homologs and, with mild extensions, in identifying genes (or other signals) in DNA sequences. In addition to these stochastic models of sequences of discrete symbols, HMMs can also model continuous-valued sequences. These arise, for example, from time-course experiments measuring gene expression during the cell cycle, in response to stimuli, or during the development of organisms. Similarly, analyzing genomic tiling array data to detect chromosomal aberrations, locate transcription factor binding sites, or identify expressed regions is a typical segmentation problem for which HMMs are very well suited.

HMMs owe their popularity to the ease with which complex models can be built, their simple stochastic structure, and the polynomial-time algorithms available for all relevant operations. Nevertheless, achieving reasonable running times and memory requirements for genome-sized data sets has required exciting recent developments, and theoretical aspects of HMMs and their estimation, or training, remain an active area of research. In this course we will introduce the necessary theory, the relevant algorithmic developments, and, through hands-on projects using the GHMM library (http://ghmm.org), some of the engineering aspects of solving computational biology problems with HMMs. An emphasis will be put on recent developments in the field.
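The polynomial-time algorithms mentioned above include the forward algorithm, which computes the likelihood of an observation sequence in O(TN^2) time for N states and T observations. The Python sketch below works in log space, one standard route to a numerically stable implementation. It is plain Python, not the GHMM API, and the two-state DNA model (an AT-rich and a GC-rich state) is a made-up example.

```python
import math


def safe_log(p: float) -> float:
    """log(p), with log(0) = -inf so impossible events need no special casing."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(xs: list[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_log_likelihood(obs, init, trans, emit) -> float:
    """Forward algorithm: log P(obs | HMM) in O(T * N^2) time.

    init[i]     -- P(first state is i)
    trans[i][j] -- P(next state is j | current state is i)
    emit[i][o]  -- P(symbol o | state i)
    """
    n = len(init)
    # alpha[j] = log P(obs[0..t], state at time t is j)
    alpha = [safe_log(init[j]) + safe_log(emit[j][obs[0]]) for j in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][o])
            for j in range(n)
        ]
    return logsumexp(alpha)


# Hypothetical two-state model: state 0 = AT-rich, state 1 = GC-rich.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [
    {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
]

print(forward_log_likelihood("ATATGCGCGCAT", init, trans, emit))
```

Multiplying raw probabilities would underflow to 0.0 for sequences of a few thousand symbols; summing logs and using log-sum-exp keeps the result finite. Scaling the forward variables at each step is a common alternative.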

For CS students: HMMs are also applied to a wide range of non-biological problems, from fault detection in computer systems and handwriting recognition to predicting crises in the Middle East from newsfeed data.

Prerequisites: As I expect an interdisciplinary audience, I will not impose strict course prerequisites. You will need some background in elementary algorithms, linear algebra, discrete math, and probability theory, as well as some programming experience. All probability theory required will be reviewed. For CS students: while the examples in the class all come from biology, I will make sure that they are accessible to students without any biology background.

Grading: Attendance and participation are expected and will count toward the grade. The other components are problem sets to be handed in, class project(s), and a term presentation based on an original research paper.

Contents: Review of elementary probability and information theory, Markov chains, positional weight matrices and sequence logos, Hidden Markov Models, the forward/backward algorithm, Viterbi and posterior decoding, Baum-Welch training (i.e., the Expectation Maximization algorithm), pairwise sequence alignments, a probabilistic interpretation of alignment scores, analyzing DNA sequences (gene finding with labelled HMMs and k-best decoding, finding transcription factor binding sites), detecting remote homolog protein sequences with profile HMMs, Dirichlet priors, HMMs for continuous-valued observations, analyzing gene expression time courses with HMMs, mixture models, and analyzing tiling DNA microarray data. Depending on interests and class composition, further topics include: distance functions between HMMs, identifiability, learning HMM topology, numerically stable implementations of the basic algorithms, efficient MCMC approaches for full Bayesian analysis with HMMs, memory-efficient versions of the Viterbi algorithm, and algorithmic improvements for repetitive sequences.
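Viterbi decoding, listed above, replaces the forward algorithm's sum over predecessor states with a maximum and keeps back-pointers to recover the most probable state path. The following Python sketch is self-contained and illustrative only; the two-state model parameters are hypothetical, not taken from any course material or library.

```python
import math


def safe_log(p: float) -> float:
    """log(p), with log(0) = -inf for impossible transitions or emissions."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(obs, init, trans, emit):
    """Most probable state path for obs; returns (path, log-probability of that path)."""
    n = len(init)
    # delta[j] = best log-probability of any state path ending in state j at time t
    delta = [safe_log(init[j]) + safe_log(emit[j][obs[0]]) for j in range(n)]
    backpointers = []  # backpointers[t-1][j] = best predecessor of state j at time t
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            scores = [prev[i] + safe_log(trans[i][j]) for i in range(n)]
            best_i = max(range(n), key=scores.__getitem__)
            ptr.append(best_i)
            delta.append(scores[best_i] + safe_log(emit[j][o]))
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=delta.__getitem__)
    best_log_prob = delta[state]
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_log_prob


# Hypothetical two-state model: state 0 = AT-rich, state 1 = GC-rich.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [
    {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
]

path, log_prob = viterbi("AAATGCGCGCGCAAA", init, trans, emit)
print("".join(str(s) for s in path), log_prob)
```

Storing all back-pointers takes O(TN) memory; the memory-efficient Viterbi variants mentioned in the course outline address exactly this cost for genome-sized inputs.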

Class website: See the Sakai wiki for the schedule, class notes, selected textbook chapters, and original papers.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

 

Winter Semester 2006/07

P: Applied Data Mining

Format: Block course, March 5-16, 2007, full-day in the PC pool, 3rd floor, MPI. 5 credits, focus areas C and D.

Preliminary meeting: February 8, 2007, 15:00-16:00, Room 331, Tower 2, 3rd floor, MPI für Molekulare Genetik.

Contents: In this hands-on intensive course, participating students can deepen and extend the knowledge of statistical pattern recognition and data mining they acquired in seminars and lectures by analyzing representative molecular biology data sets. The thematic focus is on Hidden Markov Models. Basic programming skills (C, C++, Python) are required, but the emphasis is on learning data mining methodology. The GHMM library (http://ghmm.org) provides the algorithms and data structures needed to implement the methods covered. The remaining programming tasks concern problem-specific adaptations, data preparation, and visualization of results. During the full-day intensive course, participants will also gain experience working in a team. Course credit is earned by presenting the project results.

Prerequisites: Successful completion of the lecture "Algorithmische Bioinformatik"; knowledge of statistics.

Contact: Alexander Schliep (schliep@cs.rutgers.edu). Additional lecturers: Benjamin Georgi. Teaching assistants: Ivan G Costa, Janne Grunau.

 

Winter Semester 2005/06

S: Algebraic Statistics

Format: Book seminar. Thursdays 14:00-16:00, seminar room, 3rd floor, MPI. 2 hours/week, no. 19714 in the FU course catalog (FU-KVV). Counts toward focus areas C and D.

Contents: In this seminar we will work through the book "Algebraic Statistics for Computational Biology", edited by Lior Pachter and Bernd Sturmfels. In this book-seminar format, all participants read the material for every session and take turns giving presentations; expect to give several talks each. The seminar is targeted at students at the final Master's or graduate level and will not review elementary material. Prerequisites are statistics, algebra, and applications of statistical models in bioinformatics.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

P: Applied Data Mining

Format: Block course, March 6-17, 2006, full-day in the PC pool, 3rd floor, MPI. 5 credits (focus areas C and D).


Preliminary meeting: January 18, 2006, 15:00-16:00, Room 331, Tower 2, 3rd floor, MPI für Molekulare Genetik.

Contents: In this hands-on intensive course, participating students can deepen and extend the knowledge of statistical pattern recognition and data mining they acquired in seminars and lectures by analyzing representative molecular biology data sets. The thematic focus is on Support Vector Machines and Hidden Markov Models. Basic programming skills (C, C++, Python, R) are required, but the emphasis is on learning data mining methodology.

Prerequisites: Successful participation in one of the seminars "Elements of Statistical Learning" or "Clusteranalyse heterogener Daten", as well as successful completion of the lecture "Algorithmische Bioinformatik".

Details: Microarrays, protein structures, text mining: the list goes on. Recognizing complex structures in high-dimensional spaces is a recurring problem in bioinformatics. Support Vector Machines (SVMs) are a young but very successful classification method that has been applied successfully to many problems.

The same is true of Hidden Markov Models (HMMs). They are the basis, for example, of the standard methods for gene prediction and for finding homologous protein sequences. Besides numerous applications in sequence analysis, they are also used to analyze microarrays.

The first week of the software lab covers SVMs for regression and classification; the second week covers HMMs and the analysis (annotation and classification) of biological sequences.

Both blocks are structured the same way. The first day gives an introduction, and the last day is reserved for writing a report. From Tuesday to Thursday, practical aspects are explained each morning, followed by small practical steps leading up to a problem that participants then solve independently that day.

The assignments are to be completed individually, and the results are summarized in a report. Group work is allowed, but individual contributions must be clearly marked. At the end of each block, each participant's implementation of one of the three assignments is reviewed.

The grade is based on the quality of this review and of the written report.

Prerequisites: Successful completion of the lecture "Algorithmische Bioinformatik"; knowledge of statistics.

Contact: Alexander Schliep (schliep@cs.rutgers.edu). Additional lecturers: Benjamin Georgi. Teaching assistants: Janne Grunau.

V: Algorithmische Bioinformatik (Algorithmic Bioinformatics)

Contents: Lectures on Hidden Markov Models (introduction, profile HMMs) and gene finding as part of the lecture course Algorithmische Bioinformatik (Reinert et al.).

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

 

Summer Semester 2005

S: Information theoretic methods in bioinformatics

Format: Book seminar. Thursdays 14:00-16:00, seminar room, 3rd floor, MPI. 2 hours/week (counts toward focus areas C and D).

Description: In this seminar we will review information-theoretic approaches based on the book "Information Theory, Inference and Learning Algorithms" by David J.C. MacKay. In this book-seminar format, all participants read the material for every session and take turns giving presentations; depending on the number of participants, expect to give several talks each. Toward the end, we will focus on information-theoretic approaches for HMMs, reading original literature in the same book-seminar format. The seminar is targeted at students at the final Master's or graduate level and will not review elementary material. Prerequisites are successful completion of "Algorithmische Bioinformatik" and "Hidden Markov Models", or further advanced coursework. A good working knowledge of HMMs and of Bayesian statistics in general is required.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

P: Implementing information theoretic methods in bioinformatics

Format: Programming project after the end of the seminar. Grading is based on the resulting software. For Master's students (counts toward focus areas C and D).

Contents: In this practical course we will implement selected methods covered in the book seminar "Information theoretic methods in bioinformatics" and perform numerical experiments to further investigate theoretical observations. Particular emphasis will be put on information-theoretic methods as applied to clustering problems and, more generally, in the context of HMMs. This practical course is targeted at students at the final Master's or graduate level. Prerequisites are successful completion of "Algorithmische Bioinformatik" and "Hidden Markov Models", or further advanced coursework. A good working knowledge of HMMs and of Bayesian statistics in general is required, as is solid programming experience in Python and C.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

 

Winter Semester 2004/05

V: Statistische Mustererkennung in der Bioinformatik mit HMMs (Statistical Pattern Recognition in Bioinformatics with HMMs)

Format: Fridays 12:00-14:00, room Informatik 1.27. 2 hours/week

Contents: Lecture at Martin-Luther-Universität Halle-Wittenberg.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

V: Analyse von DNA-Microarrays (Analysis of DNA Microarrays)

Format: Tuesdays 10:00-12:00, room Informatik 1.26. 2 hours/week

Contents: Lecture at Martin-Luther-Universität Halle-Wittenberg.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

U: Analyse von DNA-Microarrays (Analysis of DNA Microarrays)

Format: Tuesdays 14:00-15:00, room Informatik 1.03. 1 hour/week

Contents: Exercise class accompanying the lecture Analyse von DNA-Microarrays at Martin-Luther-Universität Halle-Wittenberg.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

U: Statistische Mustererkennung in der Bioinformatik mit HMMs (Statistical Pattern Recognition in Bioinformatics with HMMs)

Format: Tuesdays 15:00-16:00, room Informatik 1.03. 1 hour/week

Contents: Exercise class accompanying the lecture Statistische Mustererkennung in der Bioinformatik mit HMMs at Martin-Luther-Universität Halle-Wittenberg.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

S: Problemstellungen der Bioinformatik (Problems in Bioinformatics)

Format: Fridays 10:00-12:00, room Informatik 1.03. 2 hours/week

Contents: Seminar at Martin-Luther-Universität Halle-Wittenberg.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

V: Algorithmische Bioinformatik (Algorithmic Bioinformatics)

Format: Mondays 12:00-14:00, lecture hall 001, Arnimallee; Wednesdays 12:00-14:00, seminar room 005, Takustrasse 9. 4 hours/week

Contents: Lectures on Hidden Markov Models and gene finding as part of the lecture course Algorithmische Bioinformatik (Vingron et al.).

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

 

Summer Semester 2004

S: Statistische Gruppentests (Statistical Group Testing)

Format: Thursdays 14:00-16:00, seminar room, 3rd floor, MPI. 2 hours/week

Contents: Group testing is an approach that avoids 'expensive' experiments on individual samples by testing groups of samples at once. It is used, for example, in quality control, such as PCR-based screening of donated blood for HIV or hepatitis. Group testing is, however, far more broadly applicable, for example in determining haplotypes and genotypes, constructing physical maps, and designing DNA chips. The underlying theory is appealing because it closely links discrete mathematics (combinatorics, coding theory) and statistics.
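The savings from group testing can be quantified for the simplest scheme, Dorfman's two-stage pooling (named here as an illustrative example, not necessarily a seminar topic): pool k samples, test the pool once, and retest every member individually only if the pool is positive. Assuming independent samples, each positive with probability p, and an error-free test, the expected number of tests per individual is 1/k + 1 - (1-p)^k. The hypothetical Python helpers below evaluate this and search for the best group size:

```python
def dorfman_tests_per_person(p: float, k: int) -> float:
    """Expected tests per individual under Dorfman's two-stage pooling.

    One pooled test is shared by k people (1/k each); with probability
    1 - (1 - p)^k the pool is positive and everyone is retested (1 each).
    Assumes independent samples with prevalence p and an error-free test.
    k = 1 means plain individual testing: exactly one test per person.
    """
    if k == 1:
        return 1.0
    prob_pool_negative = (1 - p) ** k
    return 1 / k + (1 - prob_pool_negative)


def best_group_size(p: float, max_k: int = 100) -> tuple[int, float]:
    """Group size in 1..max_k that minimizes the expected tests per individual."""
    return min(((k, dorfman_tests_per_person(p, k)) for k in range(1, max_k + 1)),
               key=lambda kv: kv[1])


k, cost = best_group_size(0.01)
print(k, round(cost, 3))  # 11 0.196
```

At a prevalence of 1%, pools of 11 need roughly a fifth of the tests that individual testing requires; as p grows, the optimal pools shrink and the savings vanish.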

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

V: Statistische Mustererkennung in der Bioinformatik (Statistical Pattern Recognition in Bioinformatics)

Format: Thursdays 10:00-12:00, seminar room 119, Arnimallee 3. 2 hours/week

Contents: Hidden Markov Models (HMMs) are a flexible class of statistical models, particularly for biological sequences and time series. Starting from the classical definition of an HMM, we will use applications in molecular biology to present model extensions (e.g., higher-order states, multivariate outputs) as well as classification and clustering methods based on HMMs. This is complemented by placing HMMs within the hierarchy of statistical models. An additional focus is on techniques for efficient and numerically stable implementation.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

S: Softwarepraktikum Statistische Mustererkennung (Software Lab: Statistical Pattern Recognition)

Format: Full-day, PC pool, 3rd floor, MPI. 12 credits

Contents: Building on the GHMM library (http://ghmm.org), the lab offers the opportunity to work on current questions in bioinformatics in the area of DNA sequence analysis. To use several sources of information simultaneously in the analysis (e.g., primary and secondary structure of proteins), multivariate, or vector-valued, outputs must be supported. On this basis we will design and implement a program for finding genes in DNA sequences of eukaryotic genomes. This involves modeling the biological problem, designing a solution approach, and extending or adapting the library. Suitable biological data sets must be compiled for training and evaluation. During the lab, participants will learn to work independently as a team (two teams of four participants each) and gain experience with software engineering methodology (Extreme Programming) and tools (e.g., for version control, testing, and documentation).

Contact: Alexander Schliep (schliep@cs.rutgers.edu). Teaching assistants: Wasinee Rungsarityotin.

 

Winter Semester 2003/04

S: Clusteranalyse heterogener Daten (Cluster Analysis of Heterogeneous Data)

Format: Tuesdays 10:00-12:00, seminar room, 3rd floor, MPI. 2 hours/week

Contents: The seminar covers methods that exploit different types of data (heterogeneous data) to improve the robustness and informative value of an analysis.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

S: Statistical Classification: Support Vector Machines and Generalized Linear Models

Format: Thursdays 14:00-16:00, Room 111, Arnimallee 2-6. 2 hours/week

Contents: Joint seminar with Ehrhard Behrends and Peter Martus.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

P: Applied Data Mining

Format: Full-day, March 1-12, 2004, PC pool, 3rd floor, MPI. 2 hours/week, focus areas C and D.

Contents: Joint course with Knut Reinert, Dennis Kostka, and Florian Markowetz.

In this hands-on intensive course, participating students can deepen and extend the knowledge of statistical pattern recognition and data mining they acquired in seminars and lectures by analyzing representative molecular biology data sets. The thematic focus is on Support Vector Machines and Hidden Markov Models. Basic programming skills (C, C++, Python, R) are required, but the emphasis is on learning data mining methodology.

Prerequisites: Successful participation in one of the seminars "Elements of Statistical Learning" or "Clusteranalyse heterogener Daten", as well as successful completion of the lecture "Algorithmische Bioinformatik".

Details: Microarrays, protein structures, text mining: the list goes on. Recognizing complex structures in high-dimensional spaces is a recurring problem in bioinformatics. Support Vector Machines (SVMs) are a young but very successful classification method that has been applied successfully to many problems.

The same is true of Hidden Markov Models (HMMs). They are the basis, for example, of the standard methods for gene prediction and for finding homologous protein sequences. Besides numerous applications in sequence analysis, they are also used to analyze microarrays.

The first week of the software lab covers SVMs for regression and classification; the second week covers HMMs and the analysis (annotation and classification) of biological sequences.

Both blocks are structured the same way. The first day gives an introduction, and the last day is reserved for writing a report. From Tuesday to Thursday, practical aspects are explained each morning, followed by small practical steps leading up to a problem that participants then solve independently that day.

The assignments are to be completed individually, and the results are summarized in a report. Group work is allowed, but individual contributions must be clearly marked. At the end of each block, each participant's implementation of one of the three assignments is reviewed.

The grade is based on the quality of this review and of the written report.

Contact: Alexander Schliep (schliep@cs.rutgers.edu). Additional lecturers: Wasinee Rungsarityotin. Teaching assistants: Benjamin Georgi.

V: Algorithmische Bioinformatik (Algorithmic Bioinformatics)

Format: 4 hours/week, Mondays 10:00-12:00 and Wednesdays 10:00-12:00, seminar room 005, Takustrasse 9

Contents: Lectures on Hidden Markov Models (introduction, profile HMMs) and gene finding as part of the lecture course Algorithmische Bioinformatik (Reinert et al.).

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

 

Summer Semester 2003

V: Statistische Mustererkennung in der Bioinformatik (Statistical Pattern Recognition in Bioinformatics)

Format: Thursdays 10:00-12:00, seminar room 119, Arnimallee 3. 2 hours/week

Contact: Alexander Schliep (schliep@cs.rutgers.edu). Teaching assistants: Wasinee Rungsarityotin.

S: Softwarepraktikum Statistische Mustererkennung (Software Lab: Statistical Pattern Recognition)

Format: 2 hours/week, full-day, PC pool, 3rd floor, MPI

Contact: Alexander Schliep (schliep@cs.rutgers.edu). Teaching assistants: Wasinee Rungsarityotin.

 

Winter Semester 2002/03

S: Markov Ketten (Markov Chains)

Format: 2 hours/week, Fridays 16:00-18:00, seminar room 059, Takustrasse 9

Contents: Joint seminar with Huisinga, Schütte, and Vingron.

Contact: Alexander Schliep (schliep@cs.rutgers.edu).

V: Algorithmische Bioinformatik (Algorithmic Bioinformatics)

Format: 4 hours/week, Mondays 10:00-12:00, seminar room 005, Takustrasse 9; Wednesdays 10:00-12:00, seminar room 031, Arnimallee 2-6

Contents: Lectures on Hidden Markov Models (introduction, profile HMMs) and gene finding as part of the lecture course Algorithmische Bioinformatik (Vingron et al.).

Contact: Alexander Schliep (schliep@cs.rutgers.edu).