Genome assembly is one of the fundamental problems in bioinformatics. Assembly can be either reference-guided, when a reference genome similar to the genome we want to assemble is available, or de novo, when the genome is reconstructed only from the reads produced by sequencing machines. With sequencing getting cheaper by the day, researchers are interested in assembling the genomes of more and more organisms. The main bottleneck is the lack of reliable de novo assembly tools for Next Generation Sequencing data (cheaper but shorter reads). We wish to investigate various aspects of the de novo assembly problem, such as read filtering and correction, contig building, and scaffolding.
In collaboration with Prof. Debashish Bhattacharya of SEBS, Rutgers, we explored the effectiveness of single-cell assembly tools in producing a draft genome assembly of an unknown wild-caught marine diatom. We showed that if the genomic material is largely free of contaminants, we can reliably perform phylogenetic and evolutionary analysis of the organism, protein prediction and annotation, and metabolic pathway analysis. Currently, we are exploring the possibility of performing a similar evolutionary analysis of Picobiliphytes (a recently discovered group of algae).
Whole-genome-amplified (WGA) single-cell (SC) sequencing data is notorious for large coverage variation and errors. The problem of observing frequent k-mers can be viewed as a generalized "coupon collector's" problem in which coupons appear with probabilities following a certain distribution. Since reads contain more errors towards their end, the rate of false frequent k-mers increases with read length. Our goal is to predict the False Discovery Rate (FDR) of the observed frequent k-mers for a particular prefix of the read (a partial read) and thereby suggest a prefix length, for a given value of k, for k-mer based downstream analysis such as assembly. Working with partial reads also makes it possible to perform preliminary analysis before sequencing is complete. This would facilitate rapid pathogen detection for diseases where the ability to quickly administer the correct antimicrobial drug has a profound effect on patient outcome.
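To make the idea of frequent k-mers in read prefixes concrete, here is a minimal Python sketch. It is not the method used in the project: the function names, the count threshold `min_count`, and the use of a known genome to measure an empirical false discovery rate are all assumptions made for illustration, and reverse complements are ignored for simplicity. The project aims to predict the FDR; this toy only measures it after the fact on data where the truth is known.

```python
from collections import Counter


def frequent_kmers(reads, k, prefix_len, min_count):
    """Return the k-mers seen at least min_count times when only the
    first prefix_len bases of each read are used (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        prefix = read[:prefix_len]
        # Slide a window of width k across the read prefix.
        for i in range(len(prefix) - k + 1):
            counts[prefix[i:i + k]] += 1
    return {kmer for kmer, c in counts.items() if c >= min_count}


def empirical_fdr(called, genome, k):
    """Fraction of called k-mers that do not occur in a known genome.

    Assumption: a trusted genome string is available, which is only
    true for simulated or benchmark data, not for a real de novo run.
    """
    true_kmers = {genome[i:i + k] for i in range(len(genome) - k + 1)}
    if not called:
        return 0.0
    return len(called - true_kmers) / len(called)
```

On toy data where errors cluster at the read ends, calling `frequent_kmers` with a shorter `prefix_len` would be expected to yield fewer false k-mers, at the cost of discarding some true ones. That trade-off is what a suggested prefix length would balance.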
Bhattacharya, Debashish and Roy, Rajat S. and Price, Dana C. and Schliep, Alexander. Studying the single life of eukaryotic microbes: Single cell genomics of marine plankton (2014)
Roy, Rajat S. and Bhattacharya, Debashish and Schliep, Alexander. Turtle: Identifying frequent k-mers with cache-efficient algorithms (2014)
Roy, Rajat S. and Price, Dana C. and Schliep, Alexander and Cai, Guohong and Korobeynikov, Anton and Yang, Eun Chan and Bhattacharya, Debashish. Single cell genome analysis of an uncultured heterotrophic stramenopile (2014)
Roy, Rajat Shuvro and Chen, Kevin and Sengupta, Anirvan and Schliep, Alexander. SLIQ: Simple Linear Inequalities for Efficient Contig Scaffolding (2012)