R.S. Roy, D.C. Price, A. Schliep, G. Cai, A. Korobeynikov, E.C. Yang and D. Bhattacharya
Sci Rep 2014, 4:4780.
A broad swath of eukaryotic microbial biodiversity cannot be cultivated in the lab and is therefore inaccessible to conventional genome-wide comparative approaches. How, then, can we learn about ecologically important protists, elucidate their genome evolution and their places in the tree of life (ToL), and determine the distribution of their gene homologs in marine metagenome data? One promising approach is single cell genomics (SCG), in which an individual cell is captured from nature and genome data are produced from its amplified DNA. Although SCG is widely used to generate complete or near-complete bacterial genomes, its application to eukaryotic nuclear genome sequencing is poorly developed. Here we use SCG to generate the first draft genome assembly from a cell belonging to MAST-4, a broadly distributed group of uncultured marine stramenopiles. Using SCG analysis of a diatom with a sequenced genome as a control, we tested and deployed assembly and gene prediction methods and identified ca. 7,000 protein-encoding genes in the MAST-4 genome. With this inventory of protein-encoding genes, we were able to robustly position the marine stramenopile in the ToL using multigene phylogenetics and to gain insights into its complex evolutionary history of horizontal gene transfer (HGT). MAST-4 proteins with different phylogenetic histories were then mapped to global ocean sampling (GOS) data. This revealed starkly different patterns of gene homolog distribution in the marine environment, implicating a role for natural selection in generating this patchy distribution.