Skip to content
2000
Volume 11, Issue 3
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Rapidly developing next-generation sequencing technologies significantly promote metagenomics research, yet also present extreme challenges in the analysis of metagenomic data. Metagenomic samples can contain thousands of microbial species, thus, sequencing datasets can contain fragments from thousands of different genomes. Therefore, clustering the sequencing reads with their original genomes, namely, binning, is usually done to expedite further studies. Currently, binning methods are divided into two categories: supervised methods (which require reference genomes), and unsupervised methods (which do not). We present an unsupervised binning method that combines a novel sequence feature recognition method with a spectral clustering algorithm. The sequence feature is a hybrid of sequence correlation and sequence composition analyses. Simulation experiments, based on simulated and actual metagenomic datasets, suggest that the combination of sequence composition and an intrinsic correlation of oligonucleotides, both extracted from tetranucleotide analyses, performs better than any single feature. A spectral clustering algorithm, which is a high performance unsupervised clustering method, is also applied in our binning method. The method is available as an open source package called HSS-bin (Hybrid Sequence feature and Spectral clustering unsupervised metagenomic binning) at http://bioinfo.seu.edu.cn/HSS-bin/. We evaluated HSS-bin’s performance using both simulated and actual metagenomic datasets. Experimental results indicate that HSS-bin can handle metagenomic sequencing data with non-uniform species abundance, short sequences, and complex phylogenetic diversity with high accuracy. Our method performs well on actual metagenomic datasets and on datasets simulated from a complex metagenomic community.

Loading

Article metrics loading...

/content/journals/cbio/10.2174/1574893611666151203222815
2016-07-01
2025-05-31
Loading full text...

Full text loading...

/content/journals/cbio/10.2174/1574893611666151203222815
Loading

  • Article Type:
    Research Article
Keyword(s): Metagenomics; sequence features; spectral clustering; unsupervised binning
This is a required field
Please enter a valid email address
Approval was a Success
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error
Please enter a valid_number test