Clustering metagenomic sequences with interpolated Markov models


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2010-11-02

AUTHORS

David R Kelley, Steven L Salzberg

ABSTRACT

Background: Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects.

Results: We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and supervised learning method Phymm called PHY SCIMM that performs better when evolutionarily close training genomes are available.

Conclusions: SCIMM and PHY SCIMM are highly accurate methods to cluster metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHY SCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHY SCIMM are available open source from http://www.cbcb.umd.edu/software/scimm.

PAGES

544

References to SciGraph publications

  • 1985-12. Comparing partitions in JOURNAL OF CLASSIFICATION
  • 2007-04-29. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods in NATURE METHODS
  • 2010-02-11. Viral and microbial community dynamics in four aquatic environments in THE ISME JOURNAL: MULTIDISCIPLINARY JOURNAL OF MICROBIAL ECOLOGY
  • 2009. Finding Biologically Accurate Clusterings in Hierarchical Tree Decompositions Using the Variation of Information in RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY
  • 1997-04. Amelioration of Bacterial Genomes: Rates of Change and Exchange in JOURNAL OF MOLECULAR EVOLUTION
  • 2009-12. A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea in NATURE
  • 2010. A Novel Abundance-Based Algorithm for Binning Metagenomic Sequences Using l-Tuples in RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY
  • 2008-01-01. CompostBin: A DNA Composition-Based Algorithm for Binning Environmental Shotgun Reads in RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY
  • 2009-10-02. Unsupervised statistical clustering of environmental shotgun sequences in BMC BIOINFORMATICS
  • 2008-02-28. Reliability and applications of statistical methods based on oligonucleotide frequencies in bacterial and archaeal genomes in BMC GENOMICS
  • 2001-06. The Closest BLAST Hit Is Often Not the Nearest Neighbor in JOURNAL OF MOLECULAR EVOLUTION
  • 2009-08-02. Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models in NATURE METHODS
  • 2004-02-01. Community structure and metabolism through reconstruction of microbial genomes from the environment in NATURE
  • 2006-12-10. Accurate phylogenetic classification of variable-length DNA fragments in NATURE METHODS
  • 2008-10-13. A simple, fast, and accurate method of phylogenomic inference in GENOME BIOLOGY
  • 2009-12. Exceptional structured noncoding RNAs revealed by bacterial metagenome analysis in NATURE
  • 2010-03. A human gut microbial gene catalogue established by metagenomic sequencing in NATURE
  • 2010-03-24. Alignment and clustering of phylogenetic markers - implications for microbial diversity studies in BMC BIOINFORMATICS
  • 2009-08-21. Community-wide analysis of microbial genome sequence signatures in GENOME BIOLOGY
  • 2009-10-21. Analysis of genomic signatures in prokaryotes using multinomial regression and hierarchical clustering in BMC GENOMICS
  • 2009-02-11. TACOA – Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach in BMC BIOINFORMATICS
  • 2009-12-18. WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads in BMC BIOINFORMATICS
  • 2004-09. Genomic Conflict Settled in Favour of the Species Rather Than the Gene at Extreme GC Percentage Values in APPLIED BIOINFORMATICS

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/1471-2105-11-544

    DOI

    http://dx.doi.org/10.1186/1471-2105-11-544

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1015822870

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/21044341



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Mathematical Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0104", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Statistics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Cluster Analysis", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Databases, Factual", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Markov Chains", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Metagenomics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Pattern Recognition, Automated", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Sequence Analysis, DNA", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Department of Computer Science, University of Maryland, A.V. Williams Building College Park, 20742, MD, USA", 
              "id": "http://www.grid.ac/institutes/grid.164295.d", 
              "name": [
                "Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, 20742, College Park, MD, USA", 
                "Department of Computer Science, University of Maryland, A.V. Williams Building College Park, 20742, MD, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Kelley", 
            "givenName": "David R", 
            "id": "sg:person.0627152136.41", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0627152136.41"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Department of Computer Science, University of Maryland, A.V. Williams Building College Park, 20742, MD, USA", 
              "id": "http://www.grid.ac/institutes/grid.164295.d", 
              "name": [
                "Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, 20742, College Park, MD, USA", 
                "Department of Computer Science, University of Maryland, A.V. Williams Building College Park, 20742, MD, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Salzberg", 
            "givenName": "Steven L", 
            "id": "sg:person.01223441713.02", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223441713.02"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/978-3-642-02008-7_29", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1005288938", 
              "https://doi.org/10.1007/978-3-642-02008-7_29"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s002390010184", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1045143895", 
              "https://doi.org/10.1007/s002390010184"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-10-430", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037735919", 
              "https://doi.org/10.1186/1471-2105-10-430"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-10-56", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1029876223", 
              "https://doi.org/10.1186/1471-2105-10-56"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/pl00006158", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017555006", 
              "https://doi.org/10.1007/pl00006158"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2009-10-8-r85", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1014147708", 
              "https://doi.org/10.1186/gb-2009-10-8-r85"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth976", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1007149601", 
              "https://doi.org/10.1038/nmeth976"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/bf01908075", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1022323983", 
              "https://doi.org/10.1007/bf01908075"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth1043", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1047202519", 
              "https://doi.org/10.1038/nmeth1043"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature08656", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1013886837", 
              "https://doi.org/10.1038/nature08656"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2164-10-487", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1025594094", 
              "https://doi.org/10.1186/1471-2164-10-487"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature08821", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050498034", 
              "https://doi.org/10.1038/nature08821"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-11-152", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1027606868", 
              "https://doi.org/10.1186/1471-2105-11-152"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-10-316", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1025025744", 
              "https://doi.org/10.1186/1471-2105-10-316"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature08586", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017136943", 
              "https://doi.org/10.1038/nature08586"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2008-9-10-r151", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1023248704", 
              "https://doi.org/10.1186/gb-2008-9-10-r151"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature02340", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1023089166", 
              "https://doi.org/10.1038/nature02340"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2164-9-104", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1018202992", 
              "https://doi.org/10.1186/1471-2164-9-104"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/ismej.2010.1", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1003110594", 
              "https://doi.org/10.1038/ismej.2010.1"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-642-12683-3_35", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1047449367", 
              "https://doi.org/10.1007/978-3-642-12683-3_35"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.2165/00822942-200403040-00003", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1004526278", 
              "https://doi.org/10.2165/00822942-200403040-00003"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1358", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1008886215", 
              "https://doi.org/10.1038/nmeth.1358"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-540-78839-3_3", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1004083256", 
              "https://doi.org/10.1007/978-3-540-78839-3_3"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2010-11-02", 
        "datePublishedReg": "2010-11-02", 
        "description": "BackgroundSequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects.ResultsWe present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and supervised learning method Phymm called PHY SCIMM that performs better when evolutionarily close training genomes are available.ConclusionsSCIMM and PHY SCIMM are highly accurate methods to cluster metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHY SCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHY SCIMM are available open source from http://www.cbcb.umd.edu/software/scimm.", 
        "genre": "article", 
        "id": "sg:pub.10.1186/1471-2105-11-544", 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.2529425", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.2545461", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.2519905", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1023786", 
            "issn": [
              "1471-2105"
            ], 
            "name": "BMC Bioinformatics", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "11"
          }
        ], 
        "keywords": [
          "metagenomic sequences", 
          "environmental DNA", 
          "sequence clustering method", 
          "novel microbes", 
          "unknown microbes", 
          "metagenomic projects", 
          "same species", 
          "metagenomic sequencing", 
          "genome", 
          "microbial strains", 
          "public databases", 
          "SCIMM", 
          "microbes", 
          "reads", 
          "sequence", 
          "vast number", 
          "genus", 
          "sequencing", 
          "species", 
          "DNA", 
          "tremendous potential", 
          "available open source", 
          "hybrids", 
          "large set", 
          "complex datasets", 
          "previous unsupervised approaches", 
          "strains", 
          "environment", 
          "origin", 
          "biased sample", 
          "analysis steps", 
          "potential", 
          "step", 
          "number", 
          "effective approach", 
          "unknown origin", 
          "clustering method", 
          "dataset", 
          "source", 
          "approach", 
          "database", 
          "set", 
          "limitations", 
          "unsupervised learning", 
          "unsupervised approach", 
          "accurate method", 
          "open source", 
          "traditional methods", 
          "samples", 
          "Markov model", 
          "method", 
          "model", 
          "task", 
          "learning", 
          "output", 
          "accuracy", 
          "project"
        ], 
        "name": "Clustering metagenomic sequences with interpolated Markov models", 
        "pagination": "544", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1015822870"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/1471-2105-11-544"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "21044341"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/1471-2105-11-544", 
          "https://app.dimensions.ai/details/publication/pub.1015822870"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-10-01T06:36", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20221001/entities/gbq_results/article/article_521.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1186/1471-2105-11-544"
      }
    ]
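
Because the record above is plain JSON, the main bibliographic fields can be read with a standard JSON parser. The following is a minimal sketch, not an official SciGraph client: the `summarize` helper and the `RECORD_JSON` constant are hypothetical names introduced here, and the embedded record is a trimmed copy of the one above that keeps only the fields being read.

```python
import json

# Trimmed copy of the record shown above (assumption: only these fields are needed).
RECORD_JSON = """
[
  {
    "name": "Clustering metagenomic sequences with interpolated Markov models",
    "datePublished": "2010-11-02",
    "author": [
      {"familyName": "Kelley", "givenName": "David R", "type": "Person"},
      {"familyName": "Salzberg", "givenName": "Steven L", "type": "Person"}
    ],
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/1471-2105-11-544"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["21044341"]}
    ]
  }
]
"""


def summarize(record):
    """Flatten one article record into a small dict (hypothetical helper)."""
    # productId is a list of {name, value: [...]} entries; index it by name.
    ids = {
        entry["name"]: entry["value"][0]
        for entry in record.get("productId", [])
        if entry.get("value")
    }
    authors = [
        f'{person["givenName"]} {person["familyName"]}'
        for person in record.get("author", [])
    ]
    return {
        "title": record.get("name"),
        "published": record.get("datePublished"),
        "authors": authors,
        "doi": ids.get("doi"),
        "pubmed_id": ids.get("pubmed_id"),
    }


# The top-level JSON value is a list holding a single record.
summary = summarize(json.loads(RECORD_JSON)[0])
print(summary)
```

The same helper should work on the full record, since it only reads keys that appear there and ignores the rest.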
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-11-544'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-11-544'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-11-544'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-11-544'
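
The four curl commands above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. This is an illustration under assumptions: `ACCEPT`, `build_request`, and `fetch` are hypothetical names introduced here, and whether the endpoint still answers these requests is not confirmed by this page. Only `build_request` runs in the example; `fetch` performs the network call and is left for the reader to invoke.

```python
import urllib.request

# Media types copied from the curl commands above.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/1471-2105-11-544"


def build_request(url, fmt):
    """Build a GET request that asks for one of the RDF serializations."""
    if fmt not in ACCEPT:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(url, fmt, timeout=30):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(url, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Inspect the request without touching the network.
req = build_request(RECORD_URL, "turtle")
print(req.full_url)
print(req.get_header("Accept"))
```

Calling `fetch(RECORD_URL, "json-ld")` would be the counterpart of the first curl command, assuming the service is reachable.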


     

    This table displays all metadata directly associated with this object as RDF triples.

    247 TRIPLES      21 PREDICATES      111 URIs      80 LITERALS      13 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1186/1471-2105-11-544 schema:about N4ef1ff0d16ff4ac0aa4fd2d54ee386e0
    2 N6645dfa6ad3744f29762e5b6fa08c6eb
    3 N988c38075a884064ac6f29b74531471f
    4 Na72b81e03a2f4ee9bae56edbd0090435
    5 Nb019401768454ea3bdc1397591ecaa06
    6 Nf183cc7b4333494a9e9df1a3b4af8d65
    7 anzsrc-for:01
    8 anzsrc-for:0104
    9 schema:author Nfb9c1eaebe5f42efa2c81860208cafad
    10 schema:citation sg:pub.10.1007/978-3-540-78839-3_3
    11 sg:pub.10.1007/978-3-642-02008-7_29
    12 sg:pub.10.1007/978-3-642-12683-3_35
    13 sg:pub.10.1007/bf01908075
    14 sg:pub.10.1007/pl00006158
    15 sg:pub.10.1007/s002390010184
    16 sg:pub.10.1038/ismej.2010.1
    17 sg:pub.10.1038/nature02340
    18 sg:pub.10.1038/nature08586
    19 sg:pub.10.1038/nature08656
    20 sg:pub.10.1038/nature08821
    21 sg:pub.10.1038/nmeth.1358
    22 sg:pub.10.1038/nmeth1043
    23 sg:pub.10.1038/nmeth976
    24 sg:pub.10.1186/1471-2105-10-316
    25 sg:pub.10.1186/1471-2105-10-430
    26 sg:pub.10.1186/1471-2105-10-56
    27 sg:pub.10.1186/1471-2105-11-152
    28 sg:pub.10.1186/1471-2164-10-487
    29 sg:pub.10.1186/1471-2164-9-104
    30 sg:pub.10.1186/gb-2008-9-10-r151
    31 sg:pub.10.1186/gb-2009-10-8-r85
    32 sg:pub.10.2165/00822942-200403040-00003
    33 schema:datePublished 2010-11-02
    34 schema:datePublishedReg 2010-11-02
    35 schema:description BackgroundSequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects.ResultsWe present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and supervised learning method Phymm called PHY SCIMM that performs better when evolutionarily close training genomes are available.ConclusionsSCIMM and PHY SCIMM are highly accurate methods to cluster metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHY SCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHY SCIMM are available open source from http://www.cbcb.umd.edu/software/scimm.
    36 schema:genre article
    37 schema:isAccessibleForFree true
    38 schema:isPartOf N13592352c16642a28eee31b664d256b8
    39 N5d388bf06dd94bb8976ef028ba7efc1c
    40 sg:journal.1023786
    41 schema:keywords DNA
    42 Markov model
    43 SCIMM
    44 accuracy
    45 accurate method
    46 analysis steps
    47 approach
    48 available open source
    49 biased sample
    50 clustering method
    51 complex datasets
    52 database
    53 dataset
    54 effective approach
    55 environment
    56 environmental DNA
    57 genome
    58 genus
    59 hybrids
    60 large set
    61 learning
    62 limitations
    63 metagenomic projects
    64 metagenomic sequences
    65 metagenomic sequencing
    66 method
    67 microbes
    68 microbial strains
    69 model
    70 novel microbes
    71 number
    72 open source
    73 origin
    74 output
    75 potential
    76 previous unsupervised approaches
    77 project
    78 public databases
    79 reads
    80 same species
    81 samples
    82 sequence
    83 sequence clustering method
    84 sequencing
    85 set
    86 source
    87 species
    88 step
    89 strains
    90 task
    91 traditional methods
    92 tremendous potential
    93 unknown microbes
    94 unknown origin
    95 unsupervised approach
    96 unsupervised learning
    97 vast number
    98 schema:name Clustering metagenomic sequences with interpolated Markov models
    99 schema:pagination 544
    100 schema:productId Na878889f8d04433db41ebb7e79046ea9
    101 Naa010759962645b6a0f72ef8f67cf7b4
    102 Nb545f613203841cd88913783bb7c66e4
    103 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015822870
    104 https://doi.org/10.1186/1471-2105-11-544
    105 schema:sdDatePublished 2022-10-01T06:36
    106 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    107 schema:sdPublisher N0cdc38eb9ea94bf6aeb777a082ee8e43
    108 schema:url https://doi.org/10.1186/1471-2105-11-544
    109 sgo:license sg:explorer/license/
    110 sgo:sdDataset articles
    111 rdf:type schema:ScholarlyArticle
    112 N0cdc38eb9ea94bf6aeb777a082ee8e43 schema:name Springer Nature - SN SciGraph project
    113 rdf:type schema:Organization
    114 N13592352c16642a28eee31b664d256b8 schema:volumeNumber 11
    115 rdf:type schema:PublicationVolume
    116 N4ef1ff0d16ff4ac0aa4fd2d54ee386e0 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    117 schema:name Databases, Factual
    118 rdf:type schema:DefinedTerm
    119 N5d388bf06dd94bb8976ef028ba7efc1c schema:issueNumber 1
    120 rdf:type schema:PublicationIssue
    121 N6645dfa6ad3744f29762e5b6fa08c6eb schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    122 schema:name Cluster Analysis
    123 rdf:type schema:DefinedTerm
    124 N988c38075a884064ac6f29b74531471f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    125 schema:name Sequence Analysis, DNA
    126 rdf:type schema:DefinedTerm
    127 Na72b81e03a2f4ee9bae56edbd0090435 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    128 schema:name Markov Chains
    129 rdf:type schema:DefinedTerm
    130 Na878889f8d04433db41ebb7e79046ea9 schema:name doi
    131 schema:value 10.1186/1471-2105-11-544
    132 rdf:type schema:PropertyValue
    133 Naa010759962645b6a0f72ef8f67cf7b4 schema:name pubmed_id
    134 schema:value 21044341
    135 rdf:type schema:PropertyValue
    136 Nb019401768454ea3bdc1397591ecaa06 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    137 schema:name Pattern Recognition, Automated
    138 rdf:type schema:DefinedTerm
    139 Nb06f382da5884141a58e19d38b667d04 rdf:first sg:person.01223441713.02
    140 rdf:rest rdf:nil
    141 Nb545f613203841cd88913783bb7c66e4 schema:name dimensions_id
    142 schema:value pub.1015822870
    143 rdf:type schema:PropertyValue
    144 Nf183cc7b4333494a9e9df1a3b4af8d65 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    145 schema:name Metagenomics
    146 rdf:type schema:DefinedTerm
    147 Nfb9c1eaebe5f42efa2c81860208cafad rdf:first sg:person.0627152136.41
    148 rdf:rest Nb06f382da5884141a58e19d38b667d04
    149 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
    150 schema:name Mathematical Sciences
    151 rdf:type schema:DefinedTerm
    152 anzsrc-for:0104 schema:inDefinedTermSet anzsrc-for:
    153 schema:name Statistics
    154 rdf:type schema:DefinedTerm
    155 sg:grant.2519905 http://pending.schema.org/fundedItem sg:pub.10.1186/1471-2105-11-544
    156 rdf:type schema:MonetaryGrant
    157 sg:grant.2529425 http://pending.schema.org/fundedItem sg:pub.10.1186/1471-2105-11-544
    158 rdf:type schema:MonetaryGrant
    159 sg:grant.2545461 http://pending.schema.org/fundedItem sg:pub.10.1186/1471-2105-11-544
    160 rdf:type schema:MonetaryGrant
    161 sg:journal.1023786 schema:issn 1471-2105
    162 schema:name BMC Bioinformatics
    163 schema:publisher Springer Nature
    164 rdf:type schema:Periodical
    165 sg:person.01223441713.02 schema:affiliation grid-institutes:grid.164295.d
    166 schema:familyName Salzberg
    167 schema:givenName Steven L
    168 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223441713.02
    169 rdf:type schema:Person
    170 sg:person.0627152136.41 schema:affiliation grid-institutes:grid.164295.d
    171 schema:familyName Kelley
    172 schema:givenName David R
    173 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0627152136.41
    174 rdf:type schema:Person
    175 sg:pub.10.1007/978-3-540-78839-3_3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004083256
    176 https://doi.org/10.1007/978-3-540-78839-3_3
    177 rdf:type schema:CreativeWork
    178 sg:pub.10.1007/978-3-642-02008-7_29 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005288938
    179 https://doi.org/10.1007/978-3-642-02008-7_29
    180 rdf:type schema:CreativeWork
    181 sg:pub.10.1007/978-3-642-12683-3_35 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047449367
    182 https://doi.org/10.1007/978-3-642-12683-3_35
    183 rdf:type schema:CreativeWork
    184 sg:pub.10.1007/bf01908075 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022323983
    185 https://doi.org/10.1007/bf01908075
    186 rdf:type schema:CreativeWork
    187 sg:pub.10.1007/pl00006158 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017555006
    188 https://doi.org/10.1007/pl00006158
    189 rdf:type schema:CreativeWork
    190 sg:pub.10.1007/s002390010184 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045143895
    191 https://doi.org/10.1007/s002390010184
    192 rdf:type schema:CreativeWork
    193 sg:pub.10.1038/ismej.2010.1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003110594
    194 https://doi.org/10.1038/ismej.2010.1
    195 rdf:type schema:CreativeWork
    196 sg:pub.10.1038/nature02340 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023089166
    197 https://doi.org/10.1038/nature02340
    198 rdf:type schema:CreativeWork
    199 sg:pub.10.1038/nature08586 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017136943
    200 https://doi.org/10.1038/nature08586
    201 rdf:type schema:CreativeWork
    202 sg:pub.10.1038/nature08656 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013886837
    203 https://doi.org/10.1038/nature08656
    204 rdf:type schema:CreativeWork
    205 sg:pub.10.1038/nature08821 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050498034
    206 https://doi.org/10.1038/nature08821
    207 rdf:type schema:CreativeWork
    208 sg:pub.10.1038/nmeth.1358 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008886215
    209 https://doi.org/10.1038/nmeth.1358
    210 rdf:type schema:CreativeWork
    211 sg:pub.10.1038/nmeth1043 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047202519
    212 https://doi.org/10.1038/nmeth1043
    213 rdf:type schema:CreativeWork
    214 sg:pub.10.1038/nmeth976 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007149601
    215 https://doi.org/10.1038/nmeth976
    216 rdf:type schema:CreativeWork
    217 sg:pub.10.1186/1471-2105-10-316 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025025744
    218 https://doi.org/10.1186/1471-2105-10-316
    219 rdf:type schema:CreativeWork
    220 sg:pub.10.1186/1471-2105-10-430 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037735919
    221 https://doi.org/10.1186/1471-2105-10-430
    222 rdf:type schema:CreativeWork
    223 sg:pub.10.1186/1471-2105-10-56 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029876223
    224 https://doi.org/10.1186/1471-2105-10-56
    225 rdf:type schema:CreativeWork
    226 sg:pub.10.1186/1471-2105-11-152 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027606868
    227 https://doi.org/10.1186/1471-2105-11-152
    228 rdf:type schema:CreativeWork
    229 sg:pub.10.1186/1471-2164-10-487 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025594094
    230 https://doi.org/10.1186/1471-2164-10-487
    231 rdf:type schema:CreativeWork
    232 sg:pub.10.1186/1471-2164-9-104 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018202992
    233 https://doi.org/10.1186/1471-2164-9-104
    234 rdf:type schema:CreativeWork
    235 sg:pub.10.1186/gb-2008-9-10-r151 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023248704
    236 https://doi.org/10.1186/gb-2008-9-10-r151
    237 rdf:type schema:CreativeWork
    238 sg:pub.10.1186/gb-2009-10-8-r85 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014147708
    239 https://doi.org/10.1186/gb-2009-10-8-r85
    240 rdf:type schema:CreativeWork
    241 sg:pub.10.2165/00822942-200403040-00003 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004526278
    242 https://doi.org/10.2165/00822942-200403040-00003
    243 rdf:type schema:CreativeWork
    244 grid-institutes:grid.164295.d schema:alternateName Department of Computer Science, University of Maryland, A.V. Williams Building College Park, 20742, MD, USA
    245 schema:name Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, 20742, College Park, MD, USA
    246 Department of Computer Science, University of Maryland, A.V. Williams Building College Park, 20742, MD, USA
    247 rdf:type schema:Organization
     



