Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome ...


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2016-11-25

AUTHORS

Jonathan Thorsen, Asker Brejnrod, Martin Mortensen, Morten A. Rasmussen, Jakob Stokholm, Waleed Abu Al-Soud, Søren Sørensen, Hans Bisgaard, Johannes Waage

ABSTRACT

Background: There is an immense scientific interest in the human microbiome and its effects on human physiology, health, and disease. A common approach for examining bacterial communities is high-throughput sequencing of 16S rRNA gene hypervariable regions, aggregating sequence-similar amplicons into operational taxonomic units (OTUs). Strategies for detecting differential relative abundance of OTUs between sample conditions include classical statistical approaches as well as a plethora of newer methods, many borrowing from the related field of RNA-seq analysis. This effort is complicated by unique data characteristics, including sparsity, sequencing depth variation, and nonconformity of read counts to theoretical distributions, which is often exacerbated by exploratory and/or unbalanced study designs. Here, we assess the robustness of available methods for (1) inference in differential relative abundance analysis and (2) beta-diversity-based sample separation, using a rigorous benchmarking framework based on large clinical 16S microbiome datasets from different sources.

Results: Running more than 380,000 full differential relative abundance tests on real datasets with permuted case/control assignments and in silico-spiked OTUs, we identify large differences in method performance on a range of parameters, including false positive rates, sensitivity to sparsity and case/control balances, and spike-in retrieval rate. In large datasets, methods with the highest false positive rates also tend to have the best detection power. For beta-diversity-based sample separation, we show that library size normalization has very little effect and that the distance metric is the most important factor in terms of separation power.

Conclusions: Our results, generalizable to datasets from different sequencing platforms, demonstrate how the choice of method considerably affects analysis outcome. Here, we give recommendations for tools that exhibit low false positive rates, have good retrieval power across effect sizes and case/control proportions, and have low sparsity bias. Result output from some commonly used methods should be interpreted with caution. We provide an easily extensible framework for benchmarking of new methods and future microbiome datasets.

PAGES

62

References to SciGraph publications

  • 2010-10-27. Differential expression analysis for sequence count data in GENOME BIOLOGY
  • 2002. Modern Applied Statistics with S
  • 2012-06-13. Structure, function and diversity of the healthy human microbiome in NATURE
  • 2014-05-05. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis in MICROBIOME
  • 2014-04-07. CopyRighter: a rapid tool for improving the accuracy of microbial community profiles through lineage-specific gene copy number correction in MICROBIOME
  • 2014-12-05. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 in GENOME BIOLOGY
  • 2010-03. A human gut microbial gene catalogue established by metagenomic sequencing in NATURE
  • 2011-03-17. pROC: an open-source package for R and S+ to analyze and compare ROC curves in BMC BIOINFORMATICS
  • 2009. ggplot2, Elegant Graphics for Data Analysis
  • 2014-06-27. Diarrhea in young children from low-income countries leads to large-scale alterations in intestinal microbiota composition in GENOME BIOLOGY
  • 2010-08-10. baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data in BMC BIOINFORMATICS
  • 2011-04-20. Enterotypes of the human gut microbiome in NATURE
  • 2013-09-29. Differential abundance analysis for microbial marker-gene surveys in NATURE METHODS
  • 2014-03-28. Reply to: "A fair comparison" in NATURE METHODS
  • 2009. Mixed effects models and extensions in ecology with R
  • 2012-07-12. The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome in GIGASCIENCE
  • 2014-03-28. A fair comparison in NATURE METHODS

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/s40168-016-0208-8

    DOI

    http://dx.doi.org/10.1186/s40168-016-0208-8

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1019428991

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/27884206



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Biological Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Genetics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Bacteria", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Base Sequence", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Benchmarking", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Case-Control Studies", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Computational Biology", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "False Positive Reactions", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "High-Throughput Nucleotide Sequencing", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Humans", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Microbiota", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "RNA, Ribosomal, 16S", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Sequence Analysis, RNA", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Thorsen", 
            "givenName": "Jonathan", 
            "id": "sg:person.014631160175.76", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014631160175.76"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Department of Biology, Laboratory of Genomics and Molecular Biomedicine, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark", 
                "Department of Biology, Laboratory of Genomics and Molecular Biomedicine, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Brejnrod", 
            "givenName": "Asker", 
            "id": "sg:person.0702056330.02", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0702056330.02"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Mortensen", 
            "givenName": "Martin", 
            "id": "sg:person.013630367571.45", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013630367571.45"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Rasmussen", 
            "givenName": "Morten A.", 
            "id": "sg:person.01013242545.03", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01013242545.03"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Stokholm", 
            "givenName": "Jakob", 
            "id": "sg:person.0733166761.32", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0733166761.32"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Al-Soud", 
            "givenName": "Waleed Abu", 
            "id": "sg:person.0610362715.16", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0610362715.16"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "S\u00f8rensen", 
            "givenName": "S\u00f8ren", 
            "id": "sg:person.0770304772.43", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0770304772.43"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Bisgaard", 
            "givenName": "Hans", 
            "id": "sg:person.0713540153.74", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0713540153.74"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Waage", 
            "givenName": "Johannes", 
            "id": "sg:person.0701562305.05", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0701562305.05"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/978-0-387-21706-2", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1035613449", 
              "https://doi.org/10.1007/978-0-387-21706-2"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/2049-2618-2-15", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1046874717", 
              "https://doi.org/10.1186/2049-2618-2-15"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-0-387-87458-6", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1023139038", 
              "https://doi.org/10.1007/978-0-387-87458-6"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-12-77", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1014582441", 
              "https://doi.org/10.1186/1471-2105-12-77"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature08821", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050498034", 
              "https://doi.org/10.1038/nature08821"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.2658", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1002139060", 
              "https://doi.org/10.1038/nmeth.2658"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/2049-2618-2-11", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031538238", 
              "https://doi.org/10.1186/2049-2618-2-11"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-11-422", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1047456674", 
              "https://doi.org/10.1186/1471-2105-11-422"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature11234", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1007740093", 
              "https://doi.org/10.1038/nature11234"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.2897", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1030185276", 
              "https://doi.org/10.1038/nmeth.2897"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2010-11-10-r106", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031289083", 
              "https://doi.org/10.1186/gb-2010-11-10-r106"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2014-15-6-r76", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1024439294", 
              "https://doi.org/10.1186/gb-2014-15-6-r76"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.2898", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1007964999", 
              "https://doi.org/10.1038/nmeth.2898"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s13059-014-0550-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1015222646", 
              "https://doi.org/10.1186/s13059-014-0550-8"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/2047-217x-1-7", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050567563", 
              "https://doi.org/10.1186/2047-217x-1-7"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-0-387-98141-3", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1041188628", 
              "https://doi.org/10.1007/978-0-387-98141-3"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature09944", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1026204536", 
              "https://doi.org/10.1038/nature09944"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2016-11-25", 
        "datePublishedReg": "2016-11-25", 
        "description": "BackgroundThere is an immense scientific interest in the human microbiome and its effects on human physiology, health, and disease. A common approach for examining bacterial communities is high-throughput sequencing of 16S rRNA gene hypervariable regions, aggregating sequence-similar amplicons into operational taxonomic units (OTUs). Strategies for detecting differential relative abundance of OTUs between sample conditions include classical statistical approaches as well as a plethora of newer methods, many borrowing from the related field of RNA-seq analysis. This effort is complicated by unique data characteristics, including sparsity, sequencing depth variation, and nonconformity of read counts to theoretical distributions, which is often exacerbated by exploratory and/or unbalanced study designs. Here, we assess the robustness of available methods for (1) inference in differential relative abundance analysis and (2) beta-diversity-based sample separation, using a rigorous benchmarking framework based on large clinical 16S microbiome datasets from different sources.ResultsRunning more than 380,000 full differential relative abundance tests on real datasets with permuted case/control assignments and in silico-spiked OTUs, we identify large differences in method performance on a range of parameters, including false positive rates, sensitivity to sparsity and case/control balances, and spike-in retrieval rate. In large datasets, methods with the highest false positive rates also tend to have the best detection power. For beta-diversity-based sample separation, we show that library size normalization has very little effect and that the distance metric is the most important factor in terms of separation power.ConclusionsOur results, generalizable to datasets from different sequencing platforms, demonstrate how the choice of method considerably affects analysis outcome. 
Here, we give recommendations for tools that exhibit low false positive rates, have good retrieval power across effect sizes and case/control proportions, and have low sparsity bias. Result output from some commonly used methods should be interpreted with caution. We provide an easily extensible framework for benchmarking of new methods and future microbiome datasets.", 
        "genre": "article", 
        "id": "sg:pub.10.1186/s40168-016-0208-8", 
        "isAccessibleForFree": true, 
        "isPartOf": [
          {
            "id": "sg:journal.1048878", 
            "issn": [
              "2049-2618"
            ], 
            "name": "Microbiome", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "4"
          }
        ], 
        "keywords": [
          "classical statistical approach", 
          "unique data characteristics", 
          "unbalanced study design", 
          "better detection power", 
          "microbiome datasets", 
          "range of parameters", 
          "theoretical distributions", 
          "statistical approach", 
          "control assignment", 
          "new method", 
          "false discoveries", 
          "control proportions", 
          "data analysis methods", 
          "real datasets", 
          "sparsity", 
          "differential relative abundance", 
          "detection power", 
          "abundance analysis", 
          "distance metric", 
          "analysis method", 
          "method performance", 
          "common approach", 
          "available methods", 
          "choice of method", 
          "large datasets", 
          "immense scientific interest", 
          "data characteristics", 
          "inference", 
          "depth variation", 
          "control balance", 
          "transformation sensitivity", 
          "robustness", 
          "false positive rate", 
          "framework", 
          "power", 
          "approach", 
          "large-scale benchmarking", 
          "microbiome studies", 
          "field", 
          "parameters", 
          "dataset", 
          "rRNA gene hypervariable regions", 
          "scientific interest", 
          "sample conditions", 
          "metrics", 
          "benchmarking", 
          "extensible framework", 
          "distribution", 
          "low false positive rate", 
          "terms", 
          "output", 
          "benchmarking framework", 
          "relative abundance analysis", 
          "assignment", 
          "analysis outcomes", 
          "analysis", 
          "performance", 
          "design", 
          "tool", 
          "retrieval power", 
          "different sources", 
          "conditions", 
          "results", 
          "choice", 
          "large differences", 
          "high false positive rate", 
          "interest", 
          "range", 
          "size", 
          "bias", 
          "variation", 
          "positive rate", 
          "effect", 
          "separation", 
          "normalization", 
          "region", 
          "characteristics", 
          "source", 
          "nonconformity", 
          "effect size", 
          "rate", 
          "balance", 
          "sensitivity", 
          "sample separation", 
          "different sequencing platforms", 
          "efforts", 
          "plethora", 
          "units", 
          "strategies", 
          "separation power", 
          "discovery", 
          "important factor", 
          "platform", 
          "human microbiome", 
          "test", 
          "retrieval rate", 
          "study", 
          "human physiology", 
          "little effect", 
          "borrowing", 
          "factors", 
          "differences", 
          "count", 
          "size normalization", 
          "abundance", 
          "operational taxonomic units", 
          "caution", 
          "study design", 
          "sequencing platforms", 
          "recommendations", 
          "high-throughput sequencing", 
          "community", 
          "relative abundance", 
          "taxonomic units", 
          "proportion", 
          "outcomes", 
          "physiology", 
          "sequencing", 
          "RNA-seq analysis", 
          "microbiome", 
          "health", 
          "method", 
          "ConclusionsOur results", 
          "disease", 
          "hypervariable region", 
          "BackgroundThere", 
          "bacterial communities", 
          "amplicons"
        ], 
        "name": "Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies", 
        "pagination": "62", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1019428991"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/s40168-016-0208-8"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "27884206"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/s40168-016-0208-8", 
          "https://app.dimensions.ai/details/publication/pub.1019428991"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-08-04T17:03", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20220804/entities/gbq_results/article/article_699.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1186/s40168-016-0208-8"
      }
    ]
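
As a rough illustration of how a record shaped like the one above can be read, the following Python sketch pulls the DOI, PubMed ID, publication date, and author names out of a JSON-LD document. The `EXCERPT` string is a trimmed copy of fields from the record above, and `summarize` is a hypothetical helper name, not part of any SciGraph tooling. Passing `strict=False` is a defensive choice: it lets the parser accept raw line breaks inside string values, which scraped copies of a record sometimes contain.

```python
import json

# Trimmed excerpt of the record above (same field names and nesting).
EXCERPT = """
[
  {
    "author": [
      {"familyName": "Thorsen", "givenName": "Jonathan", "type": "Person"},
      {"familyName": "Brejnrod", "givenName": "Asker", "type": "Person"}
    ],
    "datePublished": "2016-11-25",
    "pagination": "62",
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1186/s40168-016-0208-8"]},
      {"name": "pubmed_id", "type": "PropertyValue",
       "value": ["27884206"]}
    ],
    "type": "ScholarlyArticle"
  }
]
"""


def summarize(jsonld_text):
    """Return a small dict of identifiers and author names (hypothetical helper)."""
    # The record is a one-element JSON array; strict=False tolerates
    # control characters such as raw newlines inside string values.
    record = json.loads(jsonld_text, strict=False)[0]

    # productId is a list of {name, value: [...]} entries; index it by name.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}

    authors = [
        f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])
    ]
    return {
        "doi": ids.get("doi"),
        "pubmed_id": ids.get("pubmed_id"),
        "date": record.get("datePublished"),
        "authors": authors,
    }


if __name__ == "__main__":
    print(summarize(EXCERPT))
```

The same function should work on the full record, since it only relies on the `author`, `productId`, and `datePublished` keys shown above.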
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s40168-016-0208-8'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s40168-016-0208-8'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s40168-016-0208-8'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s40168-016-0208-8'
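
The four curl commands above differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The URL and media types are copied from the commands above; the format keys (`"json-ld"`, `"turtle"`, and so on) and the function names `build_request` and `fetch` are illustrative choices. This is a minimal sketch that assumes the endpoint still responds as described on this page.

```python
import urllib.request

# URL and Accept values are taken from the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s40168-016-0208-8"
ACCEPT_HEADERS = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request for one RDF serialization."""
    if fmt not in ACCEPT_HEADERS:
        raise ValueError(
            f"unknown format {fmt!r}; expected one of {sorted(ACCEPT_HEADERS)}"
        )
    return urllib.request.Request(url, headers={"Accept": ACCEPT_HEADERS[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30.0):
    """Send the request and return the raw response body as bytes."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    # Requires network access; prints the start of the Turtle serialization.
    print(fetch("turtle")[:200].decode("utf-8", errors="replace"))
```

Keeping `build_request` separate from `fetch` makes it possible to inspect the headers without touching the network.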


     

    This table displays all metadata directly associated to this object as RDF triples.

    360 TRIPLES      21 PREDICATES      181 URIs      156 LITERALS      18 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1186/s40168-016-0208-8 schema:about N153143f7a1a94cc8a207fc127e272303
    2 N17c9c55bda4b4308ad831a45fd7f7944
    3 N31345a2c787349a981d0865660a40cf6
    4 N54c0995a3d37472cbd2c11a67e696939
    5 N6524127ad9ab42cd883e38af37d1a3ad
    6 N7ff07a75422945cc91a7fce3a96298e0
    7 Ncde14494795c4bb980fbec832c8f2155
    8 Ncea534f1ca524807914d0ea243188d80
    9 Nd1e7ae5547694ae293ceeeac573a03bf
    10 Nd99a2601f15b4e43a6d49b727afacfa8
    11 Nf82e04dceada40b9ac529bc9cea14dc1
    12 anzsrc-for:06
    13 anzsrc-for:0604
    14 schema:author N35c68743dbaf486cb2c9fe0f3a811e5d
    15 schema:citation sg:pub.10.1007/978-0-387-21706-2
    16 sg:pub.10.1007/978-0-387-87458-6
    17 sg:pub.10.1007/978-0-387-98141-3
    18 sg:pub.10.1038/nature08821
    19 sg:pub.10.1038/nature09944
    20 sg:pub.10.1038/nature11234
    21 sg:pub.10.1038/nmeth.2658
    22 sg:pub.10.1038/nmeth.2897
    23 sg:pub.10.1038/nmeth.2898
    24 sg:pub.10.1186/1471-2105-11-422
    25 sg:pub.10.1186/1471-2105-12-77
    26 sg:pub.10.1186/2047-217x-1-7
    27 sg:pub.10.1186/2049-2618-2-11
    28 sg:pub.10.1186/2049-2618-2-15
    29 sg:pub.10.1186/gb-2010-11-10-r106
    30 sg:pub.10.1186/gb-2014-15-6-r76
    31 sg:pub.10.1186/s13059-014-0550-8
    32 schema:datePublished 2016-11-25
    33 schema:datePublishedReg 2016-11-25
    34 schema:description BackgroundThere is an immense scientific interest in the human microbiome and its effects on human physiology, health, and disease. A common approach for examining bacterial communities is high-throughput sequencing of 16S rRNA gene hypervariable regions, aggregating sequence-similar amplicons into operational taxonomic units (OTUs). Strategies for detecting differential relative abundance of OTUs between sample conditions include classical statistical approaches as well as a plethora of newer methods, many borrowing from the related field of RNA-seq analysis. This effort is complicated by unique data characteristics, including sparsity, sequencing depth variation, and nonconformity of read counts to theoretical distributions, which is often exacerbated by exploratory and/or unbalanced study designs. Here, we assess the robustness of available methods for (1) inference in differential relative abundance analysis and (2) beta-diversity-based sample separation, using a rigorous benchmarking framework based on large clinical 16S microbiome datasets from different sources.ResultsRunning more than 380,000 full differential relative abundance tests on real datasets with permuted case/control assignments and in silico-spiked OTUs, we identify large differences in method performance on a range of parameters, including false positive rates, sensitivity to sparsity and case/control balances, and spike-in retrieval rate. In large datasets, methods with the highest false positive rates also tend to have the best detection power. For beta-diversity-based sample separation, we show that library size normalization has very little effect and that the distance metric is the most important factor in terms of separation power.ConclusionsOur results, generalizable to datasets from different sequencing platforms, demonstrate how the choice of method considerably affects analysis outcome. 
Here, we give recommendations for tools that exhibit low false positive rates, have good retrieval power across effect sizes and case/control proportions, and have low sparsity bias. Result output from some commonly used methods should be interpreted with caution. We provide an easily extensible framework for benchmarking of new methods and future microbiome datasets.
    35 schema:genre article
    36 schema:isAccessibleForFree true
    37 schema:isPartOf N27804a1db4f64bde8038fef8bfed6554
    38 Ne668ec40c4c549eb973220f1e34fdf21
    39 sg:journal.1048878
    40 schema:keywords BackgroundThere
    41 ConclusionsOur results
    42 RNA-seq analysis
    43 abundance
    44 abundance analysis
    45 amplicons
    46 analysis
    47 analysis method
    48 analysis outcomes
    49 approach
    50 assignment
    51 available methods
    52 bacterial communities
    53 balance
    54 benchmarking
    55 benchmarking framework
    56 better detection power
    57 bias
    58 borrowing
    59 caution
    60 characteristics
    61 choice
    62 choice of method
    63 classical statistical approach
    64 common approach
    65 community
    66 conditions
    67 control assignment
    68 control balance
    69 control proportions
    70 count
    71 data analysis methods
    72 data characteristics
    73 dataset
    74 depth variation
    75 design
    76 detection power
    77 differences
    78 different sequencing platforms
    79 different sources
    80 differential relative abundance
    81 discovery
    82 disease
    83 distance metric
    84 distribution
    85 effect
    86 effect size
    87 efforts
    88 extensible framework
    89 factors
    90 false discoveries
    91 false positive rate
    92 field
    93 framework
    94 health
    95 high false positive rate
    96 high-throughput sequencing
    97 human microbiome
    98 human physiology
    99 hypervariable region
    100 immense scientific interest
    101 important factor
    102 inference
    103 interest
    104 large datasets
    105 large differences
    106 large-scale benchmarking
    107 little effect
    108 low false positive rate
    109 method
    110 method performance
    111 metrics
    112 microbiome
    113 microbiome datasets
    114 microbiome studies
    115 new method
    116 nonconformity
    117 normalization
    118 operational taxonomic units
    119 outcomes
    120 output
    121 parameters
    122 performance
    123 physiology
    124 platform
    125 plethora
    126 positive rate
    127 power
    128 proportion
    129 rRNA gene hypervariable regions
    130 range
    131 range of parameters
    132 rate
    133 real datasets
    134 recommendations
    135 region
    136 relative abundance
    137 relative abundance analysis
    138 results
    139 retrieval power
    140 retrieval rate
    141 robustness
    142 sample conditions
    143 sample separation
    144 scientific interest
    145 sensitivity
    146 separation
    147 separation power
    148 sequencing
    149 sequencing platforms
    150 size
    151 size normalization
    152 source
    153 sparsity
    154 statistical approach
    155 strategies
    156 study
    157 study design
    158 taxonomic units
    159 terms
    160 test
    161 theoretical distributions
    162 tool
    163 transformation sensitivity
    164 unbalanced study design
    165 unique data characteristics
    166 units
    167 variation
    168 schema:name Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies
    169 schema:pagination 62
    170 schema:productId N21cc0a3586a44a419c06b74b4c9a06f2
    171 N5dab1f9ece78494f9aaab4e9d21a02d4
    172 Nd86c01037af64051b4c0ae273467969c
    173 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019428991
    174 https://doi.org/10.1186/s40168-016-0208-8
    175 schema:sdDatePublished 2022-08-04T17:03
    176 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    177 schema:sdPublisher Nd009bfa4abcd4e069b442bb7c668ace5
    178 schema:url https://doi.org/10.1186/s40168-016-0208-8
    179 sgo:license sg:explorer/license/
    180 sgo:sdDataset articles
    181 rdf:type schema:ScholarlyArticle
    182 N153143f7a1a94cc8a207fc127e272303 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    183 schema:name Benchmarking
    184 rdf:type schema:DefinedTerm
    185 N17c9c55bda4b4308ad831a45fd7f7944 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    186 schema:name Sequence Analysis, RNA
    187 rdf:type schema:DefinedTerm
    188 N21cc0a3586a44a419c06b74b4c9a06f2 schema:name pubmed_id
    189 schema:value 27884206
    190 rdf:type schema:PropertyValue
    191 N27804a1db4f64bde8038fef8bfed6554 schema:issueNumber 1
    192 rdf:type schema:PublicationIssue
    193 N31345a2c787349a981d0865660a40cf6 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    194 schema:name RNA, Ribosomal, 16S
    195 rdf:type schema:DefinedTerm
    196 N35c68743dbaf486cb2c9fe0f3a811e5d rdf:first sg:person.014631160175.76
    197 rdf:rest Ndd9cdf1ece3c4026be7dcd658ab9d87b
    198 N469fc03c592b4308a5c001c1fc208cef rdf:first sg:person.01013242545.03
    199 rdf:rest Ndfdc2f7770c2467f93691ac656a67fff
    200 N54c0995a3d37472cbd2c11a67e696939 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    201 schema:name Humans
    202 rdf:type schema:DefinedTerm
    203 N565943a9e1f8492e8a5d69efd1044254 rdf:first sg:person.0770304772.43
    204 rdf:rest N9613b7d554884790aa680260f7527f6f
    205 N5dab1f9ece78494f9aaab4e9d21a02d4 schema:name doi
    206 schema:value 10.1186/s40168-016-0208-8
    207 rdf:type schema:PropertyValue
    208 N6524127ad9ab42cd883e38af37d1a3ad schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    209 schema:name Computational Biology
    210 rdf:type schema:DefinedTerm
    211 N7ff07a75422945cc91a7fce3a96298e0 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    212 schema:name High-Throughput Nucleotide Sequencing
    213 rdf:type schema:DefinedTerm
    214 N822cf9d6e6e04e8ea9e3b197bed36934 rdf:first sg:person.013630367571.45
    215 rdf:rest N469fc03c592b4308a5c001c1fc208cef
    216 N9613b7d554884790aa680260f7527f6f rdf:first sg:person.0713540153.74
    217 rdf:rest Nc68973e5225040788939e8e007c98e2c
    218 Na7502f5bdba24b0f80132a6137c6af81 rdf:first sg:person.0610362715.16
    219 rdf:rest N565943a9e1f8492e8a5d69efd1044254
    220 Nc68973e5225040788939e8e007c98e2c rdf:first sg:person.0701562305.05
    221 rdf:rest rdf:nil
    222 Ncde14494795c4bb980fbec832c8f2155 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    223 schema:name Case-Control Studies
    224 rdf:type schema:DefinedTerm
    225 Ncea534f1ca524807914d0ea243188d80 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    226 schema:name Microbiota
    227 rdf:type schema:DefinedTerm
    228 Nd009bfa4abcd4e069b442bb7c668ace5 schema:name Springer Nature - SN SciGraph project
    229 rdf:type schema:Organization
    230 Nd1e7ae5547694ae293ceeeac573a03bf schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    231 schema:name Base Sequence
    232 rdf:type schema:DefinedTerm
    233 Nd86c01037af64051b4c0ae273467969c schema:name dimensions_id
    234 schema:value pub.1019428991
    235 rdf:type schema:PropertyValue
    236 Nd99a2601f15b4e43a6d49b727afacfa8 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    237 schema:name False Positive Reactions
    238 rdf:type schema:DefinedTerm
    239 Ndd9cdf1ece3c4026be7dcd658ab9d87b rdf:first sg:person.0702056330.02
    240 rdf:rest N822cf9d6e6e04e8ea9e3b197bed36934
    241 Ndfdc2f7770c2467f93691ac656a67fff rdf:first sg:person.0733166761.32
    242 rdf:rest Na7502f5bdba24b0f80132a6137c6af81
    243 Ne668ec40c4c549eb973220f1e34fdf21 schema:volumeNumber 4
    244 rdf:type schema:PublicationVolume
    245 Nf82e04dceada40b9ac529bc9cea14dc1 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    246 schema:name Bacteria
    247 rdf:type schema:DefinedTerm
    248 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
    249 schema:name Biological Sciences
    250 rdf:type schema:DefinedTerm
    251 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
    252 schema:name Genetics
    253 rdf:type schema:DefinedTerm
    254 sg:journal.1048878 schema:issn 2049-2618
    255 schema:name Microbiome
    256 schema:publisher Springer Nature
    257 rdf:type schema:Periodical
    258 sg:person.01013242545.03 schema:affiliation grid-institutes:grid.5254.6
    259 schema:familyName Rasmussen
    260 schema:givenName Morten A.
    261 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01013242545.03
    262 rdf:type schema:Person
    263 sg:person.013630367571.45 schema:affiliation grid-institutes:grid.5254.6
    264 schema:familyName Mortensen
    265 schema:givenName Martin
    266 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013630367571.45
    267 rdf:type schema:Person
    268 sg:person.014631160175.76 schema:affiliation grid-institutes:grid.5254.6
    269 schema:familyName Thorsen
    270 schema:givenName Jonathan
    271 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014631160175.76
    272 rdf:type schema:Person
    273 sg:person.0610362715.16 schema:affiliation grid-institutes:grid.5254.6
    274 schema:familyName Al-Soud
    275 schema:givenName Waleed Abu
    276 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0610362715.16
    277 rdf:type schema:Person
    278 sg:person.0701562305.05 schema:affiliation grid-institutes:grid.5254.6
    279 schema:familyName Waage
    280 schema:givenName Johannes
    281 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0701562305.05
    282 rdf:type schema:Person
    283 sg:person.0702056330.02 schema:affiliation grid-institutes:grid.5254.6
    284 schema:familyName Brejnrod
    285 schema:givenName Asker
    286 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0702056330.02
    287 rdf:type schema:Person
    288 sg:person.0713540153.74 schema:affiliation grid-institutes:grid.5254.6
    289 schema:familyName Bisgaard
    290 schema:givenName Hans
    291 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0713540153.74
    292 rdf:type schema:Person
    293 sg:person.0733166761.32 schema:affiliation grid-institutes:grid.5254.6
    294 schema:familyName Stokholm
    295 schema:givenName Jakob
    296 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0733166761.32
    297 rdf:type schema:Person
    298 sg:person.0770304772.43 schema:affiliation grid-institutes:grid.5254.6
    299 schema:familyName Sørensen
    300 schema:givenName Søren
    301 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0770304772.43
    302 rdf:type schema:Person
    303 sg:pub.10.1007/978-0-387-21706-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035613449
    304 https://doi.org/10.1007/978-0-387-21706-2
    305 rdf:type schema:CreativeWork
    306 sg:pub.10.1007/978-0-387-87458-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023139038
    307 https://doi.org/10.1007/978-0-387-87458-6
    308 rdf:type schema:CreativeWork
    309 sg:pub.10.1007/978-0-387-98141-3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041188628
    310 https://doi.org/10.1007/978-0-387-98141-3
    311 rdf:type schema:CreativeWork
    312 sg:pub.10.1038/nature08821 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050498034
    313 https://doi.org/10.1038/nature08821
    314 rdf:type schema:CreativeWork
    315 sg:pub.10.1038/nature09944 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026204536
    316 https://doi.org/10.1038/nature09944
    317 rdf:type schema:CreativeWork
    318 sg:pub.10.1038/nature11234 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007740093
    319 https://doi.org/10.1038/nature11234
    320 rdf:type schema:CreativeWork
    321 sg:pub.10.1038/nmeth.2658 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002139060
    322 https://doi.org/10.1038/nmeth.2658
    323 rdf:type schema:CreativeWork
    324 sg:pub.10.1038/nmeth.2897 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030185276
    325 https://doi.org/10.1038/nmeth.2897
    326 rdf:type schema:CreativeWork
    327 sg:pub.10.1038/nmeth.2898 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007964999
    328 https://doi.org/10.1038/nmeth.2898
    329 rdf:type schema:CreativeWork
    330 sg:pub.10.1186/1471-2105-11-422 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047456674
    331 https://doi.org/10.1186/1471-2105-11-422
    332 rdf:type schema:CreativeWork
    333 sg:pub.10.1186/1471-2105-12-77 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014582441
    334 https://doi.org/10.1186/1471-2105-12-77
    335 rdf:type schema:CreativeWork
    336 sg:pub.10.1186/2047-217x-1-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050567563
    337 https://doi.org/10.1186/2047-217x-1-7
    338 rdf:type schema:CreativeWork
    339 sg:pub.10.1186/2049-2618-2-11 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031538238
    340 https://doi.org/10.1186/2049-2618-2-11
    341 rdf:type schema:CreativeWork
    342 sg:pub.10.1186/2049-2618-2-15 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046874717
    343 https://doi.org/10.1186/2049-2618-2-15
    344 rdf:type schema:CreativeWork
    345 sg:pub.10.1186/gb-2010-11-10-r106 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031289083
    346 https://doi.org/10.1186/gb-2010-11-10-r106
    347 rdf:type schema:CreativeWork
    348 sg:pub.10.1186/gb-2014-15-6-r76 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024439294
    349 https://doi.org/10.1186/gb-2014-15-6-r76
    350 rdf:type schema:CreativeWork
    351 sg:pub.10.1186/s13059-014-0550-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015222646
    352 https://doi.org/10.1186/s13059-014-0550-8
    353 rdf:type schema:CreativeWork
    354 grid-institutes:grid.5254.6 schema:alternateName COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
    355 Department of Biology, Laboratory of Genomics and Molecular Biomedicine, University of Copenhagen, Copenhagen, Denmark
    356 Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
    357 schema:name COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
    358 Department of Biology, Laboratory of Genomics and Molecular Biomedicine, University of Copenhagen, Copenhagen, Denmark
    359 Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
    360 rdf:type schema:Organization
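
The rdf:first / rdf:rest triples in the listing above form an RDF collection: a linked list of blank nodes, each holding one author and a pointer to the next cell, ending in rdf:nil. The sketch below shows one way to recover the author order from those cells. It is a minimal illustration, not part of the record: the function names find_head and walk are hypothetical, and it assumes the list head is the one cell that no other cell points to with rdf:rest (the triple that references the head is not among the cells copied here). The node identifiers and family names are copied from the listing.

```python
# List cells copied from the listing: blank node -> (rdf:first, rdf:rest).
CELLS = {
    "N35c68743dbaf486cb2c9fe0f3a811e5d": ("sg:person.014631160175.76", "Ndd9cdf1ece3c4026be7dcd658ab9d87b"),
    "N469fc03c592b4308a5c001c1fc208cef": ("sg:person.01013242545.03", "Ndfdc2f7770c2467f93691ac656a67fff"),
    "N565943a9e1f8492e8a5d69efd1044254": ("sg:person.0770304772.43", "N9613b7d554884790aa680260f7527f6f"),
    "N822cf9d6e6e04e8ea9e3b197bed36934": ("sg:person.013630367571.45", "N469fc03c592b4308a5c001c1fc208cef"),
    "N9613b7d554884790aa680260f7527f6f": ("sg:person.0713540153.74", "Nc68973e5225040788939e8e007c98e2c"),
    "Na7502f5bdba24b0f80132a6137c6af81": ("sg:person.0610362715.16", "N565943a9e1f8492e8a5d69efd1044254"),
    "Nc68973e5225040788939e8e007c98e2c": ("sg:person.0701562305.05", "rdf:nil"),
    "Ndd9cdf1ece3c4026be7dcd658ab9d87b": ("sg:person.0702056330.02", "N822cf9d6e6e04e8ea9e3b197bed36934"),
    "Ndfdc2f7770c2467f93691ac656a67fff": ("sg:person.0733166761.32", "Na7502f5bdba24b0f80132a6137c6af81"),
}

# schema:familyName values copied from the sg:person entries in the listing.
FAMILY_NAMES = {
    "sg:person.014631160175.76": "Thorsen",
    "sg:person.0702056330.02": "Brejnrod",
    "sg:person.013630367571.45": "Mortensen",
    "sg:person.01013242545.03": "Rasmussen",
    "sg:person.0733166761.32": "Stokholm",
    "sg:person.0610362715.16": "Al-Soud",
    "sg:person.0770304772.43": "Sørensen",
    "sg:person.0713540153.74": "Bisgaard",
    "sg:person.0701562305.05": "Waage",
}


def find_head(cells):
    """Return the only cell that is not the rdf:rest of another cell (assumed head)."""
    rests = {rest for _, rest in cells.values()}
    heads = [node for node in cells if node not in rests]
    if len(heads) != 1:
        raise ValueError(f"expected exactly one head cell, found {len(heads)}")
    return heads[0]


def walk(cells, head):
    """Follow rdf:rest from head to rdf:nil, collecting each rdf:first value."""
    items, seen, node = [], set(), head
    while node != "rdf:nil":
        if node in seen:
            raise ValueError("cycle detected in rdf:rest chain")
        seen.add(node)
        first, node = cells[node]
        items.append(first)
    return items


if __name__ == "__main__":
    for person in walk(CELLS, find_head(CELLS)):
        print(person, FAMILY_NAMES[person])
```

In a real pipeline an RDF library would normally handle collections; the hand-rolled walk here only makes the structure of the listing explicit.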
     



