Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome ...


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2016-11-25

AUTHORS

Jonathan Thorsen, Asker Brejnrod, Martin Mortensen, Morten A. Rasmussen, Jakob Stokholm, Waleed Abu Al-Soud, Søren Sørensen, Hans Bisgaard, Johannes Waage

ABSTRACT

Background: There is an immense scientific interest in the human microbiome and its effects on human physiology, health, and disease. A common approach for examining bacterial communities is high-throughput sequencing of 16S rRNA gene hypervariable regions, aggregating sequence-similar amplicons into operational taxonomic units (OTUs). Strategies for detecting differential relative abundance of OTUs between sample conditions include classical statistical approaches as well as a plethora of newer methods, many borrowing from the related field of RNA-seq analysis. This effort is complicated by unique data characteristics, including sparsity, sequencing depth variation, and nonconformity of read counts to theoretical distributions, which is often exacerbated by exploratory and/or unbalanced study designs. Here, we assess the robustness of available methods for (1) inference in differential relative abundance analysis and (2) beta-diversity-based sample separation, using a rigorous benchmarking framework based on large clinical 16S microbiome datasets from different sources.

Results: Running more than 380,000 full differential relative abundance tests on real datasets with permuted case/control assignments and in silico-spiked OTUs, we identify large differences in method performance on a range of parameters, including false positive rates, sensitivity to sparsity and case/control balances, and spike-in retrieval rate. In large datasets, methods with the highest false positive rates also tend to have the best detection power. For beta-diversity-based sample separation, we show that library size normalization has very little effect and that the distance metric is the most important factor in terms of separation power.

Conclusions: Our results, generalizable to datasets from different sequencing platforms, demonstrate how the choice of method considerably affects analysis outcome. Here, we give recommendations for tools that exhibit low false positive rates, have good retrieval power across effect sizes and case/control proportions, and have low sparsity bias. Result output from some commonly used methods should be interpreted with caution. We provide an easily extensible framework for benchmarking of new methods and future microbiome datasets.
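The abstract describes estimating false positive rates by re-running tests on data whose case/control labels have been permuted. The idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic counts, the function names, and the choice of a Welch statistic with a normal approximation as a stand-in for the methods the authors actually benchmarked.

```python
import random
from statistics import NormalDist, mean, variance


def welch_p_value(a, b):
    """Two-sided p-value from a Welch statistic, using a normal approximation."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def permuted_false_positive_rate(counts, labels, n_perm=20, alpha=0.05, seed=1):
    """Fraction of OTU tests called significant after shuffling the labels.

    counts: list of per-OTU lists, one value per sample (hypothetical layout).
    labels: list of 0/1 case/control flags, one per sample.
    """
    rng = random.Random(seed)
    hits = tests = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # break any real link between OTU and phenotype
        for otu in counts:
            cases = [x for x, g in zip(otu, shuffled) if g == 1]
            controls = [x for x, g in zip(otu, shuffled) if g == 0]
            hits += welch_p_value(cases, controls) < alpha
            tests += 1
    return hits / tests


# Synthetic stand-in data: 50 OTUs x 40 samples, generated without any signal.
data_rng = random.Random(0)
counts = [[data_rng.randint(0, 30) for _ in range(40)] for _ in range(50)]
labels = [1] * 20 + [0] * 20
fpr = permuted_false_positive_rate(counts, labels)
print(f"empirical false positive rate: {fpr:.3f}")
```

Because the shuffled labels carry no real association, any test called significant is a false positive, so a well-calibrated test should give a rate close to `alpha`.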

PAGES

62

References to SciGraph publications

  • 2010-10-27. Differential expression analysis for sequence count data in GENOME BIOLOGY
  • 2002. Modern Applied Statistics with S
  • 2012-06-13. Structure, function and diversity of the healthy human microbiome in NATURE
  • 2014-05-05. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis in MICROBIOME
  • 2014-04-07. CopyRighter: a rapid tool for improving the accuracy of microbial community profiles through lineage-specific gene copy number correction in MICROBIOME
  • 2014-12-05. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 in GENOME BIOLOGY
  • 2010-03. A human gut microbial gene catalogue established by metagenomic sequencing in NATURE
  • 2011-03-17. pROC: an open-source package for R and S+ to analyze and compare ROC curves in BMC BIOINFORMATICS
  • 2009. ggplot2, Elegant Graphics for Data Analysis
  • 2014-06-27. Diarrhea in young children from low-income countries leads to large-scale alterations in intestinal microbiota composition in GENOME BIOLOGY
  • 2010-08-10. baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data in BMC BIOINFORMATICS
  • 2011-04-20. Enterotypes of the human gut microbiome in NATURE
  • 2013-09-29. Differential abundance analysis for microbial marker-gene surveys in NATURE METHODS
  • 2014-03-28. Reply to: "A fair comparison" in NATURE METHODS
  • 2009. Mixed effects models and extensions in ecology with R
  • 2012-07-12. The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome in GIGASCIENCE
  • 2014-03-28. A fair comparison in NATURE METHODS

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/s40168-016-0208-8

    DOI

    http://dx.doi.org/10.1186/s40168-016-0208-8

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1019428991

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/27884206



    JSON-LD is the canonical representation for SciGraph data.

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Biological Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Genetics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Bacteria", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Base Sequence", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Benchmarking", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Case-Control Studies", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Computational Biology", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "False Positive Reactions", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "High-Throughput Nucleotide Sequencing", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Humans", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Microbiota", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "RNA, Ribosomal, 16S", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Sequence Analysis, RNA", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Thorsen", 
            "givenName": "Jonathan", 
            "id": "sg:person.014631160175.76", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014631160175.76"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Department of Biology, Laboratory of Genomics and Molecular Biomedicine, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark", 
                "Department of Biology, Laboratory of Genomics and Molecular Biomedicine, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Brejnrod", 
            "givenName": "Asker", 
            "id": "sg:person.0702056330.02", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0702056330.02"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Mortensen", 
            "givenName": "Martin", 
            "id": "sg:person.013630367571.45", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013630367571.45"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Rasmussen", 
            "givenName": "Morten A.", 
            "id": "sg:person.01013242545.03", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01013242545.03"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Stokholm", 
            "givenName": "Jakob", 
            "id": "sg:person.0733166761.32", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0733166761.32"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Al-Soud", 
            "givenName": "Waleed Abu", 
            "id": "sg:person.0610362715.16", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0610362715.16"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "S\u00f8rensen", 
            "givenName": "S\u00f8ren", 
            "id": "sg:person.0770304772.43", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0770304772.43"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Bisgaard", 
            "givenName": "Hans", 
            "id": "sg:person.0713540153.74", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0713540153.74"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark", 
              "id": "http://www.grid.ac/institutes/grid.5254.6", 
              "name": [
                "COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Waage", 
            "givenName": "Johannes", 
            "id": "sg:person.0701562305.05", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0701562305.05"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/978-0-387-21706-2", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1035613449", 
              "https://doi.org/10.1007/978-0-387-21706-2"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/2049-2618-2-15", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1046874717", 
              "https://doi.org/10.1186/2049-2618-2-15"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-0-387-87458-6", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1023139038", 
              "https://doi.org/10.1007/978-0-387-87458-6"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-12-77", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1014582441", 
              "https://doi.org/10.1186/1471-2105-12-77"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature08821", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050498034", 
              "https://doi.org/10.1038/nature08821"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.2658", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1002139060", 
              "https://doi.org/10.1038/nmeth.2658"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/2049-2618-2-11", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031538238", 
              "https://doi.org/10.1186/2049-2618-2-11"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-11-422", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1047456674", 
              "https://doi.org/10.1186/1471-2105-11-422"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature11234", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1007740093", 
              "https://doi.org/10.1038/nature11234"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.2897", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1030185276", 
              "https://doi.org/10.1038/nmeth.2897"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2010-11-10-r106", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031289083", 
              "https://doi.org/10.1186/gb-2010-11-10-r106"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2014-15-6-r76", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1024439294", 
              "https://doi.org/10.1186/gb-2014-15-6-r76"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.2898", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1007964999", 
              "https://doi.org/10.1038/nmeth.2898"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s13059-014-0550-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1015222646", 
              "https://doi.org/10.1186/s13059-014-0550-8"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/2047-217x-1-7", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050567563", 
              "https://doi.org/10.1186/2047-217x-1-7"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-0-387-98141-3", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1041188628", 
              "https://doi.org/10.1007/978-0-387-98141-3"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature09944", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1026204536", 
              "https://doi.org/10.1038/nature09944"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2016-11-25", 
        "datePublishedReg": "2016-11-25", 
        "description": "BackgroundThere is an immense scientific interest in the human microbiome and its effects on human physiology, health, and disease. A common approach for examining bacterial communities is high-throughput sequencing of 16S rRNA gene hypervariable regions, aggregating sequence-similar amplicons into operational taxonomic units (OTUs). Strategies for detecting differential relative abundance of OTUs between sample conditions include classical statistical approaches as well as a plethora of newer methods, many borrowing from the related field of RNA-seq analysis. This effort is complicated by unique data characteristics, including sparsity, sequencing depth variation, and nonconformity of read counts to theoretical distributions, which is often exacerbated by exploratory and/or unbalanced study designs. Here, we assess the robustness of available methods for (1) inference in differential relative abundance analysis and (2) beta-diversity-based sample separation, using a rigorous benchmarking framework based on large clinical 16S microbiome datasets from different sources.ResultsRunning more than 380,000 full differential relative abundance tests on real datasets with permuted case/control assignments and in silico-spiked OTUs, we identify large differences in method performance on a range of parameters, including false positive rates, sensitivity to sparsity and case/control balances, and spike-in retrieval rate. In large datasets, methods with the highest false positive rates also tend to have the best detection power. For beta-diversity-based sample separation, we show that library size normalization has very little effect and that the distance metric is the most important factor in terms of separation power.ConclusionsOur results, generalizable to datasets from different sequencing platforms, demonstrate how the choice of method considerably affects analysis outcome. 
Here, we give recommendations for tools that exhibit low false positive rates, have good retrieval power across effect sizes and case/control proportions, and have low sparsity bias. Result output from some commonly used methods should be interpreted with caution. We provide an easily extensible framework for benchmarking of new methods and future microbiome datasets.", 
        "genre": "article", 
        "id": "sg:pub.10.1186/s40168-016-0208-8", 
        "isAccessibleForFree": true, 
        "isPartOf": [
          {
            "id": "sg:journal.1048878", 
            "issn": [
              "2049-2618"
            ], 
            "name": "Microbiome", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "4"
          }
        ], 
        "keywords": [
          "classical statistical approach", 
          "unique data characteristics", 
          "unbalanced study design", 
          "better detection power", 
          "microbiome datasets", 
          "range of parameters", 
          "theoretical distributions", 
          "statistical approach", 
          "control assignment", 
          "new method", 
          "false discoveries", 
          "control proportions", 
          "data analysis methods", 
          "real datasets", 
          "sparsity", 
          "differential relative abundance", 
          "detection power", 
          "abundance analysis", 
          "distance metric", 
          "analysis method", 
          "method performance", 
          "common approach", 
          "available methods", 
          "choice of method", 
          "large datasets", 
          "immense scientific interest", 
          "data characteristics", 
          "inference", 
          "depth variation", 
          "control balance", 
          "transformation sensitivity", 
          "robustness", 
          "false positive rate", 
          "framework", 
          "power", 
          "approach", 
          "large-scale benchmarking", 
          "microbiome studies", 
          "field", 
          "parameters", 
          "dataset", 
          "rRNA gene hypervariable regions", 
          "scientific interest", 
          "sample conditions", 
          "metrics", 
          "benchmarking", 
          "extensible framework", 
          "distribution", 
          "low false positive rate", 
          "terms", 
          "output", 
          "benchmarking framework", 
          "relative abundance analysis", 
          "assignment", 
          "analysis outcomes", 
          "analysis", 
          "performance", 
          "design", 
          "tool", 
          "retrieval power", 
          "different sources", 
          "conditions", 
          "results", 
          "choice", 
          "large differences", 
          "high false positive rate", 
          "interest", 
          "range", 
          "size", 
          "bias", 
          "variation", 
          "positive rate", 
          "effect", 
          "separation", 
          "normalization", 
          "region", 
          "characteristics", 
          "source", 
          "nonconformity", 
          "effect size", 
          "rate", 
          "balance", 
          "sensitivity", 
          "sample separation", 
          "different sequencing platforms", 
          "efforts", 
          "plethora", 
          "units", 
          "strategies", 
          "separation power", 
          "discovery", 
          "important factor", 
          "platform", 
          "human microbiome", 
          "test", 
          "retrieval rate", 
          "study", 
          "human physiology", 
          "little effect", 
          "borrowing", 
          "factors", 
          "differences", 
          "count", 
          "size normalization", 
          "abundance", 
          "operational taxonomic units", 
          "caution", 
          "study design", 
          "sequencing platforms", 
          "recommendations", 
          "high-throughput sequencing", 
          "community", 
          "relative abundance", 
          "taxonomic units", 
          "proportion", 
          "outcomes", 
          "physiology", 
          "sequencing", 
          "RNA-seq analysis", 
          "microbiome", 
          "health", 
          "method", 
          "ConclusionsOur results", 
          "disease", 
          "hypervariable region", 
          "BackgroundThere", 
          "bacterial communities", 
          "amplicons"
        ], 
        "name": "Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies", 
        "pagination": "62", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1019428991"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/s40168-016-0208-8"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "27884206"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/s40168-016-0208-8", 
          "https://app.dimensions.ai/details/publication/pub.1019428991"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-08-04T17:03", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20220804/entities/gbq_results/article/article_699.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1186/s40168-016-0208-8"
      }
    ]
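Since the record above is plain JSON, it can be read with any JSON parser. The sketch below shows one way to pull out author names and identifiers. It works on a hand-trimmed excerpt of the record (two of the nine authors, and only the fields it reads), and `summarize` is a hypothetical helper name, not part of any SciGraph tooling.

```python
import json

# Trimmed excerpt of the record above; only the fields used below are kept.
record_text = """
[
  {
    "author": [
      {"familyName": "Thorsen", "givenName": "Jonathan", "type": "Person"},
      {"familyName": "Waage", "givenName": "Johannes", "type": "Person"}
    ],
    "datePublished": "2016-11-25",
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s40168-016-0208-8"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["27884206"]}
    ]
  }
]
"""


def summarize(record):
    """Collect author names and identifiers from one article object."""
    authors = [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]
    # Each productId entry holds a list of values; take the first one.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    return {"authors": authors, "date": record.get("datePublished"), "ids": ids}


summary = summarize(json.loads(record_text)[0])
print(summary["authors"], summary["ids"]["doi"])
```

The top level of the record is an array, hence the `[0]` before calling `summarize`.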
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s40168-016-0208-8'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s40168-016-0208-8'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s40168-016-0208-8'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s40168-016-0208-8'
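The four curl commands above differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The `BASE` constant, the `ACCEPT` table and `build_request` are illustrative names introduced here. The sketch only builds the request and does not send it, because whether the endpoint still responds is not something this record confirms.

```python
import urllib.request

# URL prefix and media types copied from the curl commands above.
BASE = "https://scigraph.springernature.com/pub."
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(doi, fmt="json-ld"):
    """Build (but do not send) a request for one serialization of a record."""
    return urllib.request.Request(BASE + doi, headers={"Accept": ACCEPT[fmt]})


req = build_request("10.1186/s40168-016-0208-8", "turtle")
print(req.full_url, req.get_header("Accept"))

# To actually fetch, uncomment the next two lines (needs network access,
# and assumes the endpoint is still being served):
# with urllib.request.urlopen(req, timeout=30) as resp:
#     body = resp.read().decode("utf-8")
```

Keeping request construction separate from the network call makes the header logic easy to check offline.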


     

    This table displays all metadata directly associated to this object as RDF triples.

    360 triples, 21 predicates, 181 URIs, 156 literals, 18 blank nodes

    Subject Predicate Object
    1 sg:pub.10.1186/s40168-016-0208-8 schema:about N1f32bd96bc774a28b8d661ceea92ef0b
    2 N3d1e10fb9d5c467cad3fb7b345723355
    3 N45323f23801047a38608c94f34927052
    4 N55c5d0e246974d5da6a48a5af0bb1026
    5 N586cb6b3f8644e9db75d4a8b3c2b72e8
    6 N634642c4b8274501a6d99ed0c34f9671
    7 N9b1609dffa2f49b5b4eb2edc3ffba270
    8 Nd44a74aa8364489a8561d17b744cb9f1
    9 Ne13f292a75a146aeb12f32aec9395708
    10 Ne633e59448004240807e6aff51917095
    11 Nf81ac9f8d23f4e8fbb60ab6bff273c37
    12 anzsrc-for:06
    13 anzsrc-for:0604
    14 schema:author N78929cf2ef7c4de3ba2d920178d4ca92
    15 schema:citation sg:pub.10.1007/978-0-387-21706-2
    16 sg:pub.10.1007/978-0-387-87458-6
    17 sg:pub.10.1007/978-0-387-98141-3
    18 sg:pub.10.1038/nature08821
    19 sg:pub.10.1038/nature09944
    20 sg:pub.10.1038/nature11234
    21 sg:pub.10.1038/nmeth.2658
    22 sg:pub.10.1038/nmeth.2897
    23 sg:pub.10.1038/nmeth.2898
    24 sg:pub.10.1186/1471-2105-11-422
    25 sg:pub.10.1186/1471-2105-12-77
    26 sg:pub.10.1186/2047-217x-1-7
    27 sg:pub.10.1186/2049-2618-2-11
    28 sg:pub.10.1186/2049-2618-2-15
    29 sg:pub.10.1186/gb-2010-11-10-r106
    30 sg:pub.10.1186/gb-2014-15-6-r76
    31 sg:pub.10.1186/s13059-014-0550-8
    32 schema:datePublished 2016-11-25
    33 schema:datePublishedReg 2016-11-25
    34 schema:description BackgroundThere is an immense scientific interest in the human microbiome and its effects on human physiology, health, and disease. A common approach for examining bacterial communities is high-throughput sequencing of 16S rRNA gene hypervariable regions, aggregating sequence-similar amplicons into operational taxonomic units (OTUs). Strategies for detecting differential relative abundance of OTUs between sample conditions include classical statistical approaches as well as a plethora of newer methods, many borrowing from the related field of RNA-seq analysis. This effort is complicated by unique data characteristics, including sparsity, sequencing depth variation, and nonconformity of read counts to theoretical distributions, which is often exacerbated by exploratory and/or unbalanced study designs. Here, we assess the robustness of available methods for (1) inference in differential relative abundance analysis and (2) beta-diversity-based sample separation, using a rigorous benchmarking framework based on large clinical 16S microbiome datasets from different sources.ResultsRunning more than 380,000 full differential relative abundance tests on real datasets with permuted case/control assignments and in silico-spiked OTUs, we identify large differences in method performance on a range of parameters, including false positive rates, sensitivity to sparsity and case/control balances, and spike-in retrieval rate. In large datasets, methods with the highest false positive rates also tend to have the best detection power. For beta-diversity-based sample separation, we show that library size normalization has very little effect and that the distance metric is the most important factor in terms of separation power.ConclusionsOur results, generalizable to datasets from different sequencing platforms, demonstrate how the choice of method considerably affects analysis outcome. Here, we give recommendations for tools that exhibit low false positive rates, have good retrieval power across effect sizes and case/control proportions, and have low sparsity bias. Result output from some commonly used methods should be interpreted with caution. We provide an easily extensible framework for benchmarking of new methods and future microbiome datasets.
    35 schema:genre article
    36 schema:isAccessibleForFree true
    37 schema:isPartOf N9d67f19e6d214581bd772835d92954e2
    38 Ne502bbd5377e48fa9aa6ce11504d9f56
    39 sg:journal.1048878
    40 schema:keywords BackgroundThere
    41 ConclusionsOur results
    42 RNA-seq analysis
    43 abundance
    44 abundance analysis
    45 amplicons
    46 analysis
    47 analysis method
    48 analysis outcomes
    49 approach
    50 assignment
    51 available methods
    52 bacterial communities
    53 balance
    54 benchmarking
    55 benchmarking framework
    56 better detection power
    57 bias
    58 borrowing
    59 caution
    60 characteristics
    61 choice
    62 choice of method
    63 classical statistical approach
    64 common approach
    65 community
    66 conditions
    67 control assignment
    68 control balance
    69 control proportions
    70 count
    71 data analysis methods
    72 data characteristics
    73 dataset
    74 depth variation
    75 design
    76 detection power
    77 differences
    78 different sequencing platforms
    79 different sources
    80 differential relative abundance
    81 discovery
    82 disease
    83 distance metric
    84 distribution
    85 effect
    86 effect size
    87 efforts
    88 extensible framework
    89 factors
    90 false discoveries
    91 false positive rate
    92 field
    93 framework
    94 health
    95 high false positive rate
    96 high-throughput sequencing
    97 human microbiome
    98 human physiology
    99 hypervariable region
    100 immense scientific interest
    101 important factor
    102 inference
    103 interest
    104 large datasets
    105 large differences
    106 large-scale benchmarking
    107 little effect
    108 low false positive rate
    109 method
    110 method performance
    111 metrics
    112 microbiome
    113 microbiome datasets
    114 microbiome studies
    115 new method
    116 nonconformity
    117 normalization
    118 operational taxonomic units
    119 outcomes
    120 output
    121 parameters
    122 performance
    123 physiology
    124 platform
    125 plethora
    126 positive rate
    127 power
    128 proportion
    129 rRNA gene hypervariable regions
    130 range
    131 range of parameters
    132 rate
    133 real datasets
    134 recommendations
    135 region
    136 relative abundance
    137 relative abundance analysis
    138 results
    139 retrieval power
    140 retrieval rate
    141 robustness
    142 sample conditions
    143 sample separation
    144 scientific interest
    145 sensitivity
    146 separation
    147 separation power
    148 sequencing
    149 sequencing platforms
    150 size
    151 size normalization
    152 source
    153 sparsity
    154 statistical approach
    155 strategies
    156 study
    157 study design
    158 taxonomic units
    159 terms
    160 test
    161 theoretical distributions
    162 tool
    163 transformation sensitivity
    164 unbalanced study design
    165 unique data characteristics
    166 units
    167 variation
    168 schema:name Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies
    169 schema:pagination 62
    170 schema:productId N1549974c2d484ed497fc8a28bcdd33c9
    171 N7e89d023f0474cbc95d42cd6edb65e86
    172 Nf372e5eb4ceb48a19e3e27b202aa6240
    173 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019428991
    174 https://doi.org/10.1186/s40168-016-0208-8
    175 schema:sdDatePublished 2022-08-04T17:03
    176 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    177 schema:sdPublisher N889e8ff6f318488fac591ea34ea698a7
    178 schema:url https://doi.org/10.1186/s40168-016-0208-8
    179 sgo:license sg:explorer/license/
    180 sgo:sdDataset articles
    181 rdf:type schema:ScholarlyArticle
    182 N0319c68898f844da9797c4378fd900db rdf:first sg:person.013630367571.45
    183 rdf:rest N5ba7e1620a1e4f5eb0e4a590641a50de
    184 N1549974c2d484ed497fc8a28bcdd33c9 schema:name pubmed_id
    185 schema:value 27884206
    186 rdf:type schema:PropertyValue
    187 N1f32bd96bc774a28b8d661ceea92ef0b schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    188 schema:name False Positive Reactions
    189 rdf:type schema:DefinedTerm
    190 N3d1e10fb9d5c467cad3fb7b345723355 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    191 schema:name Microbiota
    192 rdf:type schema:DefinedTerm
    193 N45323f23801047a38608c94f34927052 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    194 schema:name Base Sequence
    195 rdf:type schema:DefinedTerm
    196 N4b2e6dcc97b64a39a3a3cb0a95ed622c rdf:first sg:person.0702056330.02
    197 rdf:rest N0319c68898f844da9797c4378fd900db
    198 N4ed20ed3fe374ac08d0ee7cd9527d308 rdf:first sg:person.0770304772.43
    199 rdf:rest N535c3a317a874ccb899065c35f89ca59
    200 N535c3a317a874ccb899065c35f89ca59 rdf:first sg:person.0713540153.74
    201 rdf:rest Neddb0f064b554bf686f54a11b272a8a4
    202 N55c5d0e246974d5da6a48a5af0bb1026 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    203 schema:name Humans
    204 rdf:type schema:DefinedTerm
    205 N586cb6b3f8644e9db75d4a8b3c2b72e8 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    206 schema:name Sequence Analysis, RNA
    207 rdf:type schema:DefinedTerm
    208 N5ba7e1620a1e4f5eb0e4a590641a50de rdf:first sg:person.01013242545.03
    209 rdf:rest N6a23ee3b41d6445c91527d5e92395665
    210 N634642c4b8274501a6d99ed0c34f9671 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    211 schema:name Benchmarking
    212 rdf:type schema:DefinedTerm
    213 N6a23ee3b41d6445c91527d5e92395665 rdf:first sg:person.0733166761.32
    214 rdf:rest N6fc13d59ddd14d01b355c6227d233406
    215 N6fc13d59ddd14d01b355c6227d233406 rdf:first sg:person.0610362715.16
    216 rdf:rest N4ed20ed3fe374ac08d0ee7cd9527d308
    217 N78929cf2ef7c4de3ba2d920178d4ca92 rdf:first sg:person.014631160175.76
    218 rdf:rest N4b2e6dcc97b64a39a3a3cb0a95ed622c
    219 N7e89d023f0474cbc95d42cd6edb65e86 schema:name dimensions_id
    220 schema:value pub.1019428991
    221 rdf:type schema:PropertyValue
    222 N889e8ff6f318488fac591ea34ea698a7 schema:name Springer Nature - SN SciGraph project
    223 rdf:type schema:Organization
    224 N9b1609dffa2f49b5b4eb2edc3ffba270 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    225 schema:name High-Throughput Nucleotide Sequencing
    226 rdf:type schema:DefinedTerm
    227 N9d67f19e6d214581bd772835d92954e2 schema:volumeNumber 4
    228 rdf:type schema:PublicationVolume
    229 Nd44a74aa8364489a8561d17b744cb9f1 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    230 schema:name Bacteria
    231 rdf:type schema:DefinedTerm
    232 Ne13f292a75a146aeb12f32aec9395708 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    233 schema:name RNA, Ribosomal, 16S
    234 rdf:type schema:DefinedTerm
    235 Ne502bbd5377e48fa9aa6ce11504d9f56 schema:issueNumber 1
    236 rdf:type schema:PublicationIssue
    237 Ne633e59448004240807e6aff51917095 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    238 schema:name Case-Control Studies
    239 rdf:type schema:DefinedTerm
    240 Neddb0f064b554bf686f54a11b272a8a4 rdf:first sg:person.0701562305.05
    241 rdf:rest rdf:nil
    242 Nf372e5eb4ceb48a19e3e27b202aa6240 schema:name doi
    243 schema:value 10.1186/s40168-016-0208-8
    244 rdf:type schema:PropertyValue
    245 Nf81ac9f8d23f4e8fbb60ab6bff273c37 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    246 schema:name Computational Biology
    247 rdf:type schema:DefinedTerm
    248 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
    249 schema:name Biological Sciences
    250 rdf:type schema:DefinedTerm
    251 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
    252 schema:name Genetics
    253 rdf:type schema:DefinedTerm
    254 sg:journal.1048878 schema:issn 2049-2618
    255 schema:name Microbiome
    256 schema:publisher Springer Nature
    257 rdf:type schema:Periodical
    258 sg:person.01013242545.03 schema:affiliation grid-institutes:grid.5254.6
    259 schema:familyName Rasmussen
    260 schema:givenName Morten A.
    261 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01013242545.03
    262 rdf:type schema:Person
    263 sg:person.013630367571.45 schema:affiliation grid-institutes:grid.5254.6
    264 schema:familyName Mortensen
    265 schema:givenName Martin
    266 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013630367571.45
    267 rdf:type schema:Person
    268 sg:person.014631160175.76 schema:affiliation grid-institutes:grid.5254.6
    269 schema:familyName Thorsen
    270 schema:givenName Jonathan
    271 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014631160175.76
    272 rdf:type schema:Person
    273 sg:person.0610362715.16 schema:affiliation grid-institutes:grid.5254.6
    274 schema:familyName Al-Soud
    275 schema:givenName Waleed Abu
    276 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0610362715.16
    277 rdf:type schema:Person
    278 sg:person.0701562305.05 schema:affiliation grid-institutes:grid.5254.6
    279 schema:familyName Waage
    280 schema:givenName Johannes
    281 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0701562305.05
    282 rdf:type schema:Person
    283 sg:person.0702056330.02 schema:affiliation grid-institutes:grid.5254.6
    284 schema:familyName Brejnrod
    285 schema:givenName Asker
    286 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0702056330.02
    287 rdf:type schema:Person
    288 sg:person.0713540153.74 schema:affiliation grid-institutes:grid.5254.6
    289 schema:familyName Bisgaard
    290 schema:givenName Hans
    291 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0713540153.74
    292 rdf:type schema:Person
    293 sg:person.0733166761.32 schema:affiliation grid-institutes:grid.5254.6
    294 schema:familyName Stokholm
    295 schema:givenName Jakob
    296 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0733166761.32
    297 rdf:type schema:Person
    298 sg:person.0770304772.43 schema:affiliation grid-institutes:grid.5254.6
    299 schema:familyName Sørensen
    300 schema:givenName Søren
    301 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0770304772.43
    302 rdf:type schema:Person
    303 sg:pub.10.1007/978-0-387-21706-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035613449
    304 https://doi.org/10.1007/978-0-387-21706-2
    305 rdf:type schema:CreativeWork
    306 sg:pub.10.1007/978-0-387-87458-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023139038
    307 https://doi.org/10.1007/978-0-387-87458-6
    308 rdf:type schema:CreativeWork
    309 sg:pub.10.1007/978-0-387-98141-3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041188628
    310 https://doi.org/10.1007/978-0-387-98141-3
    311 rdf:type schema:CreativeWork
    312 sg:pub.10.1038/nature08821 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050498034
    313 https://doi.org/10.1038/nature08821
    314 rdf:type schema:CreativeWork
    315 sg:pub.10.1038/nature09944 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026204536
    316 https://doi.org/10.1038/nature09944
    317 rdf:type schema:CreativeWork
    318 sg:pub.10.1038/nature11234 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007740093
    319 https://doi.org/10.1038/nature11234
    320 rdf:type schema:CreativeWork
    321 sg:pub.10.1038/nmeth.2658 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002139060
    322 https://doi.org/10.1038/nmeth.2658
    323 rdf:type schema:CreativeWork
    324 sg:pub.10.1038/nmeth.2897 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030185276
    325 https://doi.org/10.1038/nmeth.2897
    326 rdf:type schema:CreativeWork
    327 sg:pub.10.1038/nmeth.2898 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007964999
    328 https://doi.org/10.1038/nmeth.2898
    329 rdf:type schema:CreativeWork
    330 sg:pub.10.1186/1471-2105-11-422 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047456674
    331 https://doi.org/10.1186/1471-2105-11-422
    332 rdf:type schema:CreativeWork
    333 sg:pub.10.1186/1471-2105-12-77 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014582441
    334 https://doi.org/10.1186/1471-2105-12-77
    335 rdf:type schema:CreativeWork
    336 sg:pub.10.1186/2047-217x-1-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050567563
    337 https://doi.org/10.1186/2047-217x-1-7
    338 rdf:type schema:CreativeWork
    339 sg:pub.10.1186/2049-2618-2-11 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031538238
    340 https://doi.org/10.1186/2049-2618-2-11
    341 rdf:type schema:CreativeWork
    342 sg:pub.10.1186/2049-2618-2-15 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046874717
    343 https://doi.org/10.1186/2049-2618-2-15
    344 rdf:type schema:CreativeWork
    345 sg:pub.10.1186/gb-2010-11-10-r106 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031289083
    346 https://doi.org/10.1186/gb-2010-11-10-r106
    347 rdf:type schema:CreativeWork
    348 sg:pub.10.1186/gb-2014-15-6-r76 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024439294
    349 https://doi.org/10.1186/gb-2014-15-6-r76
    350 rdf:type schema:CreativeWork
    351 sg:pub.10.1186/s13059-014-0550-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015222646
    352 https://doi.org/10.1186/s13059-014-0550-8
    353 rdf:type schema:CreativeWork
    354 grid-institutes:grid.5254.6 schema:alternateName COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
    355 Department of Biology, Laboratory of Genomics and Molecular Biomedicine, University of Copenhagen, Copenhagen, Denmark
    356 Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
    357 schema:name COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
    358 Department of Biology, Laboratory of Genomics and Molecular Biomedicine, University of Copenhagen, Copenhagen, Denmark
    359 Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
    360 rdf:type schema:Organization
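
The author order in this record is stored as an RDF collection: a chain of blank nodes, each with an rdf:first (the person) and an rdf:rest (the next node), ending in rdf:nil. The sketch below is one way to walk that chain and recover the order. The node identifiers, person identifiers, and family names are copied from the triples above. The function names are hypothetical. The head of the list is not stated in this part of the record, so it is assumed here to be the only node that no rdf:rest points to.

```python
# Minimal sketch: walking an rdf:first / rdf:rest chain by hand.
# List cells copied from the record: node -> (rdf:first, rdf:rest).
CELLS = {
    "N0319c68898f844da9797c4378fd900db": ("sg:person.013630367571.45", "N5ba7e1620a1e4f5eb0e4a590641a50de"),
    "N4b2e6dcc97b64a39a3a3cb0a95ed622c": ("sg:person.0702056330.02", "N0319c68898f844da9797c4378fd900db"),
    "N4ed20ed3fe374ac08d0ee7cd9527d308": ("sg:person.0770304772.43", "N535c3a317a874ccb899065c35f89ca59"),
    "N535c3a317a874ccb899065c35f89ca59": ("sg:person.0713540153.74", "Neddb0f064b554bf686f54a11b272a8a4"),
    "N5ba7e1620a1e4f5eb0e4a590641a50de": ("sg:person.01013242545.03", "N6a23ee3b41d6445c91527d5e92395665"),
    "N6a23ee3b41d6445c91527d5e92395665": ("sg:person.0733166761.32", "N6fc13d59ddd14d01b355c6227d233406"),
    "N6fc13d59ddd14d01b355c6227d233406": ("sg:person.0610362715.16", "N4ed20ed3fe374ac08d0ee7cd9527d308"),
    "N78929cf2ef7c4de3ba2d920178d4ca92": ("sg:person.014631160175.76", "N4b2e6dcc97b64a39a3a3cb0a95ed622c"),
    "Neddb0f064b554bf686f54a11b272a8a4": ("sg:person.0701562305.05", "rdf:nil"),
}

# schema:familyName values copied from the person entries in the record.
FAMILY_NAMES = {
    "sg:person.01013242545.03": "Rasmussen",
    "sg:person.013630367571.45": "Mortensen",
    "sg:person.014631160175.76": "Thorsen",
    "sg:person.0610362715.16": "Al-Soud",
    "sg:person.0701562305.05": "Waage",
    "sg:person.0702056330.02": "Brejnrod",
    "sg:person.0713540153.74": "Bisgaard",
    "sg:person.0733166761.32": "Stokholm",
    "sg:person.0770304772.43": "Sørensen",
}


def find_head(cells):
    """Return the single node that no other cell names as its rdf:rest.

    Assumption: the record holds exactly one such list in `cells`.
    """
    rests = {rest for _, rest in cells.values()}
    heads = [node for node in cells if node not in rests]
    if len(heads) != 1:
        raise ValueError(f"expected exactly one list head, found {len(heads)}")
    return heads[0]


def walk_rdf_list(head, cells):
    """Follow rdf:rest links from `head` to rdf:nil, collecting rdf:first values."""
    items, seen, node = [], set(), head
    while node != "rdf:nil":
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = cells[node]
        items.append(first)
        node = rest
    return items


author_ids = walk_rdf_list(find_head(CELLS), CELLS)
author_order = [FAMILY_NAMES[person] for person in author_ids]
print(author_order)
```

In a full copy of the record, the head node would normally be taken from the article's author property rather than inferred, and an RDF library would handle the traversal. The hand-written walk is only meant to show how the scattered rdf:first and rdf:rest triples fit together.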
     



