Analysis and correction of compositional bias in sparse sequencing count data


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2018-11-06

AUTHORS

M. Senthil Kumar, Eric V. Slud, Kwame Okrah, Stephanie C. Hicks, Sridhar Hannenhalli, Héctor Corrada Bravo

ABSTRACT

Background: Count data derived from high-throughput deoxyribonucleic acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches such as library size scaling, rarefaction, and subsampling cannot correct for compositional bias or any other relevant technical bias that is uncorrelated with library size.

Results: We demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it.

Conclusions: Compositional bias, induced by the sequencing machine, confounds inferences of absolute abundances. We present a normalization technique for compositional bias correction in sparse sequencing count data, and demonstrate its improved performance in metagenomic 16S survey data. Based on the distribution of technical bias estimates arising from several publicly available large-scale 16S count datasets, we argue that detailed experiments specifically addressing the influence of compositional bias in metagenomics are needed.

PAGES

799

References to SciGraph publications

  • 2017-10-02. Towards standards for human fecal sample processing in metagenomic studies in NATURE BIOTECHNOLOGY
  • 2012-12-03. DNA extract characterization process for microbial detection methods development and validation in BMC RESEARCH NOTES
  • 2012-06-13. Structure, function and diversity of the healthy human microbiome in NATURE
  • 2012-12-28. Considerations for the development and application of control materials to improve metagenomic microbial community profiling in ACCREDITATION AND QUALITY ASSURANCE
  • 2009-01. RNA-Seq: a revolutionary tool for transcriptomics in NATURE REVIEWS GENETICS
  • 2009-09-08. ChIP–seq: advantages and challenges of a maturing technology in NATURE REVIEWS GENETICS
  • 2012-07-16. Microbial interactions: from networks to models in NATURE REVIEWS MICROBIOLOGY
  • 2015-03-21. The truth about metagenomics: quantifying and counteracting bias in 16S rRNA studies in BMC MICROBIOLOGY
  • 2014-12-05. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 in GENOME BIOLOGY
  • 2016-04-27. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts in GENOME BIOLOGY
  • 2014-11-12. Reagent and laboratory contamination can critically impact sequence-based microbiome analyses in BMC BIOLOGY
  • 2009-04-16. Transcript length bias in RNA-seq data confounds systems biology in BIOLOGY DIRECT
  • 2010-03-02. A scaling normalization method for differential expression analysis of RNA-seq data in GENOME BIOLOGY
  • 2014-06-27. Diarrhea in young children from low-income countries leads to large-scale alterations in intestinal microbiota composition in GENOME BIOLOGY
  • 2010-02-18. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments in BMC BIOINFORMATICS
  • 2008-10-09. Next-generation DNA sequencing in NATURE BIOTECHNOLOGY
  • 2010-10-27. Differential expression analysis for sequence count data in GENOME BIOLOGY
  • 2016-01-14. A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling in BMC GENOMICS
  • 2014-02-03. voom: precision weights unlock linear model analysis tools for RNA-seq read counts in GENOME BIOLOGY
  • 2008-06-29. Aerobic production of methane in the sea in NATURE GEOSCIENCE
  • 2014-05-05. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis in MICROBIOME
  • 2015-01-28. Computational and analytical challenges in single-cell transcriptomics in NATURE REVIEWS GENETICS
  • 2011-02-09. A decade’s perspective on DNA sequencing technology in NATURE
  • 2013-12-22. Quantitative single-cell RNA-seq with unique molecular identifiers in NATURE METHODS
  • 2013-09-29. Differential abundance analysis for microbial marker-gene surveys in NATURE METHODS
  • 2014-02-10. A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages in NATURE COMMUNICATIONS
  • 2005-10-11. Metagenomics: DNA sequencing of environmental samples in NATURE REVIEWS GENETICS
  • 2010-02-04. Gene ontology analysis for RNA-seq: accounting for selection bias in GENOME BIOLOGY
Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/s12864-018-5160-5

    DOI

    http://dx.doi.org/10.1186/s12864-018-5160-5

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1108058115

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/30400812



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Biological Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/11", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Medical and Health Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Algorithms", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Bayes Theorem", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Computational Biology", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "High-Throughput Nucleotide Sequencing", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Metagenomics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Microbiota", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "RNA, Ribosomal, 16S", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA", 
              "id": "http://www.grid.ac/institutes/grid.164295.d", 
              "name": [
                "Graduate Program in Bioinformatics, University of Maryland, College Park, MD, USA", 
                "Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Kumar", 
            "givenName": "M. Senthil", 
            "id": "sg:person.0634642307.58", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0634642307.58"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Center for Statistical Research and Methodology, U.S Census Bureau, Suitland, MD, USA", 
              "id": "http://www.grid.ac/institutes/grid.432923.d", 
              "name": [
                "Department of Mathematics, University of Maryland, College Park, MD, USA", 
                "Center for Statistical Research and Methodology, U.S Census Bureau, Suitland, MD, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Slud", 
            "givenName": "Eric V.", 
            "id": "sg:person.012060167025.63", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012060167025.63"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "GRED Oncology Biostatistics, Genentech, San Francisco, CA, USA", 
              "id": "http://www.grid.ac/institutes/grid.418158.1", 
              "name": [
                "GRED Oncology Biostatistics, Genentech, San Francisco, CA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Okrah", 
            "givenName": "Kwame", 
            "id": "sg:person.0744073663.77", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0744073663.77"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA", 
              "id": "http://www.grid.ac/institutes/grid.38142.3c", 
              "name": [
                "Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard University, Boston, MA, USA", 
                "Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Hicks", 
            "givenName": "Stephanie C.", 
            "id": "sg:person.01136367606.02", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01136367606.02"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA", 
              "id": "http://www.grid.ac/institutes/grid.164295.d", 
              "name": [
                "Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Hannenhalli", 
            "givenName": "Sridhar", 
            "id": "sg:person.01341565477.18", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01341565477.18"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA", 
              "id": "http://www.grid.ac/institutes/grid.164295.d", 
              "name": [
                "Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Corrada Bravo", 
            "givenName": "H\u00e9ctor", 
            "id": "sg:person.0706015450.76", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0706015450.76"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1186/s13059-014-0550-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1015222646", 
              "https://doi.org/10.1186/s13059-014-0550-8"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s12864-015-2194-9", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1027095113", 
              "https://doi.org/10.1186/s12864-015-2194-9"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nrg3833", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1004107723", 
              "https://doi.org/10.1038/nrg3833"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.3960", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1092055967", 
              "https://doi.org/10.1038/nbt.3960"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2010-11-3-r25", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050509557", 
              "https://doi.org/10.1186/gb-2010-11-3-r25"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature11234", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1007740093", 
              "https://doi.org/10.1038/nature11234"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nrmicro2832", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1030236624", 
              "https://doi.org/10.1038/nrmicro2832"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1745-6150-4-14", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1045373440", 
              "https://doi.org/10.1186/1745-6150-4-14"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s00769-012-0941-z", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1012638035", 
              "https://doi.org/10.1007/s00769-012-0941-z"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nrg1709", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017719492", 
              "https://doi.org/10.1038/nrg1709"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2010-11-2-r14", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050171830", 
              "https://doi.org/10.1186/gb-2010-11-2-r14"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1756-0500-5-668", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1032127979", 
              "https://doi.org/10.1186/1756-0500-5-668"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nrg2484", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1030687647", 
              "https://doi.org/10.1038/nrg2484"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt1486", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1005954516", 
              "https://doi.org/10.1038/nbt1486"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2010-11-10-r106", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031289083", 
              "https://doi.org/10.1186/gb-2010-11-10-r106"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s12866-015-0351-6", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1001744296", 
              "https://doi.org/10.1186/s12866-015-0351-6"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2014-15-2-r29", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1045312009", 
              "https://doi.org/10.1186/gb-2014-15-2-r29"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nrg2641", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1006115199", 
              "https://doi.org/10.1038/nrg2641"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-11-94", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1053091615", 
              "https://doi.org/10.1186/1471-2105-11-94"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/ngeo234", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1022547638", 
              "https://doi.org/10.1038/ngeo234"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/2049-2618-2-15", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1046874717", 
              "https://doi.org/10.1186/2049-2618-2-15"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s12915-014-0087-z", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1027737035", 
              "https://doi.org/10.1186/s12915-014-0087-z"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2014-15-6-r76", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1024439294", 
              "https://doi.org/10.1186/gb-2014-15-6-r76"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.2772", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1029706604", 
              "https://doi.org/10.1038/nmeth.2772"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/ncomms4230", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044984586", 
              "https://doi.org/10.1038/ncomms4230"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.2658", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1002139060", 
              "https://doi.org/10.1038/nmeth.2658"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s13059-016-0947-7", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1000759088", 
              "https://doi.org/10.1186/s13059-016-0947-7"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature09796", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1049651011", 
              "https://doi.org/10.1038/nature09796"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2018-11-06", 
        "datePublishedReg": "2018-11-06", 
        "description": "BackgroundCount data derived from high-throughput deoxy-ribonucliec acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches like library size scaling/rarefaction/subsampling cannot correct for compositional or any other relevant technical bias that is uncorrelated with library size.ResultsWe demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it.ConclusionsCompositional bias, induced by the sequencing machine, confounds inferences of absolute abundances. We present a normalization technique for compositional bias correction in sparse sequencing count data, and demonstrate its improved performance in metagenomic 16s survey data. Based on the distribution of technical bias estimates arising from several publicly available large scale 16s count datasets, we argue that detailed experiments specifically addressing the influence of compositional bias in metagenomics are needed.", 
        "genre": "article", 
        "id": "sg:pub.10.1186/s12864-018-5160-5", 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.2529382", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.2564648", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.7519284", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.5300904", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.2612238", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.2521853", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.4312567", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.4242377", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.6501720", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.2519831", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1023790", 
            "issn": [
              "1471-2164"
            ], 
            "name": "BMC Genomics", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "19"
          }
        ], 
        "keywords": [
          "count data", 
          "inference", 
          "bias correction", 
          "bias estimates", 
          "normalization approach", 
          "subsampling", 
          "normalization method", 
          "normalization technique", 
          "improved performance", 
          "approach", 
          "problem", 
          "assumption", 
          "correction", 
          "detailed experiments", 
          "technique", 
          "estimates", 
          "properties", 
          "bias", 
          "library size", 
          "distribution", 
          "data", 
          "machine", 
          "sequencing process", 
          "performance", 
          "dataset", 
          "experiments", 
          "features", 
          "size", 
          "sequencing machines", 
          "analysis", 
          "process", 
          "absolute abundance", 
          "technical bias", 
          "compositional bias", 
          "survey data", 
          "influence", 
          "quantitative molecular assays", 
          "addition", 
          "light", 
          "abundance", 
          "Scale-16", 
          "metagenomics", 
          "molecular assays", 
          "method", 
          "sequencing", 
          "assays", 
          "data normalization approaches", 
          "ResultsWe"
        ], 
        "name": "Analysis and correction of compositional bias in sparse sequencing count data", 
        "pagination": "799", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1108058115"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/s12864-018-5160-5"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "30400812"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/s12864-018-5160-5", 
          "https://app.dimensions.ai/details/publication/pub.1108058115"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-10-01T06:45", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20221001/entities/gbq_results/article/article_782.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1186/s12864-018-5160-5"
      }
    ]
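    As a rough illustration of how the JSON-LD record above can be read, the sketch below pulls the title, DOI, author names, and journal details out of the parsed structure. The `SAMPLE_RECORD` value is a trimmed excerpt of the record, and the helper names (`product_id`, `first_part`, `summarize`) are hypothetical, introduced only for this example.

```python
from typing import Any, Optional

# Trimmed excerpt of the record above, written as the Python objects that
# json.loads would return. Only two of the authors and two of the citations
# are kept here to keep the example short.
SAMPLE_RECORD = [
    {
        "author": [
            {"familyName": "Kumar", "givenName": "M. Senthil", "type": "Person"},
            {"familyName": "Corrada Bravo", "givenName": "Héctor", "type": "Person"},
        ],
        "citation": [
            {"id": "sg:pub.10.1186/s13059-014-0550-8", "type": "CreativeWork"},
            {"id": "sg:pub.10.1186/s12864-015-2194-9", "type": "CreativeWork"},
        ],
        "datePublished": "2018-11-06",
        "isPartOf": [
            {"id": "sg:journal.1023790", "issn": ["1471-2164"],
             "name": "BMC Genomics", "type": "Periodical"},
            {"issueNumber": "1", "type": "PublicationIssue"},
            {"type": "PublicationVolume", "volumeNumber": "19"},
        ],
        "name": "Analysis and correction of compositional bias in sparse sequencing count data",
        "pagination": "799",
        "productId": [
            {"name": "doi", "type": "PropertyValue",
             "value": ["10.1186/s12864-018-5160-5"]},
            {"name": "pubmed_id", "type": "PropertyValue", "value": ["30400812"]},
        ],
        "type": "ScholarlyArticle",
    }
]


def product_id(record: dict, name: str) -> Optional[str]:
    """Return the first value of the productId entry called `name`, if any."""
    for item in record.get("productId", []):
        if item.get("name") == name:
            values = item.get("value", [])
            return values[0] if values else None
    return None


def first_part(record: dict, part_type: str, field: str) -> Optional[str]:
    """Return `field` from the first isPartOf entry whose type is `part_type`."""
    for part in record.get("isPartOf", []):
        if part.get("type") == part_type:
            return part.get(field)
    return None


def summarize(records: list) -> dict:
    """Summarize the first ScholarlyArticle found in a parsed JSON-LD list."""
    record = next(r for r in records if r.get("type") == "ScholarlyArticle")
    return {
        "title": record.get("name"),
        "date": record.get("datePublished"),
        "doi": product_id(record, "doi"),
        "pubmed_id": product_id(record, "pubmed_id"),
        "authors": [f'{a["givenName"]} {a["familyName"]}'
                    for a in record.get("author", [])],
        "journal": first_part(record, "Periodical", "name"),
        "volume": first_part(record, "PublicationVolume", "volumeNumber"),
        "issue": first_part(record, "PublicationIssue", "issueNumber"),
        "n_citations": len(record.get("citation", [])),
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE_RECORD).items():
        print(f"{key}: {value}")
```

    Applied to the full record rather than the excerpt, the same function would list all six authors and count every entry under "citation".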
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-5160-5'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-5160-5'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-5160-5'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-5160-5'
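
    The four curl commands above differ only in the Accept header. The sketch below shows one way the same content negotiation might be done from Python using only the standard library. The names `RDF_MEDIA_TYPES`, `build_request`, and `fetch_record` are hypothetical, and the example assumes the endpoint answers as the curl commands suggest; it makes no claim about the service's current availability.

```python
import json
import urllib.request

# Media types copied from the curl examples above; the short keys are
# illustrative labels chosen for this sketch.
RDF_MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s12864-018-5160-5"


def build_request(url: str, fmt: str = "json-ld") -> urllib.request.Request:
    """Build a GET request whose Accept header selects the RDF serialization."""
    if fmt not in RDF_MEDIA_TYPES:
        raise ValueError(
            f"unknown format {fmt!r}; expected one of {sorted(RDF_MEDIA_TYPES)}"
        )
    return urllib.request.Request(url, headers={"Accept": RDF_MEDIA_TYPES[fmt]})


def fetch_record(url: str = RECORD_URL, fmt: str = "json-ld", timeout: float = 30.0):
    """Fetch the record over the network.

    JSON-LD responses are parsed into Python objects; the other
    serializations are returned as plain text.
    """
    with urllib.request.urlopen(build_request(url, fmt), timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if fmt == "json-ld" else body


if __name__ == "__main__":
    # Inspect the request without touching the network.
    request = build_request(RECORD_URL, "turtle")
    print(request.get_method(), request.full_url)
    print("Accept:", request.get_header("Accept"))
```

    Keeping `build_request` separate from `fetch_record` lets the header logic be checked without a network connection; calling `fetch_record()` performs the actual download.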


     

    This table displays all metadata directly associated to this object as RDF triples.

    319 TRIPLES      21 PREDICATES      109 URIs      72 LITERALS      14 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1186/s12864-018-5160-5 schema:about N105f7cc2f8dd4633b03208f8a1c29efa
    2 N16b50f1532e34c69b325d7782a206f68
    3 N2246530498564c5fa3009868082b90ca
    4 N2a5ac203874c478693b661634c7a6fda
    5 N3f896581d5e649e98bf89e4bacaf770e
    6 N6bc5699173d646c5959fc106c152702a
    7 Na72cf856f7b44a8fac20064b273a40e9
    8 anzsrc-for:06
    9 anzsrc-for:08
    10 anzsrc-for:11
    11 schema:author Na60ba521aba84f17ab90a63c7a2a7832
    12 schema:citation sg:pub.10.1007/s00769-012-0941-z
    13 sg:pub.10.1038/nature09796
    14 sg:pub.10.1038/nature11234
    15 sg:pub.10.1038/nbt.3960
    16 sg:pub.10.1038/nbt1486
    17 sg:pub.10.1038/ncomms4230
    18 sg:pub.10.1038/ngeo234
    19 sg:pub.10.1038/nmeth.2658
    20 sg:pub.10.1038/nmeth.2772
    21 sg:pub.10.1038/nrg1709
    22 sg:pub.10.1038/nrg2484
    23 sg:pub.10.1038/nrg2641
    24 sg:pub.10.1038/nrg3833
    25 sg:pub.10.1038/nrmicro2832
    26 sg:pub.10.1186/1471-2105-11-94
    27 sg:pub.10.1186/1745-6150-4-14
    28 sg:pub.10.1186/1756-0500-5-668
    29 sg:pub.10.1186/2049-2618-2-15
    30 sg:pub.10.1186/gb-2010-11-10-r106
    31 sg:pub.10.1186/gb-2010-11-2-r14
    32 sg:pub.10.1186/gb-2010-11-3-r25
    33 sg:pub.10.1186/gb-2014-15-2-r29
    34 sg:pub.10.1186/gb-2014-15-6-r76
    35 sg:pub.10.1186/s12864-015-2194-9
    36 sg:pub.10.1186/s12866-015-0351-6
    37 sg:pub.10.1186/s12915-014-0087-z
    38 sg:pub.10.1186/s13059-014-0550-8
    39 sg:pub.10.1186/s13059-016-0947-7
    40 schema:datePublished 2018-11-06
    41 schema:datePublishedReg 2018-11-06
    42 schema:description BackgroundCount data derived from high-throughput deoxy-ribonucliec acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches like library size scaling/rarefaction/subsampling cannot correct for compositional or any other relevant technical bias that is uncorrelated with library size.ResultsWe demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it.ConclusionsCompositional bias, induced by the sequencing machine, confounds inferences of absolute abundances. We present a normalization technique for compositional bias correction in sparse sequencing count data, and demonstrate its improved performance in metagenomic 16s survey data. Based on the distribution of technical bias estimates arising from several publicly available large scale 16s count datasets, we argue that detailed experiments specifically addressing the influence of compositional bias in metagenomics are needed.
    43 schema:genre article
    44 schema:isAccessibleForFree true
    45 schema:isPartOf N0c4dc96f1ff341f7bc6f5075f967d189
    46 N346647877df341f5b6031aeb2a02341f
    47 sg:journal.1023790
    48 schema:keywords ResultsWe
    49 Scale-16
    50 absolute abundance
    51 abundance
    52 addition
    53 analysis
    54 approach
    55 assays
    56 assumption
    57 bias
    58 bias correction
    59 bias estimates
    60 compositional bias
    61 correction
    62 count data
    63 data
    64 data normalization approaches
    65 dataset
    66 detailed experiments
    67 distribution
    68 estimates
    69 experiments
    70 features
    71 improved performance
    72 inference
    73 influence
    74 library size
    75 light
    76 machine
    77 metagenomics
    78 method
    79 molecular assays
    80 normalization approach
    81 normalization method
    82 normalization technique
    83 performance
    84 problem
    85 process
    86 properties
    87 quantitative molecular assays
    88 sequencing
    89 sequencing machines
    90 sequencing process
    91 size
    92 subsampling
    93 survey data
    94 technical bias
    95 technique
    96 schema:name Analysis and correction of compositional bias in sparse sequencing count data
    97 schema:pagination 799
    98 schema:productId N73810f5fabca4e8d86af8e84e4e576d7
    99 N8cb9bad89b574e9e907c50353df7bf49
    100 Nf8dd59a8ba12408b97d70a55cd0a261a
    101 schema:sameAs https://app.dimensions.ai/details/publication/pub.1108058115
    102 https://doi.org/10.1186/s12864-018-5160-5
    103 schema:sdDatePublished 2022-10-01T06:45
    104 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    105 schema:sdPublisher N0e3efc7e07f346569ea20995981ca2d9
    106 schema:url https://doi.org/10.1186/s12864-018-5160-5
    107 sgo:license sg:explorer/license/
    108 sgo:sdDataset articles
    109 rdf:type schema:ScholarlyArticle
    110 N0c4dc96f1ff341f7bc6f5075f967d189 schema:issueNumber 1
    111 rdf:type schema:PublicationIssue
    112 N0e3efc7e07f346569ea20995981ca2d9 schema:name Springer Nature - SN SciGraph project
    113 rdf:type schema:Organization
    114 N105f7cc2f8dd4633b03208f8a1c29efa schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    115 schema:name Algorithms
    116 rdf:type schema:DefinedTerm
    117 N16b50f1532e34c69b325d7782a206f68 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    118 schema:name High-Throughput Nucleotide Sequencing
    119 rdf:type schema:DefinedTerm
    120 N2246530498564c5fa3009868082b90ca schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    121 schema:name Computational Biology
    122 rdf:type schema:DefinedTerm
    123 N2a5ac203874c478693b661634c7a6fda schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    124 schema:name Microbiota
    125 rdf:type schema:DefinedTerm
    126 N343d58fda24e458fa49fe52f71c6dadc rdf:first sg:person.0744073663.77
    127 rdf:rest Nede51bc9bb944de295db849e20d6d7e2
    128 N346647877df341f5b6031aeb2a02341f schema:volumeNumber 19
    129 rdf:type schema:PublicationVolume
    130 N3a767c9384f54408bcd9d7a22c75fcfe rdf:first sg:person.01341565477.18
    131 rdf:rest Na307bf5b71534cfd93d95c7110c55b46
    132 N3f896581d5e649e98bf89e4bacaf770e schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    133 schema:name Metagenomics
    134 rdf:type schema:DefinedTerm
    135 N6bc5699173d646c5959fc106c152702a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    136 schema:name Bayes Theorem
    137 rdf:type schema:DefinedTerm
    138 N73810f5fabca4e8d86af8e84e4e576d7 schema:name pubmed_id
    139 schema:value 30400812
    140 rdf:type schema:PropertyValue
    141 N8cb9bad89b574e9e907c50353df7bf49 schema:name dimensions_id
    142 schema:value pub.1108058115
    143 rdf:type schema:PropertyValue
    144 Na307bf5b71534cfd93d95c7110c55b46 rdf:first sg:person.0706015450.76
    145 rdf:rest rdf:nil
    146 Na60ba521aba84f17ab90a63c7a2a7832 rdf:first sg:person.0634642307.58
    147 rdf:rest Nbf0e3ba9472b4e4ab0babf29c4b61839
    148 Na72cf856f7b44a8fac20064b273a40e9 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    149 schema:name RNA, Ribosomal, 16S
    150 rdf:type schema:DefinedTerm
    151 Nbf0e3ba9472b4e4ab0babf29c4b61839 rdf:first sg:person.012060167025.63
    152 rdf:rest N343d58fda24e458fa49fe52f71c6dadc
    153 Nede51bc9bb944de295db849e20d6d7e2 rdf:first sg:person.01136367606.02
    154 rdf:rest N3a767c9384f54408bcd9d7a22c75fcfe
    155 Nf8dd59a8ba12408b97d70a55cd0a261a schema:name doi
    156 schema:value 10.1186/s12864-018-5160-5
    157 rdf:type schema:PropertyValue
    158 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
    159 schema:name Biological Sciences
    160 rdf:type schema:DefinedTerm
    161 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    162 schema:name Information and Computing Sciences
    163 rdf:type schema:DefinedTerm
    164 anzsrc-for:11 schema:inDefinedTermSet anzsrc-for:
    165 schema:name Medical and Health Sciences
    166 rdf:type schema:DefinedTerm
    167 sg:grant.2519831 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    168 rdf:type schema:MonetaryGrant
    169 sg:grant.2521853 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    170 rdf:type schema:MonetaryGrant
    171 sg:grant.2529382 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    172 rdf:type schema:MonetaryGrant
    173 sg:grant.2564648 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    174 rdf:type schema:MonetaryGrant
    175 sg:grant.2612238 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    176 rdf:type schema:MonetaryGrant
    177 sg:grant.4242377 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    178 rdf:type schema:MonetaryGrant
    179 sg:grant.4312567 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    180 rdf:type schema:MonetaryGrant
    181 sg:grant.5300904 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    182 rdf:type schema:MonetaryGrant
    183 sg:grant.6501720 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    184 rdf:type schema:MonetaryGrant
    185 sg:grant.7519284 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    186 rdf:type schema:MonetaryGrant
    187 sg:journal.1023790 schema:issn 1471-2164
    188 schema:name BMC Genomics
    189 schema:publisher Springer Nature
    190 rdf:type schema:Periodical
    191 sg:person.01136367606.02 schema:affiliation grid-institutes:grid.38142.3c
    192 schema:familyName Hicks
    193 schema:givenName Stephanie C.
    194 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01136367606.02
    195 rdf:type schema:Person
    196 sg:person.012060167025.63 schema:affiliation grid-institutes:grid.432923.d
    197 schema:familyName Slud
    198 schema:givenName Eric V.
    199 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012060167025.63
    200 rdf:type schema:Person
    201 sg:person.01341565477.18 schema:affiliation grid-institutes:grid.164295.d
    202 schema:familyName Hannenhalli
    203 schema:givenName Sridhar
    204 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01341565477.18
    205 rdf:type schema:Person
    206 sg:person.0634642307.58 schema:affiliation grid-institutes:grid.164295.d
    207 schema:familyName Kumar
    208 schema:givenName M. Senthil
    209 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0634642307.58
    210 rdf:type schema:Person
    211 sg:person.0706015450.76 schema:affiliation grid-institutes:grid.164295.d
    212 schema:familyName Corrada Bravo
    213 schema:givenName Héctor
    214 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0706015450.76
    215 rdf:type schema:Person
    216 sg:person.0744073663.77 schema:affiliation grid-institutes:grid.418158.1
    217 schema:familyName Okrah
    218 schema:givenName Kwame
    219 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0744073663.77
    220 rdf:type schema:Person
    221 sg:pub.10.1007/s00769-012-0941-z schema:sameAs https://app.dimensions.ai/details/publication/pub.1012638035
    222 https://doi.org/10.1007/s00769-012-0941-z
    223 rdf:type schema:CreativeWork
    224 sg:pub.10.1038/nature09796 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049651011
    225 https://doi.org/10.1038/nature09796
    226 rdf:type schema:CreativeWork
    227 sg:pub.10.1038/nature11234 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007740093
    228 https://doi.org/10.1038/nature11234
    229 rdf:type schema:CreativeWork
    230 sg:pub.10.1038/nbt.3960 schema:sameAs https://app.dimensions.ai/details/publication/pub.1092055967
    231 https://doi.org/10.1038/nbt.3960
    232 rdf:type schema:CreativeWork
    233 sg:pub.10.1038/nbt1486 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005954516
    234 https://doi.org/10.1038/nbt1486
    235 rdf:type schema:CreativeWork
    236 sg:pub.10.1038/ncomms4230 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044984586
    237 https://doi.org/10.1038/ncomms4230
    238 rdf:type schema:CreativeWork
    239 sg:pub.10.1038/ngeo234 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022547638
    240 https://doi.org/10.1038/ngeo234
    241 rdf:type schema:CreativeWork
    242 sg:pub.10.1038/nmeth.2658 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002139060
    243 https://doi.org/10.1038/nmeth.2658
    244 rdf:type schema:CreativeWork
    245 sg:pub.10.1038/nmeth.2772 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029706604
    246 https://doi.org/10.1038/nmeth.2772
    247 rdf:type schema:CreativeWork
    248 sg:pub.10.1038/nrg1709 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017719492
    249 https://doi.org/10.1038/nrg1709
    250 rdf:type schema:CreativeWork
    251 sg:pub.10.1038/nrg2484 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030687647
    252 https://doi.org/10.1038/nrg2484
    253 rdf:type schema:CreativeWork
    254 sg:pub.10.1038/nrg2641 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006115199
    255 https://doi.org/10.1038/nrg2641
    256 rdf:type schema:CreativeWork
    257 sg:pub.10.1038/nrg3833 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004107723
    258 https://doi.org/10.1038/nrg3833
    259 rdf:type schema:CreativeWork
    260 sg:pub.10.1038/nrmicro2832 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030236624
    261 https://doi.org/10.1038/nrmicro2832
    262 rdf:type schema:CreativeWork
    263 sg:pub.10.1186/1471-2105-11-94 schema:sameAs https://app.dimensions.ai/details/publication/pub.1053091615
    264 https://doi.org/10.1186/1471-2105-11-94
    265 rdf:type schema:CreativeWork
    266 sg:pub.10.1186/1745-6150-4-14 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045373440
    267 https://doi.org/10.1186/1745-6150-4-14
    268 rdf:type schema:CreativeWork
    269 sg:pub.10.1186/1756-0500-5-668 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032127979
    270 https://doi.org/10.1186/1756-0500-5-668
    271 rdf:type schema:CreativeWork
    272 sg:pub.10.1186/2049-2618-2-15 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046874717
    273 https://doi.org/10.1186/2049-2618-2-15
    274 rdf:type schema:CreativeWork
    275 sg:pub.10.1186/gb-2010-11-10-r106 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031289083
    276 https://doi.org/10.1186/gb-2010-11-10-r106
    277 rdf:type schema:CreativeWork
    278 sg:pub.10.1186/gb-2010-11-2-r14 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050171830
    279 https://doi.org/10.1186/gb-2010-11-2-r14
    280 rdf:type schema:CreativeWork
    281 sg:pub.10.1186/gb-2010-11-3-r25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050509557
    282 https://doi.org/10.1186/gb-2010-11-3-r25
    283 rdf:type schema:CreativeWork
    284 sg:pub.10.1186/gb-2014-15-2-r29 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045312009
    285 https://doi.org/10.1186/gb-2014-15-2-r29
    286 rdf:type schema:CreativeWork
    287 sg:pub.10.1186/gb-2014-15-6-r76 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024439294
    288 https://doi.org/10.1186/gb-2014-15-6-r76
    289 rdf:type schema:CreativeWork
    290 sg:pub.10.1186/s12864-015-2194-9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027095113
    291 https://doi.org/10.1186/s12864-015-2194-9
    292 rdf:type schema:CreativeWork
    293 sg:pub.10.1186/s12866-015-0351-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001744296
    294 https://doi.org/10.1186/s12866-015-0351-6
    295 rdf:type schema:CreativeWork
    296 sg:pub.10.1186/s12915-014-0087-z schema:sameAs https://app.dimensions.ai/details/publication/pub.1027737035
    297 https://doi.org/10.1186/s12915-014-0087-z
    298 rdf:type schema:CreativeWork
    299 sg:pub.10.1186/s13059-014-0550-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015222646
    300 https://doi.org/10.1186/s13059-014-0550-8
    301 rdf:type schema:CreativeWork
    302 sg:pub.10.1186/s13059-016-0947-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000759088
    303 https://doi.org/10.1186/s13059-016-0947-7
    304 rdf:type schema:CreativeWork
    305 grid-institutes:grid.164295.d schema:alternateName Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
    306 schema:name Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
    307 Graduate Program in Bioinformatics, University of Maryland, College Park, MD, USA
    308 rdf:type schema:Organization
    309 grid-institutes:grid.38142.3c schema:alternateName Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
    310 schema:name Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard University, Boston, MA, USA
    311 Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
    312 rdf:type schema:Organization
    313 grid-institutes:grid.418158.1 schema:alternateName GRED Oncology Biostatistics, Genentech, San Francisco, CA, USA
    314 schema:name GRED Oncology Biostatistics, Genentech, San Francisco, CA, USA
    315 rdf:type schema:Organization
    316 grid-institutes:grid.432923.d schema:alternateName Center for Statistical Research and Methodology, U.S. Census Bureau, Suitland, MD, USA
    317 schema:name Center for Statistical Research and Methodology, U.S. Census Bureau, Suitland, MD, USA
    318 Department of Mathematics, University of Maryland, College Park, MD, USA
    319 rdf:type schema:Organization
     



