Analysis and correction of compositional bias in sparse sequencing count data


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2018-11-06

AUTHORS

M. Senthil Kumar, Eric V. Slud, Kwame Okrah, Stephanie C. Hicks, Sridhar Hannenhalli, Héctor Corrada Bravo

ABSTRACT

Background: Count data derived from high-throughput deoxyribonucleic acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches like library size scaling/rarefaction/subsampling cannot correct for compositional bias or any other relevant technical bias that is uncorrelated with library size.

Results: We demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it.

Conclusions: Compositional bias, induced by the sequencing machine, confounds inferences of absolute abundances. We present a normalization technique for compositional bias correction in sparse sequencing count data, and demonstrate its improved performance in metagenomic 16S survey data. Based on the distribution of technical bias estimates arising from several publicly available large-scale 16S count datasets, we argue that detailed experiments specifically addressing the influence of compositional bias in metagenomics are needed.
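As a toy illustration of the point made in the Background, the following Python sketch shows how library size scaling can misreport features whose absolute abundance did not change. The abundances are invented for illustration and do not come from the article, and the code is not the empirical Bayes method the authors propose; it only demonstrates the bias itself.

```python
# Toy example: hypothetical absolute abundances for four features in two samples.
# Only the last feature changes between the samples.
absolute_a = [100, 100, 100, 100]
absolute_b = [100, 100, 100, 700]


def library_size_scale(counts):
    """Convert counts to proportions by dividing by the library size (total count)."""
    total = sum(counts)
    return [c / total for c in counts]


def fold_changes(reference, other):
    """Per-feature ratio other / reference."""
    return [o / r for r, o in zip(reference, other)]


# Sequencing reports relative abundances, so model the observed data as proportions.
prop_a = library_size_scale(absolute_a)
prop_b = library_size_scale(absolute_b)

true_fc = fold_changes(absolute_a, absolute_b)  # unchanged features have ratio 1
observed_fc = fold_changes(prop_a, prop_b)      # unchanged features appear to drop

# Every observed fold change is off by the same sample-wide factor.
# That shared factor is the compositional bias left behind by library size scaling.
bias = [obs / true for obs, true in zip(observed_fc, true_fc)]
```

In this toy case the three unchanged features appear to fall to roughly 0.4 of their reference level, purely because the fourth feature took a larger share of the reads.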

PAGES

799

References to SciGraph publications

  • 2017-10-02. Towards standards for human fecal sample processing in metagenomic studies in NATURE BIOTECHNOLOGY
  • 2012-12-03. DNA extract characterization process for microbial detection methods development and validation in BMC RESEARCH NOTES
  • 2012-06-13. Structure, function and diversity of the healthy human microbiome in NATURE
  • 2012-12-28. Considerations for the development and application of control materials to improve metagenomic microbial community profiling in ACCREDITATION AND QUALITY ASSURANCE
  • 2009-01. RNA-Seq: a revolutionary tool for transcriptomics in NATURE REVIEWS GENETICS
  • 2009-09-08. ChIP–seq: advantages and challenges of a maturing technology in NATURE REVIEWS GENETICS
  • 2012-07-16. Microbial interactions: from networks to models in NATURE REVIEWS MICROBIOLOGY
  • 2015-03-21. The truth about metagenomics: quantifying and counteracting bias in 16S rRNA studies in BMC MICROBIOLOGY
  • 2014-12-05. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 in GENOME BIOLOGY
  • 2016-04-27. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts in GENOME BIOLOGY
  • 2014-11-12. Reagent and laboratory contamination can critically impact sequence-based microbiome analyses in BMC BIOLOGY
  • 2009-04-16. Transcript length bias in RNA-seq data confounds systems biology in BIOLOGY DIRECT
  • 2010-03-02. A scaling normalization method for differential expression analysis of RNA-seq data in GENOME BIOLOGY
  • 2014-06-27. Diarrhea in young children from low-income countries leads to large-scale alterations in intestinal microbiota composition in GENOME BIOLOGY
  • 2010-02-18. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments in BMC BIOINFORMATICS
  • 2008-10-09. Next-generation DNA sequencing in NATURE BIOTECHNOLOGY
  • 2010-10-27. Differential expression analysis for sequence count data in GENOME BIOLOGY
  • 2016-01-14. A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling in BMC GENOMICS
  • 2014-02-03. voom: precision weights unlock linear model analysis tools for RNA-seq read counts in GENOME BIOLOGY
  • 2008-06-29. Aerobic production of methane in the sea in NATURE GEOSCIENCE
  • 2014-05-05. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis in MICROBIOME
  • 2015-01-28. Computational and analytical challenges in single-cell transcriptomics in NATURE REVIEWS GENETICS
  • 2011-02-09. A decade’s perspective on DNA sequencing technology in NATURE
  • 2013-12-22. Quantitative single-cell RNA-seq with unique molecular identifiers in NATURE METHODS
  • 2013-09-29. Differential abundance analysis for microbial marker-gene surveys in NATURE METHODS
  • 2014-02-10. A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages in NATURE COMMUNICATIONS
  • 2005-10-11. Metagenomics: DNA sequencing of environmental samples in NATURE REVIEWS GENETICS
  • 2010-02-04. Gene ontology analysis for RNA-seq: accounting for selection bias in GENOME BIOLOGY
Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/s12864-018-5160-5

    DOI

    http://dx.doi.org/10.1186/s12864-018-5160-5

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1108058115

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/30400812



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Biological Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/11", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Medical and Health Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Algorithms", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Bayes Theorem", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Computational Biology", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "High-Throughput Nucleotide Sequencing", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Metagenomics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Microbiota", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "RNA, Ribosomal, 16S", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA", 
              "id": "http://www.grid.ac/institutes/grid.164295.d", 
              "name": [
                "Graduate Program in Bioinformatics, University of Maryland, College Park, MD, USA", 
                "Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Kumar", 
            "givenName": "M. Senthil", 
            "id": "sg:person.0634642307.58", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0634642307.58"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Center for Statistical Research and Methodology, U.S Census Bureau, Suitland, MD, USA", 
              "id": "http://www.grid.ac/institutes/grid.432923.d", 
              "name": [
                "Department of Mathematics, University of Maryland, College Park, MD, USA", 
                "Center for Statistical Research and Methodology, U.S Census Bureau, Suitland, MD, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Slud", 
            "givenName": "Eric V.", 
            "id": "sg:person.012060167025.63", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012060167025.63"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "GRED Oncology Biostatistics, Genentech, San Francisco, CA, USA", 
              "id": "http://www.grid.ac/institutes/grid.418158.1", 
              "name": [
                "GRED Oncology Biostatistics, Genentech, San Francisco, CA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Okrah", 
            "givenName": "Kwame", 
            "id": "sg:person.0744073663.77", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0744073663.77"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA", 
              "id": "http://www.grid.ac/institutes/grid.38142.3c", 
              "name": [
                "Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard University, Boston, MA, USA", 
                "Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Hicks", 
            "givenName": "Stephanie C.", 
            "id": "sg:person.01136367606.02", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01136367606.02"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA", 
              "id": "http://www.grid.ac/institutes/grid.164295.d", 
              "name": [
                "Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Hannenhalli", 
            "givenName": "Sridhar", 
            "id": "sg:person.01341565477.18", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01341565477.18"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA", 
              "id": "http://www.grid.ac/institutes/grid.164295.d", 
              "name": [
                "Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Corrada Bravo", 
            "givenName": "H\u00e9ctor", 
            "id": "sg:person.0706015450.76", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0706015450.76"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1186/s13059-014-0550-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1015222646", 
              "https://doi.org/10.1186/s13059-014-0550-8"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s12864-015-2194-9", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1027095113", 
              "https://doi.org/10.1186/s12864-015-2194-9"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nrg3833", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1004107723", 
              "https://doi.org/10.1038/nrg3833"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.3960", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1092055967", 
              "https://doi.org/10.1038/nbt.3960"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2010-11-3-r25", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050509557", 
              "https://doi.org/10.1186/gb-2010-11-3-r25"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature11234", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1007740093", 
              "https://doi.org/10.1038/nature11234"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nrmicro2832", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1030236624", 
              "https://doi.org/10.1038/nrmicro2832"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1745-6150-4-14", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1045373440", 
              "https://doi.org/10.1186/1745-6150-4-14"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s00769-012-0941-z", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1012638035", 
              "https://doi.org/10.1007/s00769-012-0941-z"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nrg1709", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017719492", 
              "https://doi.org/10.1038/nrg1709"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2010-11-2-r14", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050171830", 
              "https://doi.org/10.1186/gb-2010-11-2-r14"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1756-0500-5-668", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1032127979", 
              "https://doi.org/10.1186/1756-0500-5-668"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nrg2484", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1030687647", 
              "https://doi.org/10.1038/nrg2484"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt1486", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1005954516", 
              "https://doi.org/10.1038/nbt1486"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2010-11-10-r106", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031289083", 
              "https://doi.org/10.1186/gb-2010-11-10-r106"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s12866-015-0351-6", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1001744296", 
              "https://doi.org/10.1186/s12866-015-0351-6"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2014-15-2-r29", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1045312009", 
              "https://doi.org/10.1186/gb-2014-15-2-r29"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nrg2641", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1006115199", 
              "https://doi.org/10.1038/nrg2641"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-11-94", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1053091615", 
              "https://doi.org/10.1186/1471-2105-11-94"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/ngeo234", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1022547638", 
              "https://doi.org/10.1038/ngeo234"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/2049-2618-2-15", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1046874717", 
              "https://doi.org/10.1186/2049-2618-2-15"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s12915-014-0087-z", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1027737035", 
              "https://doi.org/10.1186/s12915-014-0087-z"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2014-15-6-r76", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1024439294", 
              "https://doi.org/10.1186/gb-2014-15-6-r76"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.2772", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1029706604", 
              "https://doi.org/10.1038/nmeth.2772"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/ncomms4230", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044984586", 
              "https://doi.org/10.1038/ncomms4230"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.2658", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1002139060", 
              "https://doi.org/10.1038/nmeth.2658"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s13059-016-0947-7", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1000759088", 
              "https://doi.org/10.1186/s13059-016-0947-7"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature09796", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1049651011", 
              "https://doi.org/10.1038/nature09796"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2018-11-06", 
        "datePublishedReg": "2018-11-06", 
        "description": "BackgroundCount data derived from high-throughput deoxy-ribonucliec acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches like library size scaling/rarefaction/subsampling cannot correct for compositional or any other relevant technical bias that is uncorrelated with library size.ResultsWe demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it.ConclusionsCompositional bias, induced by the sequencing machine, confounds inferences of absolute abundances. We present a normalization technique for compositional bias correction in sparse sequencing count data, and demonstrate its improved performance in metagenomic 16s survey data. Based on the distribution of technical bias estimates arising from several publicly available large scale 16s count datasets, we argue that detailed experiments specifically addressing the influence of compositional bias in metagenomics are needed.", 
        "genre": "article", 
        "id": "sg:pub.10.1186/s12864-018-5160-5", 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.2529382", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.2564648", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.7519284", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.5300904", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.2612238", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.2521853", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.4312567", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.4242377", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.6501720", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.2519831", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1023790", 
            "issn": [
              "1471-2164"
            ], 
            "name": "BMC Genomics", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "19"
          }
        ], 
        "keywords": [
          "count data", 
          "inference", 
          "bias correction", 
          "bias estimates", 
          "normalization approach", 
          "subsampling", 
          "normalization method", 
          "normalization technique", 
          "improved performance", 
          "approach", 
          "problem", 
          "assumption", 
          "correction", 
          "detailed experiments", 
          "technique", 
          "estimates", 
          "properties", 
          "bias", 
          "library size", 
          "distribution", 
          "data", 
          "machine", 
          "sequencing process", 
          "performance", 
          "dataset", 
          "experiments", 
          "features", 
          "size", 
          "sequencing machines", 
          "analysis", 
          "process", 
          "absolute abundance", 
          "technical bias", 
          "compositional bias", 
          "survey data", 
          "influence", 
          "quantitative molecular assays", 
          "addition", 
          "light", 
          "abundance", 
          "Scale-16", 
          "metagenomics", 
          "molecular assays", 
          "method", 
          "sequencing", 
          "assays", 
          "data normalization approaches", 
          "ResultsWe"
        ], 
        "name": "Analysis and correction of compositional bias in sparse sequencing count data", 
        "pagination": "799", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1108058115"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/s12864-018-5160-5"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "30400812"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/s12864-018-5160-5", 
          "https://app.dimensions.ai/details/publication/pub.1108058115"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-10-01T06:45", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20221001/entities/gbq_results/article/article_782.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1186/s12864-018-5160-5"
      }
    ]
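The JSON-LD record above can be read with any JSON parser. The sketch below is one possible way to pull out the title, publication date, author names, and identifiers. `SAMPLE` is a trimmed copy of the record (two of the six authors, two of the three identifiers) and `summarize` is a hypothetical helper name, not part of any SciGraph tooling.

```python
import json

# Trimmed-down copy of the record above; only a few fields are kept for brevity.
SAMPLE = """
[
  {
    "author": [
      {"familyName": "Kumar", "givenName": "M. Senthil", "type": "Person"},
      {"familyName": "Corrada Bravo", "givenName": "H\\u00e9ctor", "type": "Person"}
    ],
    "datePublished": "2018-11-06",
    "name": "Analysis and correction of compositional bias in sparse sequencing count data",
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s12864-018-5160-5"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["30400812"]}
    ],
    "type": "ScholarlyArticle"
  }
]
"""


def summarize(record):
    """Return title, date, author names, and identifiers from one record dict."""
    authors = [
        f"{person['givenName']} {person['familyName']}"
        for person in record.get("author", [])
    ]
    # Each productId entry carries a name and a list of values; keep the first value.
    identifiers = {
        entry["name"]: entry["value"][0] for entry in record.get("productId", [])
    }
    return {
        "title": record.get("name"),
        "date": record.get("datePublished"),
        "authors": authors,
        "ids": identifiers,
    }


# The top-level JSON value is a list holding a single record object.
summary = summarize(json.loads(SAMPLE)[0])
```

Using `dict.get` with defaults keeps the helper tolerant of records that omit `author` or `productId`.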
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-5160-5'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-5160-5'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-5160-5'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-5160-5'
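The four curl commands above differ only in the `Accept` header. A minimal Python equivalent using the standard library is sketched here. The names `ACCEPT_TYPES`, `build_request`, and `fetch` are chosen for illustration, and whether the endpoint still answers these requests is an assumption carried over from the commands above.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s12864-018-5160-5"

# Media types taken from the curl commands above; the short keys are illustrative.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Create a GET request that asks for the given serialization via Accept."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Download the record in the requested format and return it as text."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building a request does not touch the network; only fetch() does.
request = build_request("turtle")
```

Calling `fetch("json-ld")` would then correspond to the first curl command, and the other keys to the remaining three.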


     

    This table displays all metadata directly associated with this object as RDF triples.

    319 TRIPLES      21 PREDICATES      109 URIs      72 LITERALS      14 BLANK NODES
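Summary counts of this kind can be approximated from an N-Triples download. The sketch below uses a deliberately simple line splitter, not a full N-Triples parser, and the four sample triples are invented. How SciGraph itself counts URIs and literals is not stated on this page, so the conventions used here (distinct URIs and blank nodes, literal occurrences) are assumptions.

```python
# Invented sample in N-Triples syntax: one triple per line, terminated by " ."
SAMPLE_NT = """\
<https://example.org/pub> <http://schema.org/name> "Example title" .
<https://example.org/pub> <http://schema.org/isPartOf> _:b0 .
_:b0 <http://schema.org/volumeNumber> "19" .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/PublicationVolume> .
"""


def split_triple(line):
    """Split one line into (subject, predicate, object).

    Subjects and predicates never contain whitespace, so two splits suffice;
    the object keeps any internal spaces (for example inside a literal).
    """
    subject, predicate, rest = line.strip().split(None, 2)
    obj = rest.rstrip()
    if obj.endswith("."):
        obj = obj[:-1].rstrip()  # drop the statement terminator
    return subject, predicate, obj


def summarize_triples(text):
    """Count triples, predicates, URIs, literals, and blank nodes."""
    triples = [
        split_triple(line)
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    predicates = {p for _, p, _ in triples}
    terms = {t for s, _, o in triples for t in (s, o)}
    return {
        "triples": len(triples),
        "predicates": len(predicates),
        "uris": len({t for t in terms if t.startswith("<")} | predicates),
        "literals": sum(1 for _, _, o in triples if o.startswith('"')),
        "blank_nodes": len({t for t in terms if t.startswith("_:")}),
    }


stats = summarize_triples(SAMPLE_NT)
```

For real data, a dedicated RDF library would be a safer choice than this splitter, which ignores escape handling and other details of the format.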

    Subject Predicate Object
    1 sg:pub.10.1186/s12864-018-5160-5 schema:about N49319c0d055d41d8bede288667463618
    2 N4e9eb82e12fd4fd5916ffed99d6e9dd6
    3 N5e33b25f80d446279dbd29ddb096da6f
    4 N79e58bc990ed4d11b7fbe619b1dc1d6b
    5 N8ae998f8011740baa5a0ae29c52253be
    6 N8fd807b0c32c48bbba36ef234710dfac
    7 Ne969301afe2a4ee8973240ad5870b153
    8 anzsrc-for:06
    9 anzsrc-for:08
    10 anzsrc-for:11
    11 schema:author Ne01c6cefe151458d85d55ca30fa91c6c
    12 schema:citation sg:pub.10.1007/s00769-012-0941-z
    13 sg:pub.10.1038/nature09796
    14 sg:pub.10.1038/nature11234
    15 sg:pub.10.1038/nbt.3960
    16 sg:pub.10.1038/nbt1486
    17 sg:pub.10.1038/ncomms4230
    18 sg:pub.10.1038/ngeo234
    19 sg:pub.10.1038/nmeth.2658
    20 sg:pub.10.1038/nmeth.2772
    21 sg:pub.10.1038/nrg1709
    22 sg:pub.10.1038/nrg2484
    23 sg:pub.10.1038/nrg2641
    24 sg:pub.10.1038/nrg3833
    25 sg:pub.10.1038/nrmicro2832
    26 sg:pub.10.1186/1471-2105-11-94
    27 sg:pub.10.1186/1745-6150-4-14
    28 sg:pub.10.1186/1756-0500-5-668
    29 sg:pub.10.1186/2049-2618-2-15
    30 sg:pub.10.1186/gb-2010-11-10-r106
    31 sg:pub.10.1186/gb-2010-11-2-r14
    32 sg:pub.10.1186/gb-2010-11-3-r25
    33 sg:pub.10.1186/gb-2014-15-2-r29
    34 sg:pub.10.1186/gb-2014-15-6-r76
    35 sg:pub.10.1186/s12864-015-2194-9
    36 sg:pub.10.1186/s12866-015-0351-6
    37 sg:pub.10.1186/s12915-014-0087-z
    38 sg:pub.10.1186/s13059-014-0550-8
    39 sg:pub.10.1186/s13059-016-0947-7
    40 schema:datePublished 2018-11-06
    41 schema:datePublishedReg 2018-11-06
    42 schema:description BackgroundCount data derived from high-throughput deoxy-ribonucliec acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches like library size scaling/rarefaction/subsampling cannot correct for compositional or any other relevant technical bias that is uncorrelated with library size.ResultsWe demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it.ConclusionsCompositional bias, induced by the sequencing machine, confounds inferences of absolute abundances. We present a normalization technique for compositional bias correction in sparse sequencing count data, and demonstrate its improved performance in metagenomic 16s survey data. Based on the distribution of technical bias estimates arising from several publicly available large scale 16s count datasets, we argue that detailed experiments specifically addressing the influence of compositional bias in metagenomics are needed.
    43 schema:genre article
    44 schema:isAccessibleForFree true
    45 schema:isPartOf N3d79c5b13b944406a8d60f12f18327a6
    46 N5963da59e581419d8c560e3969ddfa09
    47 sg:journal.1023790
    48 schema:keywords ResultsWe
    49 Scale-16
    50 absolute abundance
    51 abundance
    52 addition
    53 analysis
    54 approach
    55 assays
    56 assumption
    57 bias
    58 bias correction
    59 bias estimates
    60 compositional bias
    61 correction
    62 count data
    63 data
    64 data normalization approaches
    65 dataset
    66 detailed experiments
    67 distribution
    68 estimates
    69 experiments
    70 features
    71 improved performance
    72 inference
    73 influence
    74 library size
    75 light
    76 machine
    77 metagenomics
    78 method
    79 molecular assays
    80 normalization approach
    81 normalization method
    82 normalization technique
    83 performance
    84 problem
    85 process
    86 properties
    87 quantitative molecular assays
    88 sequencing
    89 sequencing machines
    90 sequencing process
    91 size
    92 subsampling
    93 survey data
    94 technical bias
    95 technique
    96 schema:name Analysis and correction of compositional bias in sparse sequencing count data
    97 schema:pagination 799
    98 schema:productId N2fb9d93735d84f75a3fa5125196a03a5
    99 N5c4592823f584fa2ae0e86ba0a091e86
    100 Nfc9c6fca61ed4870b9885852da7d4d41
    101 schema:sameAs https://app.dimensions.ai/details/publication/pub.1108058115
    102 https://doi.org/10.1186/s12864-018-5160-5
    103 schema:sdDatePublished 2022-10-01T06:45
    104 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    105 schema:sdPublisher N3eb197fa846044e4a55ef07dad38ba93
    106 schema:url https://doi.org/10.1186/s12864-018-5160-5
    107 sgo:license sg:explorer/license/
    108 sgo:sdDataset articles
    109 rdf:type schema:ScholarlyArticle
    110 N13a358abc1654a318a1811f1400fd064 rdf:first sg:person.01136367606.02
    111 rdf:rest N9969cdaf4aeb400fb260f89986f98beb
    112 N2f47ad214f0d43d293db98b542be2e4d rdf:first sg:person.0744073663.77
    113 rdf:rest N13a358abc1654a318a1811f1400fd064
    114 N2fb9d93735d84f75a3fa5125196a03a5 schema:name dimensions_id
    115 schema:value pub.1108058115
    116 rdf:type schema:PropertyValue
    117 N3d79c5b13b944406a8d60f12f18327a6 schema:volumeNumber 19
    118 rdf:type schema:PublicationVolume
    119 N3eb197fa846044e4a55ef07dad38ba93 schema:name Springer Nature - SN SciGraph project
    120 rdf:type schema:Organization
    121 N49319c0d055d41d8bede288667463618 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    122 schema:name Algorithms
    123 rdf:type schema:DefinedTerm
    124 N4e9eb82e12fd4fd5916ffed99d6e9dd6 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    125 schema:name Metagenomics
    126 rdf:type schema:DefinedTerm
    127 N5963da59e581419d8c560e3969ddfa09 schema:issueNumber 1
    128 rdf:type schema:PublicationIssue
    129 N5c4592823f584fa2ae0e86ba0a091e86 schema:name pubmed_id
    130 schema:value 30400812
    131 rdf:type schema:PropertyValue
    132 N5e33b25f80d446279dbd29ddb096da6f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    133 schema:name Computational Biology
    134 rdf:type schema:DefinedTerm
    135 N79e58bc990ed4d11b7fbe619b1dc1d6b schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    136 schema:name Bayes Theorem
    137 rdf:type schema:DefinedTerm
    138 N88140fe41a0749f5a78ce5b84fc7280b rdf:first sg:person.012060167025.63
    139 rdf:rest N2f47ad214f0d43d293db98b542be2e4d
    140 N8ae998f8011740baa5a0ae29c52253be schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    141 schema:name Microbiota
    142 rdf:type schema:DefinedTerm
    143 N8fd807b0c32c48bbba36ef234710dfac schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    144 schema:name RNA, Ribosomal, 16S
    145 rdf:type schema:DefinedTerm
    146 N9969cdaf4aeb400fb260f89986f98beb rdf:first sg:person.01341565477.18
    147 rdf:rest Nce0e8c60b2b14733ac7c6ca2f0670213
    148 Nce0e8c60b2b14733ac7c6ca2f0670213 rdf:first sg:person.0706015450.76
    149 rdf:rest rdf:nil
    150 Ne01c6cefe151458d85d55ca30fa91c6c rdf:first sg:person.0634642307.58
    151 rdf:rest N88140fe41a0749f5a78ce5b84fc7280b
    152 Ne969301afe2a4ee8973240ad5870b153 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    153 schema:name High-Throughput Nucleotide Sequencing
    154 rdf:type schema:DefinedTerm
    155 Nfc9c6fca61ed4870b9885852da7d4d41 schema:name doi
    156 schema:value 10.1186/s12864-018-5160-5
    157 rdf:type schema:PropertyValue
    158 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
    159 schema:name Biological Sciences
    160 rdf:type schema:DefinedTerm
    161 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    162 schema:name Information and Computing Sciences
    163 rdf:type schema:DefinedTerm
    164 anzsrc-for:11 schema:inDefinedTermSet anzsrc-for:
    165 schema:name Medical and Health Sciences
    166 rdf:type schema:DefinedTerm
    167 sg:grant.2519831 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    168 rdf:type schema:MonetaryGrant
    169 sg:grant.2521853 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    170 rdf:type schema:MonetaryGrant
    171 sg:grant.2529382 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    172 rdf:type schema:MonetaryGrant
    173 sg:grant.2564648 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    174 rdf:type schema:MonetaryGrant
    175 sg:grant.2612238 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    176 rdf:type schema:MonetaryGrant
    177 sg:grant.4242377 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    178 rdf:type schema:MonetaryGrant
    179 sg:grant.4312567 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    180 rdf:type schema:MonetaryGrant
    181 sg:grant.5300904 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    182 rdf:type schema:MonetaryGrant
    183 sg:grant.6501720 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    184 rdf:type schema:MonetaryGrant
    185 sg:grant.7519284 http://pending.schema.org/fundedItem sg:pub.10.1186/s12864-018-5160-5
    186 rdf:type schema:MonetaryGrant
    187 sg:journal.1023790 schema:issn 1471-2164
    188 schema:name BMC Genomics
    189 schema:publisher Springer Nature
    190 rdf:type schema:Periodical
    191 sg:person.01136367606.02 schema:affiliation grid-institutes:grid.38142.3c
    192 schema:familyName Hicks
    193 schema:givenName Stephanie C.
    194 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01136367606.02
    195 rdf:type schema:Person
    196 sg:person.012060167025.63 schema:affiliation grid-institutes:grid.432923.d
    197 schema:familyName Slud
    198 schema:givenName Eric V.
    199 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012060167025.63
    200 rdf:type schema:Person
    201 sg:person.01341565477.18 schema:affiliation grid-institutes:grid.164295.d
    202 schema:familyName Hannenhalli
    203 schema:givenName Sridhar
    204 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01341565477.18
    205 rdf:type schema:Person
    206 sg:person.0634642307.58 schema:affiliation grid-institutes:grid.164295.d
    207 schema:familyName Kumar
    208 schema:givenName M. Senthil
    209 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0634642307.58
    210 rdf:type schema:Person
    211 sg:person.0706015450.76 schema:affiliation grid-institutes:grid.164295.d
    212 schema:familyName Corrada Bravo
    213 schema:givenName Héctor
    214 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0706015450.76
    215 rdf:type schema:Person
    216 sg:person.0744073663.77 schema:affiliation grid-institutes:grid.418158.1
    217 schema:familyName Okrah
    218 schema:givenName Kwame
    219 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0744073663.77
    220 rdf:type schema:Person
    221 sg:pub.10.1007/s00769-012-0941-z schema:sameAs https://app.dimensions.ai/details/publication/pub.1012638035
    222 https://doi.org/10.1007/s00769-012-0941-z
    223 rdf:type schema:CreativeWork
    224 sg:pub.10.1038/nature09796 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049651011
    225 https://doi.org/10.1038/nature09796
    226 rdf:type schema:CreativeWork
    227 sg:pub.10.1038/nature11234 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007740093
    228 https://doi.org/10.1038/nature11234
    229 rdf:type schema:CreativeWork
    230 sg:pub.10.1038/nbt.3960 schema:sameAs https://app.dimensions.ai/details/publication/pub.1092055967
    231 https://doi.org/10.1038/nbt.3960
    232 rdf:type schema:CreativeWork
    233 sg:pub.10.1038/nbt1486 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005954516
    234 https://doi.org/10.1038/nbt1486
    235 rdf:type schema:CreativeWork
    236 sg:pub.10.1038/ncomms4230 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044984586
    237 https://doi.org/10.1038/ncomms4230
    238 rdf:type schema:CreativeWork
    239 sg:pub.10.1038/ngeo234 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022547638
    240 https://doi.org/10.1038/ngeo234
    241 rdf:type schema:CreativeWork
    242 sg:pub.10.1038/nmeth.2658 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002139060
    243 https://doi.org/10.1038/nmeth.2658
    244 rdf:type schema:CreativeWork
    245 sg:pub.10.1038/nmeth.2772 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029706604
    246 https://doi.org/10.1038/nmeth.2772
    247 rdf:type schema:CreativeWork
    248 sg:pub.10.1038/nrg1709 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017719492
    249 https://doi.org/10.1038/nrg1709
    250 rdf:type schema:CreativeWork
    251 sg:pub.10.1038/nrg2484 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030687647
    252 https://doi.org/10.1038/nrg2484
    253 rdf:type schema:CreativeWork
    254 sg:pub.10.1038/nrg2641 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006115199
    255 https://doi.org/10.1038/nrg2641
    256 rdf:type schema:CreativeWork
    257 sg:pub.10.1038/nrg3833 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004107723
    258 https://doi.org/10.1038/nrg3833
    259 rdf:type schema:CreativeWork
    260 sg:pub.10.1038/nrmicro2832 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030236624
    261 https://doi.org/10.1038/nrmicro2832
    262 rdf:type schema:CreativeWork
    263 sg:pub.10.1186/1471-2105-11-94 schema:sameAs https://app.dimensions.ai/details/publication/pub.1053091615
    264 https://doi.org/10.1186/1471-2105-11-94
    265 rdf:type schema:CreativeWork
    266 sg:pub.10.1186/1745-6150-4-14 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045373440
    267 https://doi.org/10.1186/1745-6150-4-14
    268 rdf:type schema:CreativeWork
    269 sg:pub.10.1186/1756-0500-5-668 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032127979
    270 https://doi.org/10.1186/1756-0500-5-668
    271 rdf:type schema:CreativeWork
    272 sg:pub.10.1186/2049-2618-2-15 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046874717
    273 https://doi.org/10.1186/2049-2618-2-15
    274 rdf:type schema:CreativeWork
    275 sg:pub.10.1186/gb-2010-11-10-r106 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031289083
    276 https://doi.org/10.1186/gb-2010-11-10-r106
    277 rdf:type schema:CreativeWork
    278 sg:pub.10.1186/gb-2010-11-2-r14 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050171830
    279 https://doi.org/10.1186/gb-2010-11-2-r14
    280 rdf:type schema:CreativeWork
    281 sg:pub.10.1186/gb-2010-11-3-r25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050509557
    282 https://doi.org/10.1186/gb-2010-11-3-r25
    283 rdf:type schema:CreativeWork
    284 sg:pub.10.1186/gb-2014-15-2-r29 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045312009
    285 https://doi.org/10.1186/gb-2014-15-2-r29
    286 rdf:type schema:CreativeWork
    287 sg:pub.10.1186/gb-2014-15-6-r76 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024439294
    288 https://doi.org/10.1186/gb-2014-15-6-r76
    289 rdf:type schema:CreativeWork
    290 sg:pub.10.1186/s12864-015-2194-9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027095113
    291 https://doi.org/10.1186/s12864-015-2194-9
    292 rdf:type schema:CreativeWork
    293 sg:pub.10.1186/s12866-015-0351-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001744296
    294 https://doi.org/10.1186/s12866-015-0351-6
    295 rdf:type schema:CreativeWork
    296 sg:pub.10.1186/s12915-014-0087-z schema:sameAs https://app.dimensions.ai/details/publication/pub.1027737035
    297 https://doi.org/10.1186/s12915-014-0087-z
    298 rdf:type schema:CreativeWork
    299 sg:pub.10.1186/s13059-014-0550-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015222646
    300 https://doi.org/10.1186/s13059-014-0550-8
    301 rdf:type schema:CreativeWork
    302 sg:pub.10.1186/s13059-016-0947-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000759088
    303 https://doi.org/10.1186/s13059-016-0947-7
    304 rdf:type schema:CreativeWork
    305 grid-institutes:grid.164295.d schema:alternateName Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
    306 schema:name Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
    307 Graduate Program in Bioinformatics, University of Maryland, College Park, MD, USA
    308 rdf:type schema:Organization
    309 grid-institutes:grid.38142.3c schema:alternateName Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
    310 schema:name Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard University, Boston, MA, USA
    311 Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
    312 rdf:type schema:Organization
    313 grid-institutes:grid.418158.1 schema:alternateName GRED Oncology Biostatistics, Genentech, San Francisco, CA, USA
    314 schema:name GRED Oncology Biostatistics, Genentech, San Francisco, CA, USA
    315 rdf:type schema:Organization
    316 grid-institutes:grid.432923.d schema:alternateName Center for Statistical Research and Methodology, U.S. Census Bureau, Suitland, MD, USA
    317 schema:name Center for Statistical Research and Methodology, U.S. Census Bureau, Suitland, MD, USA
    318 Department of Mathematics, University of Maryland, College Park, MD, USA
    319 rdf:type schema:Organization
     



