Ontology type: schema:ScholarlyArticle
Open Access: True
2018-04-20
AUTHORS: Mariana Buongermino Pereira, Mikael Wallroth, Viktor Jonsson, Erik Kristiansson
ABSTRACT
Background: In shotgun metagenomics, microbial communities are studied through direct sequencing of DNA without any prior cultivation. By comparing gene abundances estimated from the generated sequencing reads, functional differences between the communities can be identified. However, gene abundance data is affected by high levels of systematic variability, which can greatly reduce the statistical power and introduce false positives. Normalization, which is the process where systematic variability is identified and removed, is therefore a vital part of the data analysis. A wide range of normalization methods for high-dimensional count data has been proposed, but their performance on the analysis of shotgun metagenomic data has not been evaluated.
Results: Here, we present a systematic evaluation of nine normalization methods for gene abundance data. The methods were evaluated through resampling of three comprehensive datasets, creating a realistic setting that preserved the unique characteristics of metagenomic data. Performance was measured in terms of the methods' ability to identify differentially abundant genes (DAGs), correctly calculate unbiased p-values and control the false discovery rate (FDR). Our results showed that the choice of normalization method has a large impact on the end results. When the DAGs were asymmetrically present between the experimental conditions, many normalization methods had a reduced true positive rate (TPR) and a high false positive rate (FPR). The methods trimmed mean of M-values (TMM) and relative log expression (RLE) had the overall highest performance and are therefore recommended for the analysis of gene abundance data. For larger sample sizes, cumulative sum scaling (CSS) also showed satisfactory performance.
Conclusions: This study emphasizes the importance of selecting a suitable normalization method in the analysis of data from shotgun metagenomics. Our results also demonstrate that improper methods may result in unacceptably high levels of false positives, which in turn may lead to incorrect or obfuscated biological interpretation.
PAGES: 274
http://scigraph.springernature.com/pub.10.1186/s12864-018-4637-6
DOI: http://dx.doi.org/10.1186/s12864-018-4637-6
DIMENSIONS: https://app.dimensions.ai/details/publication/pub.1103494409
PUBMED: https://www.ncbi.nlm.nih.gov/pubmed/29678163
JSON-LD is the canonical representation for SciGraph data.
[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
"about": [
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Biological Sciences",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Genetics",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Data Analysis",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Metagenomics",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, SE-412 96, Gothenburg, Sweden",
"id": "http://www.grid.ac/institutes/grid.8761.8",
"name": [
"Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, SE-412 96, Gothenburg, Sweden"
],
"type": "Organization"
},
"familyName": "Pereira",
"givenName": "Mariana Buongermino",
"id": "sg:person.0656367524.09",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0656367524.09"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, SE-412 96, Gothenburg, Sweden",
"id": "http://www.grid.ac/institutes/grid.8761.8",
"name": [
"Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, SE-412 96, Gothenburg, Sweden"
],
"type": "Organization"
},
"familyName": "Wallroth",
"givenName": "Mikael",
"id": "sg:person.011360455425.69",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011360455425.69"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, SE-412 96, Gothenburg, Sweden",
"id": "http://www.grid.ac/institutes/grid.8761.8",
"name": [
"Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, SE-412 96, Gothenburg, Sweden"
],
"type": "Organization"
},
"familyName": "Jonsson",
"givenName": "Viktor",
"id": "sg:person.0761362305.11",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0761362305.11"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, SE-412 96, Gothenburg, Sweden",
"id": "http://www.grid.ac/institutes/grid.8761.8",
"name": [
"Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, SE-412 96, Gothenburg, Sweden"
],
"type": "Organization"
},
"familyName": "Kristiansson",
"givenName": "Erik",
"id": "sg:person.01051113471.17",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01051113471.17"
],
"type": "Person"
}
],
"citation": [
{
"id": "sg:pub.10.1038/nmeth.2658",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1002139060",
"https://doi.org/10.1038/nmeth.2658"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/s13059-015-0610-8",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1029050385",
"https://doi.org/10.1186/s13059-015-0610-8"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/s13742-015-0078-1",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1027641814",
"https://doi.org/10.1186/s13742-015-0078-1"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1038/nature11450",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1004546178",
"https://doi.org/10.1038/nature11450"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1038/srep46130",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1084748698",
"https://doi.org/10.1038/srep46130"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/s12864-016-2386-y",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1020096515",
"https://doi.org/10.1186/s12864-016-2386-y"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1038/nature11053",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1052378845",
"https://doi.org/10.1038/nature11053"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/gb-2010-11-10-r106",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1031289083",
"https://doi.org/10.1186/gb-2010-11-10-r106"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/1471-2105-9-386",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1006083026",
"https://doi.org/10.1186/1471-2105-9-386"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1038/nature12198",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1002791386",
"https://doi.org/10.1038/nature12198"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1038/nmeth.2693",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1028082738",
"https://doi.org/10.1038/nmeth.2693"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/s13059-014-0550-8",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1015222646",
"https://doi.org/10.1186/s13059-014-0550-8"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/s40168-017-0237-y",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1084252802",
"https://doi.org/10.1186/s40168-017-0237-y"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/1471-2105-11-94",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1053091615",
"https://doi.org/10.1186/1471-2105-11-94"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1038/nbt.2957",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1027683701",
"https://doi.org/10.1038/nbt.2957"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/s12864-017-3686-6",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1084954596",
"https://doi.org/10.1186/s12864-017-3686-6"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/s11274-017-2255-0",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1084519412",
"https://doi.org/10.1007/s11274-017-2255-0"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/gb-2010-11-3-r25",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1050509557",
"https://doi.org/10.1186/gb-2010-11-3-r25"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/gb-2005-6-8-229",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1029667488",
"https://doi.org/10.1186/gb-2005-6-8-229"
],
"type": "CreativeWork"
}
],
"datePublished": "2018-04-20",
"datePublishedReg": "2018-04-20",
"description": "BackgroundIn shotgun metagenomics, microbial communities are studied through direct sequencing of DNA without any prior cultivation. By comparing gene abundances estimated from the generated sequencing reads, functional differences between the communities can be identified. However, gene abundance data is affected by high levels of systematic variability, which can greatly reduce the statistical power and introduce false positives. Normalization, which is the process where systematic variability is identified and removed, is therefore a vital part of the data analysis. A wide range of normalization methods for high-dimensional count data has been proposed but their performance on the analysis of shotgun metagenomic data has not been evaluated.ResultsHere, we present a systematic evaluation of nine normalization methods for gene abundance data. The methods were evaluated through resampling of three comprehensive datasets, creating a realistic setting that preserved the unique characteristics of metagenomic data. Performance was measured in terms of the methods ability to identify differentially abundant genes (DAGs), correctly calculate unbiased p-values and control the false discovery rate (FDR). Our results showed that the choice of normalization method has a large impact on the end results. When the DAGs were asymmetrically present between the experimental conditions, many normalization methods had a reduced true positive rate (TPR) and a high false positive rate (FPR). The methods trimmed mean of M-values (TMM) and relative log expression (RLE) had the overall highest performance and are therefore recommended for the analysis of gene abundance data. For larger sample sizes, CSS also showed satisfactory performance.ConclusionsThis study emphasizes the importance of selecting a suitable normalization methods in the analysis of data from shotgun metagenomics. 
Our results also demonstrate that improper methods may result in unacceptably high levels of false positives, which in turn may lead to incorrect or obfuscated biological interpretation.",
"genre": "article",
"id": "sg:pub.10.1186/s12864-018-4637-6",
"isAccessibleForFree": true,
"isPartOf": [
{
"id": "sg:journal.1023790",
"issn": [
"1471-2164"
],
"name": "BMC Genomics",
"publisher": "Springer Nature",
"type": "Periodical"
},
{
"issueNumber": "1",
"type": "PublicationIssue"
},
{
"type": "PublicationVolume",
"volumeNumber": "19"
}
],
"keywords": [
"gene abundance data",
"abundance data",
"shotgun metagenomics",
"metagenomic data",
"false discovery rate",
"shotgun metagenomic data",
"microbial communities",
"abundant genes",
"gene abundance",
"sequencing reads",
"unbiased p-values",
"biological interpretation",
"prior cultivation",
"functional differences",
"metagenomics",
"direct sequencing",
"discovery rate",
"high levels",
"high-dimensional count data",
"comprehensive dataset",
"genes",
"sequencing",
"DNA",
"abundance",
"reads",
"unacceptably high levels",
"expression",
"statistical power",
"community",
"ResultsHere",
"cultivation",
"false positives",
"count data",
"variability",
"wide range",
"normalization method",
"analysis",
"levels",
"unique characteristics",
"large impact",
"larger sample size",
"high false positive rate",
"DAG",
"experimental conditions",
"suitable normalization method",
"method's ability",
"ability",
"satisfactory performance",
"data",
"realistic settings",
"importance",
"results",
"false positive rate",
"rate",
"turn",
"sample size",
"positives",
"size",
"vital part",
"data analysis",
"process",
"systematic variability",
"systematic evaluation",
"resampling",
"conditions",
"ConclusionsThis study",
"improper methods",
"part",
"dataset",
"differences",
"p-value",
"study",
"high performance",
"analysis of data",
"performance",
"range",
"true positive rate",
"impact",
"comparison",
"overall high performance",
"positive rate",
"M values",
"terms",
"end result",
"method",
"power",
"characteristics",
"means",
"interpretation",
"choice",
"normalization",
"setting",
"CSS",
"evaluation"
],
"name": "Comparison of normalization methods for the analysis of metagenomic gene abundance data",
"pagination": "274",
"productId": [
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1103494409"
]
},
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1186/s12864-018-4637-6"
]
},
{
"name": "pubmed_id",
"type": "PropertyValue",
"value": [
"29678163"
]
}
],
"sameAs": [
"https://doi.org/10.1186/s12864-018-4637-6",
"https://app.dimensions.ai/details/publication/pub.1103494409"
],
"sdDataset": "articles",
"sdDatePublished": "2022-08-04T17:07",
"sdLicense": "https://scigraph.springernature.com/explorer/license/",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-springernature-scigraph/baseset/20220804/entities/gbq_results/article/article_772.jsonl",
"type": "ScholarlyArticle",
"url": "https://doi.org/10.1186/s12864-018-4637-6"
}
]
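As a rough illustration of how a record of this shape can be read, the Python sketch below parses a trimmed copy of the JSON-LD above and pulls out the title, the author names, and the identifiers listed under "productId". The names `RECORD_JSON` and `summarize` are hypothetical and introduced only for this example; they are not part of SciGraph.

```python
import json

# Trimmed copy of the record above: only the fields read below are kept.
RECORD_JSON = """
[
  {
    "name": "Comparison of normalization methods for the analysis of metagenomic gene abundance data",
    "author": [
      {"familyName": "Pereira", "givenName": "Mariana Buongermino", "type": "Person"},
      {"familyName": "Wallroth", "givenName": "Mikael", "type": "Person"},
      {"familyName": "Jonsson", "givenName": "Viktor", "type": "Person"},
      {"familyName": "Kristiansson", "givenName": "Erik", "type": "Person"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1103494409"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s12864-018-4637-6"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["29678163"]}
    ],
    "type": "ScholarlyArticle"
  }
]
"""


def summarize(record_json: str) -> dict:
    """Return title, author names, and identifiers from a one-article record.

    Hypothetical helper: assumes the top-level value is a list whose first
    element is the article object, as in the record shown above.
    """
    article = json.loads(record_json)[0]
    authors = [
        f'{person["givenName"]} {person["familyName"]}'
        for person in article.get("author", [])
    ]
    # Each productId entry holds a one-element "value" list in this record.
    identifiers = {
        entry["name"]: entry["value"][0]
        for entry in article.get("productId", [])
    }
    return {"title": article["name"], "authors": authors, "ids": identifiers}


summary = summarize(RECORD_JSON)
print(summary["title"])
print(", ".join(summary["authors"]))
print(summary["ids"]["doi"])
```

Plain `json` is enough here because JSON-LD is valid JSON; resolving the "@context" into full RDF terms would need a dedicated JSON-LD processor, which this sketch does not attempt.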
Download the RDF metadata as: json-ld, nt, turtle, xml
JSON-LD is a popular format for linked data which is fully compatible with JSON.
curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-4637-6'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-4637-6'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-4637-6'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-4637-6'
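The four curl commands above differ only in the Accept header. A minimal Python sketch of the same content negotiation, using only the standard library, might look like the following. The short format labels and the `build_request` helper are hypothetical names chosen for this example, and the sketch only constructs the request; it does not send it, so it makes no claim about what the server currently returns.

```python
from urllib.request import Request

# Media types taken from the curl commands above; the dictionary keys are
# illustrative labels, not identifiers defined by SciGraph.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s12864-018-4637-6"


def build_request(fmt: str, url: str = RECORD_URL) -> Request:
    """Build (but do not send) a GET request asking for one RDF serialization."""
    try:
        media_type = ACCEPT_TYPES[fmt]
    except KeyError:
        raise ValueError(f"unknown format label: {fmt!r}") from None
    return Request(url, headers={"Accept": media_type})


req = build_request("turtle")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual download, equivalent to the corresponding curl call.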
This table displays all metadata directly associated with this object as RDF triples.
259 TRIPLES
21 PREDICATES
140 URIs
113 LITERALS
9 BLANK NODES