Host Phenotype Prediction from Differentially Abundant Microbes Using RoDEO


Ontology type: schema:Chapter     


Chapter Info

DATE

2017-10-17

AUTHORS

Anna Paola Carrieri, Niina Haiminen, Laxmi Parida

ABSTRACT

Metagenomics is the study of metagenomes, which are mixtures of genetic material from several organisms. Metagenomic sequencing is increasingly used in human and animal health, food safety, and environmental studies. In these high-dimensional (metagenomic) data, the phenotype of the host organism, e.g., human, may not be obvious to detect, and the ability to predict it then becomes a powerful analytic tool. For example, consider predicting the disease status of an individual from their gut microbiome. In this study, we compare various normalization methods for metagenomic count data and their impact on phenotype prediction. The methods include RoDEO, Robust Differential Expression Operator, originally developed for gene expression studies. The best prediction accuracy is observed for RoDEO-processed count data with linear kernel support vector machines in most cases, for a variety of real datasets including human, mouse, and environmental samples. We also address the problem of identifying the most relevant microbial features that could give insight into the structure and function of the differential communities observed between phenotypes. Interestingly, we obtain similar or better phenotype prediction accuracy with a small subset of features as with the complete set of sequenced features.

PAGES

27-41

Book

TITLE

Computational Intelligence Methods for Bioinformatics and Biostatistics

ISBN

978-3-319-67833-7
978-3-319-67834-4

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-319-67834-4_3

DOI

http://dx.doi.org/10.1007/978-3-319-67834-4_3

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1092234703



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "IBM Research UK, WA4 4AD, Warrington, UK", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "IBM Research UK, WA4 4AD, Warrington, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Carrieri", 
        "givenName": "Anna Paola", 
        "id": "sg:person.014252047761.02", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014252047761.02"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "IBM T.J. Watson Research Center, 10598, Yorktown Heights, NY, USA", 
          "id": "http://www.grid.ac/institutes/grid.481554.9", 
          "name": [
            "IBM T.J. Watson Research Center, 10598, Yorktown Heights, NY, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Haiminen", 
        "givenName": "Niina", 
        "id": "sg:person.0746114007.76", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0746114007.76"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "IBM T.J. Watson Research Center, 10598, Yorktown Heights, NY, USA", 
          "id": "http://www.grid.ac/institutes/grid.481554.9", 
          "name": [
            "IBM T.J. Watson Research Center, 10598, Yorktown Heights, NY, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Parida", 
        "givenName": "Laxmi", 
        "id": "sg:person.01336557015.68", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01336557015.68"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2017-10-17", 
    "datePublishedReg": "2017-10-17", 
    "description": "Metagenomics is the study of metagenomes which are mixtures of genetic material from several organisms. Metagenomic sequencing is increasingly used in human and animal health, food safety, and environmental studies. In these high-dimensional (metagenomic) data, the phenotype of the host organism, e.g., human, may not be obvious to detect and then the ability to predict it becomes a powerful analytic tool. For example, consider predicting the disease status of an individual from their gut microbiome.In this study, we compare various normalization methods for metagenomic count data and their impact on phenotype prediction. The methods include RoDEO, Robust Differential Expression Operator, originally developed for gene expression studies. The best prediction accuracy is observed for RoDEO-processed count data with linear kernel support vector machines in most cases, for a variety of real datasets including human, mouse, and environmental samples.We also address the problem of identifying the most relevant microbial features that could give insight into the structure and function of the differential communities observed between phenotypes. Interestingly, we obtain similar or better phenotype prediction accuracy with a small subset of features as with the complete set of sequenced features.", 
    "editor": [
      {
        "familyName": "Bracciali", 
        "givenName": "Andrea", 
        "type": "Person"
      }, 
      {
        "familyName": "Caravagna", 
        "givenName": "Giulio", 
        "type": "Person"
      }, 
      {
        "familyName": "Gilbert", 
        "givenName": "David", 
        "type": "Person"
      }, 
      {
        "familyName": "Tagliaferri", 
        "givenName": "Roberto", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-319-67834-4_3", 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-319-67833-7", 
        "978-3-319-67834-4"
      ], 
      "name": "Computational Intelligence Methods for Bioinformatics and Biostatistics", 
      "type": "Book"
    }, 
    "keywords": [
      "phenotype prediction", 
      "study of metagenomes", 
      "host phenotype prediction", 
      "phenotype prediction accuracy", 
      "gene expression studies", 
      "differential communities", 
      "abundant microbes", 
      "host organism", 
      "expression studies", 
      "metagenomic count data", 
      "genetic material", 
      "metagenomic sequencing", 
      "microbial features", 
      "organisms", 
      "animal health", 
      "gut microbiome", 
      "phenotype", 
      "count data", 
      "environmental samples", 
      "metagenomes", 
      "metagenomics", 
      "high-dimensional data", 
      "microbes", 
      "sequencing", 
      "powerful analytic tool", 
      "microbiome", 
      "small subset", 
      "humans", 
      "prediction accuracy", 
      "better prediction accuracy", 
      "real datasets", 
      "complete set", 
      "food safety", 
      "environmental studies", 
      "insights", 
      "mice", 
      "community", 
      "analytic tools", 
      "function", 
      "rodeo", 
      "variety", 
      "operators", 
      "study", 
      "ability", 
      "accuracy", 
      "subset", 
      "prediction", 
      "disease status", 
      "normalization method", 
      "data", 
      "features", 
      "most cases", 
      "individuals", 
      "structure", 
      "problem", 
      "set", 
      "tool", 
      "dataset", 
      "status", 
      "support vector machine", 
      "kernel support vector machine", 
      "vector machine", 
      "impact", 
      "machine", 
      "health", 
      "samples", 
      "example", 
      "cases", 
      "mixture", 
      "method", 
      "materials", 
      "linear kernel support vector machine", 
      "safety"
    ], 
    "name": "Host Phenotype Prediction from Differentially Abundant Microbes Using RoDEO", 
    "pagination": "27-41", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1092234703"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-319-67834-4_3"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-319-67834-4_3", 
      "https://app.dimensions.ai/details/publication/pub.1092234703"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-09-02T16:15", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220902/entities/gbq_results/chapter/chapter_373.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-319-67834-4_3"
  }
]
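As a minimal sketch of how a record like the one above might be read, the following Python snippet parses a trimmed copy of the JSON-LD and pulls out the title, author names, DOI, and page range. The trimmed record keeps only a few of the fields shown above, and the function name `summarize` is a hypothetical helper, not part of any SciGraph tooling.

```python
import json

# Trimmed copy of the record above: only the fields used in this sketch.
RECORD = json.loads("""
[
  {
    "name": "Host Phenotype Prediction from Differentially Abundant Microbes Using RoDEO",
    "author": [
      {"familyName": "Carrieri", "givenName": "Anna Paola"},
      {"familyName": "Haiminen", "givenName": "Niina"},
      {"familyName": "Parida", "givenName": "Laxmi"}
    ],
    "pagination": "27-41",
    "productId": [
      {"name": "dimensions_id", "value": ["pub.1092234703"]},
      {"name": "doi", "value": ["10.1007/978-3-319-67834-4_3"]}
    ]
  }
]
""")


def summarize(record):
    """Return a small dict of citation fields (hypothetical helper)."""
    entry = record[0]  # the record is a one-element JSON array
    authors = [f"{a['givenName']} {a['familyName']}" for a in entry["author"]]
    # productId is a list of name/value pairs; each value is itself a list.
    ids = {p["name"]: p["value"][0] for p in entry["productId"]}
    return {
        "title": entry["name"],
        "authors": authors,
        "doi": ids.get("doi"),
        "pages": entry["pagination"],
    }


summary = summarize(RECORD)
print(summary["authors"])
print(summary["doi"])
```

Because JSON-LD is plain JSON, the standard `json` module is enough here; a dedicated JSON-LD library would only be needed to expand the `@context` terms into full URIs.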
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-67834-4_3'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-67834-4_3'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-67834-4_3'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-67834-4_3'
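The same content negotiation can be sketched in Python with the standard library. The short format labels (`json-ld`, `nt`, `turtle`, `xml`) and the helper names `build_request` and `fetch` are illustrative assumptions; the URL and the Accept values are the ones used in the curl commands above. Whether the endpoint still responds is not guaranteed, so the sketch only builds the request when run and leaves the network call to `fetch`.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-319-67834-4_3"

# Media types taken from the curl commands above; the keys are illustrative labels.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request whose Accept header selects the RDF serialization."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network.
request = build_request("turtle")
print(request.full_url)
print(request.get_header("Accept"))
```

Calling `fetch("json-ld")` would then be the rough equivalent of the first curl command, assuming the service is reachable.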


 

This table displays all metadata directly associated with this object as RDF triples.

164 TRIPLES      22 PREDICATES      97 URIs      90 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-319-67834-4_3 schema:about anzsrc-for:06
2 anzsrc-for:0604
3 schema:author Na0e67730e6374925b3c97b699b4e2ba2
4 schema:datePublished 2017-10-17
5 schema:datePublishedReg 2017-10-17
6 schema:description Metagenomics is the study of metagenomes which are mixtures of genetic material from several organisms. Metagenomic sequencing is increasingly used in human and animal health, food safety, and environmental studies. In these high-dimensional (metagenomic) data, the phenotype of the host organism, e.g., human, may not be obvious to detect and then the ability to predict it becomes a powerful analytic tool. For example, consider predicting the disease status of an individual from their gut microbiome.In this study, we compare various normalization methods for metagenomic count data and their impact on phenotype prediction. The methods include RoDEO, Robust Differential Expression Operator, originally developed for gene expression studies. The best prediction accuracy is observed for RoDEO-processed count data with linear kernel support vector machines in most cases, for a variety of real datasets including human, mouse, and environmental samples.We also address the problem of identifying the most relevant microbial features that could give insight into the structure and function of the differential communities observed between phenotypes. Interestingly, we obtain similar or better phenotype prediction accuracy with a small subset of features as with the complete set of sequenced features.
7 schema:editor N79014a3ccbcd4ca2930ff43c45628e9b
8 schema:genre chapter
9 schema:isAccessibleForFree false
10 schema:isPartOf N0004f720cb714c2fb6fc3c12b2ac0fee
11 schema:keywords ability
12 abundant microbes
13 accuracy
14 analytic tools
15 animal health
16 better prediction accuracy
17 cases
18 community
19 complete set
20 count data
21 data
22 dataset
23 differential communities
24 disease status
25 environmental samples
26 environmental studies
27 example
28 expression studies
29 features
30 food safety
31 function
32 gene expression studies
33 genetic material
34 gut microbiome
35 health
36 high-dimensional data
37 host organism
38 host phenotype prediction
39 humans
40 impact
41 individuals
42 insights
43 kernel support vector machine
44 linear kernel support vector machine
45 machine
46 materials
47 metagenomes
48 metagenomic count data
49 metagenomic sequencing
50 metagenomics
51 method
52 mice
53 microbes
54 microbial features
55 microbiome
56 mixture
57 most cases
58 normalization method
59 operators
60 organisms
61 phenotype
62 phenotype prediction
63 phenotype prediction accuracy
64 powerful analytic tool
65 prediction
66 prediction accuracy
67 problem
68 real datasets
69 rodeo
70 safety
71 samples
72 sequencing
73 set
74 small subset
75 status
76 structure
77 study
78 study of metagenomes
79 subset
80 support vector machine
81 tool
82 variety
83 vector machine
84 schema:name Host Phenotype Prediction from Differentially Abundant Microbes Using RoDEO
85 schema:pagination 27-41
86 schema:productId N0b9f1141f4084313880bc415bf924ff7
87 N4740ce8887dd4ff6ab93b23b8c9f0a93
88 schema:publisher Na355acf213194d2fb78bfb7fbc29d1e2
89 schema:sameAs https://app.dimensions.ai/details/publication/pub.1092234703
90 https://doi.org/10.1007/978-3-319-67834-4_3
91 schema:sdDatePublished 2022-09-02T16:15
92 schema:sdLicense https://scigraph.springernature.com/explorer/license/
93 schema:sdPublisher Nedc3942e37da450abcf74604eeb372e7
94 schema:url https://doi.org/10.1007/978-3-319-67834-4_3
95 sgo:license sg:explorer/license/
96 sgo:sdDataset chapters
97 rdf:type schema:Chapter
98 N0004f720cb714c2fb6fc3c12b2ac0fee schema:isbn 978-3-319-67833-7
99 978-3-319-67834-4
100 schema:name Computational Intelligence Methods for Bioinformatics and Biostatistics
101 rdf:type schema:Book
102 N0b9f1141f4084313880bc415bf924ff7 schema:name dimensions_id
103 schema:value pub.1092234703
104 rdf:type schema:PropertyValue
105 N2be6f3c064ba44439545e12966b6745e schema:familyName Bracciali
106 schema:givenName Andrea
107 rdf:type schema:Person
108 N43d3bcd983fb47fe92f14b9ee0fdbf3b rdf:first Nb7b32c7674db445daaf4c76ebe5864f2
109 rdf:rest Na8fc3cf9e2374293984bab1be7ffe73d
110 N4740ce8887dd4ff6ab93b23b8c9f0a93 schema:name doi
111 schema:value 10.1007/978-3-319-67834-4_3
112 rdf:type schema:PropertyValue
113 N54fa10d6d2ce42f7bcbd2a55ed417d37 schema:familyName Tagliaferri
114 schema:givenName Roberto
115 rdf:type schema:Person
116 N79014a3ccbcd4ca2930ff43c45628e9b rdf:first N2be6f3c064ba44439545e12966b6745e
117 rdf:rest N43d3bcd983fb47fe92f14b9ee0fdbf3b
118 N8d8fd22958b94af6b2474ab7962e6f67 rdf:first sg:person.01336557015.68
119 rdf:rest rdf:nil
120 Na0e67730e6374925b3c97b699b4e2ba2 rdf:first sg:person.014252047761.02
121 rdf:rest Nb9cd0ecbe1974130a948f75faae94cdb
122 Na355acf213194d2fb78bfb7fbc29d1e2 schema:name Springer Nature
123 rdf:type schema:Organisation
124 Na8fc3cf9e2374293984bab1be7ffe73d rdf:first Nd7d0b46b13ac466aae478b1d3d40fbd5
125 rdf:rest Nefee8daa20d14e769de2f480281e32c3
126 Nb7b32c7674db445daaf4c76ebe5864f2 schema:familyName Caravagna
127 schema:givenName Giulio
128 rdf:type schema:Person
129 Nb9cd0ecbe1974130a948f75faae94cdb rdf:first sg:person.0746114007.76
130 rdf:rest N8d8fd22958b94af6b2474ab7962e6f67
131 Nd7d0b46b13ac466aae478b1d3d40fbd5 schema:familyName Gilbert
132 schema:givenName David
133 rdf:type schema:Person
134 Nedc3942e37da450abcf74604eeb372e7 schema:name Springer Nature - SN SciGraph project
135 rdf:type schema:Organization
136 Nefee8daa20d14e769de2f480281e32c3 rdf:first N54fa10d6d2ce42f7bcbd2a55ed417d37
137 rdf:rest rdf:nil
138 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
139 schema:name Biological Sciences
140 rdf:type schema:DefinedTerm
141 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
142 schema:name Genetics
143 rdf:type schema:DefinedTerm
144 sg:person.01336557015.68 schema:affiliation grid-institutes:grid.481554.9
145 schema:familyName Parida
146 schema:givenName Laxmi
147 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01336557015.68
148 rdf:type schema:Person
149 sg:person.014252047761.02 schema:affiliation grid-institutes:None
150 schema:familyName Carrieri
151 schema:givenName Anna Paola
152 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014252047761.02
153 rdf:type schema:Person
154 sg:person.0746114007.76 schema:affiliation grid-institutes:grid.481554.9
155 schema:familyName Haiminen
156 schema:givenName Niina
157 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0746114007.76
158 rdf:type schema:Person
159 grid-institutes:None schema:alternateName IBM Research UK, WA4 4AD, Warrington, UK
160 schema:name IBM Research UK, WA4 4AD, Warrington, UK
161 rdf:type schema:Organization
162 grid-institutes:grid.481554.9 schema:alternateName IBM T.J. Watson Research Center, 10598, Yorktown Heights, NY, USA
163 schema:name IBM T.J. Watson Research Center, 10598, Yorktown Heights, NY, USA
164 rdf:type schema:Organization
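In the table above, the ordered author list is encoded as an RDF collection: each blank node carries an `rdf:first` (the item) and an `rdf:rest` (the next node), ending at `rdf:nil`. As a rough sketch, the chain starting at the `schema:author` node can be walked as follows. The dictionary is a hand-copied subset of the triples above, and `walk_rdf_list` is a hypothetical helper name.

```python
RDF_NIL = "rdf:nil"

# Subset of the triples above: blank node -> (rdf:first, rdf:rest).
NODES = {
    "Na0e67730e6374925b3c97b699b4e2ba2": (
        "sg:person.014252047761.02",
        "Nb9cd0ecbe1974130a948f75faae94cdb",
    ),
    "Nb9cd0ecbe1974130a948f75faae94cdb": (
        "sg:person.0746114007.76",
        "N8d8fd22958b94af6b2474ab7962e6f67",
    ),
    "N8d8fd22958b94af6b2474ab7962e6f67": (
        "sg:person.01336557015.68",
        RDF_NIL,
    ),
}


def walk_rdf_list(head, nodes):
    """Follow rdf:first / rdf:rest links from head until rdf:nil."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


# The schema:author triple (row 3) points at the head of the list.
authors = walk_rdf_list("Na0e67730e6374925b3c97b699b4e2ba2", NODES)
print(authors)
```

The resulting order (Carrieri, Haiminen, Parida) matches the author order in the JSON-LD; the editor list under `schema:editor` is encoded the same way.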
 



