A Topological Data Analysis Approach on Predicting Phenotypes from Gene Expression Data


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2020-04-03

AUTHORS

Sayan Mandal , Aldo Guzmán-Sáenz , Niina Haiminen , Saugata Basu , Laxmi Parida

ABSTRACT

The goal of this study was to investigate if gene expression measured from RNA sequencing contains enough signal to separate healthy and afflicted individuals in the context of phenotype prediction. We observed that standard machine learning methods alone performed somewhat poorly on the disease phenotype prediction task; therefore we devised an approach augmenting machine learning with topological data analysis. We describe a framework for predicting phenotype values by utilizing gene expression data transformed into sample-specific topological signatures by employing feature subsampling and persistent homology. The topological data analysis approach developed in this work yielded improved results on Parkinson’s disease phenotype prediction when measured against standard machine learning methods. This study confirms that gene expression can be a useful indicator of the presence or absence of a condition, and the subtle signal contained in this high dimensional data reveals itself when considering the intricate topological connections between expressed genes.

PAGES

178-187

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-030-42266-0_14

DOI

http://dx.doi.org/10.1007/978-3-030-42266-0_14

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1125828979



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Mathematical Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0104", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Statistics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "The Ohio State University, Columbus, OH, USA", 
          "id": "http://www.grid.ac/institutes/grid.261331.4", 
          "name": [
            "The Ohio State University, Columbus, OH, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Mandal", 
        "givenName": "Sayan", 
        "id": "sg:person.010555173053.78", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010555173053.78"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "IBM Research, T. J. Watson Research Center, Yorktown Heights, NY, USA", 
          "id": "http://www.grid.ac/institutes/grid.481554.9", 
          "name": [
            "IBM Research, T. J. Watson Research Center, Yorktown Heights, NY, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Guzm\u00e1n-S\u00e1enz", 
        "givenName": "Aldo", 
        "id": "sg:person.015720005324.51", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015720005324.51"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "IBM Research, T. J. Watson Research Center, Yorktown Heights, NY, USA", 
          "id": "http://www.grid.ac/institutes/grid.481554.9", 
          "name": [
            "IBM Research, T. J. Watson Research Center, Yorktown Heights, NY, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Haiminen", 
        "givenName": "Niina", 
        "id": "sg:person.0746114007.76", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0746114007.76"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Purdue University, West Lafayette, IN, USA", 
          "id": "http://www.grid.ac/institutes/grid.169077.e", 
          "name": [
            "Purdue University, West Lafayette, IN, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Basu", 
        "givenName": "Saugata", 
        "id": "sg:person.013033776043.37", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013033776043.37"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "IBM Research, T. J. Watson Research Center, Yorktown Heights, NY, USA", 
          "id": "http://www.grid.ac/institutes/grid.481554.9", 
          "name": [
            "IBM Research, T. J. Watson Research Center, Yorktown Heights, NY, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Parida", 
        "givenName": "Laxmi", 
        "id": "sg:person.01336557015.68", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01336557015.68"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2020-04-03", 
    "datePublishedReg": "2020-04-03", 
    "description": "The goal of this study was to investigate if gene expression measured from RNA sequencing contains enough signal to separate healthy and afflicted individuals in the context of phenotype prediction. We observed that standard machine learning methods alone performed somewhat poorly on the disease phenotype prediction task; therefore we devised an approach augmenting machine learning with topological data analysis.We describe a framework for predicting phenotype values by utilizing gene expression data transformed into sample-specific topological signatures by employing feature subsampling and persistent homology. The topological data analysis approach developed in this work yielded improved results on Parkinson\u2019s disease phenotype prediction when measured against standard machine learning methods.This study confirms that gene expression can be a useful indicator of the presence or absence of a condition, and the subtle signal contained in this high dimensional data reveals itself when considering the intricate topological connections between expressed genes.", 
    "editor": [
      {
        "familyName": "Mart\u00edn-Vide", 
        "givenName": "Carlos", 
        "type": "Person"
      }, 
      {
        "familyName": "Vega-Rodr\u00edguez", 
        "givenName": "Miguel A.", 
        "type": "Person"
      }, 
      {
        "familyName": "Wheeler", 
        "givenName": "Travis", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-030-42266-0_14", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-030-42265-3", 
        "978-3-030-42266-0"
      ], 
      "name": "Algorithms for Computational Biology", 
      "type": "Book"
    }, 
    "keywords": [
      "topological data analysis approach", 
      "standard machine learning methods", 
      "data analysis approach", 
      "machine learning methods", 
      "high-dimensional data", 
      "topological data analysis", 
      "gene expression data", 
      "standard machine", 
      "prediction task", 
      "learning methods", 
      "dimensional data", 
      "persistent homology", 
      "phenotype prediction", 
      "analysis approach", 
      "expression data", 
      "machine", 
      "improved results", 
      "topological connections", 
      "data analysis", 
      "Predicting Phenotype", 
      "topological signatures", 
      "task", 
      "subsampling", 
      "phenotype values", 
      "subtle signals", 
      "framework", 
      "data", 
      "prediction", 
      "enough signal", 
      "method", 
      "goal", 
      "signals", 
      "work", 
      "context", 
      "connection", 
      "signatures", 
      "results", 
      "analysis", 
      "indicators", 
      "values", 
      "study", 
      "individuals", 
      "conditions", 
      "useful indicator", 
      "presence", 
      "sequencing", 
      "expression", 
      "absence", 
      "RNA sequencing", 
      "homology", 
      "gene expression", 
      "phenotype", 
      "approach", 
      "genes"
    ], 
    "name": "A Topological Data Analysis Approach on Predicting Phenotypes from Gene Expression Data", 
    "pagination": "178-187", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1125828979"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-030-42266-0_14"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-030-42266-0_14", 
      "https://app.dimensions.ai/details/publication/pub.1125828979"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-09-02T16:10", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220902/entities/gbq_results/chapter/chapter_120.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-030-42266-0_14"
  }
]
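
As a minimal sketch of how the record above can be read, the following Python snippet parses a trimmed copy of the JSON-LD (only the fields it uses, with values copied from the record) and pulls out the author names and product identifiers. The helper names author_names and product_ids are hypothetical and are not part of any SciGraph tooling.

```python
import json

# Trimmed copy of the record above: only the fields this sketch reads.
RECORD_JSON = """
[
  {
    "id": "sg:pub.10.1007/978-3-030-42266-0_14",
    "author": [
      {"familyName": "Mandal", "givenName": "Sayan"},
      {"familyName": "Guzmán-Sáenz", "givenName": "Aldo"},
      {"familyName": "Haiminen", "givenName": "Niina"},
      {"familyName": "Basu", "givenName": "Saugata"},
      {"familyName": "Parida", "givenName": "Laxmi"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1125828979"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-030-42266-0_14"]}
    ]
  }
]
"""


def author_names(record):
    """Return 'given family' strings in the order the record lists them (hypothetical helper)."""
    return [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]


def product_ids(record):
    """Map each productId name to its first value (hypothetical helper)."""
    return {p["name"]: p["value"][0] for p in record.get("productId", [])}


# The top-level JSON-LD value is a list holding a single record object.
record = json.loads(RECORD_JSON)[0]
```

Calling author_names(record) yields the five authors in publication order, and product_ids(record) gives a small lookup keyed by "doi" and "dimensions_id".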
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-42266-0_14'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-42266-0_14'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-42266-0_14'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-42266-0_14'
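
The same content negotiation can be sketched in Python with the standard library. This assumes the endpoint is reachable and behaves as the curl commands above imply; the short format labels used as dictionary keys and the helper names build_request and fetch are choices made for this sketch only.

```python
import urllib.request

# URL taken from the curl commands above.
BASE_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-030-42266-0_14"

# Accept header values from the curl commands; the keys are labels chosen for this sketch.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt: str) -> urllib.request.Request:
    """Build a GET request asking for one serialization (hypothetical helper)."""
    return urllib.request.Request(BASE_URL, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt: str, timeout: float = 30.0) -> bytes:
    """Perform the request and return the raw body; needs network access."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as response:
        return response.read()
```

For example, fetch("turtle") would return the Turtle serialization as bytes if the service responds. Splitting request construction from the network call keeps the header logic checkable offline.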


 

This table displays all metadata directly associated with this object as RDF triples.

173 TRIPLES      22 PREDICATES      82 URIs      71 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-030-42266-0_14 schema:about anzsrc-for:01
2 anzsrc-for:0104
3 anzsrc-for:06
4 anzsrc-for:0604
5 anzsrc-for:08
6 anzsrc-for:0801
7 schema:author Nb432f5431b1d46e696991b0a8472693a
8 schema:datePublished 2020-04-03
9 schema:datePublishedReg 2020-04-03
10 schema:description The goal of this study was to investigate if gene expression measured from RNA sequencing contains enough signal to separate healthy and afflicted individuals in the context of phenotype prediction. We observed that standard machine learning methods alone performed somewhat poorly on the disease phenotype prediction task; therefore we devised an approach augmenting machine learning with topological data analysis.We describe a framework for predicting phenotype values by utilizing gene expression data transformed into sample-specific topological signatures by employing feature subsampling and persistent homology. The topological data analysis approach developed in this work yielded improved results on Parkinson’s disease phenotype prediction when measured against standard machine learning methods.This study confirms that gene expression can be a useful indicator of the presence or absence of a condition, and the subtle signal contained in this high dimensional data reveals itself when considering the intricate topological connections between expressed genes.
11 schema:editor Nbcb947280b164f1e9faf7415c3d22b59
12 schema:genre chapter
13 schema:isAccessibleForFree true
14 schema:isPartOf Nfee4ecb2fbe94c3f9195a718b51575aa
15 schema:keywords Predicting Phenotype
16 RNA sequencing
17 absence
18 analysis
19 analysis approach
20 approach
21 conditions
22 connection
23 context
24 data
25 data analysis
26 data analysis approach
27 dimensional data
28 enough signal
29 expression
30 expression data
31 framework
32 gene expression
33 gene expression data
34 genes
35 goal
36 high-dimensional data
37 homology
38 improved results
39 indicators
40 individuals
41 learning methods
42 machine
43 machine learning methods
44 method
45 persistent homology
46 phenotype
47 phenotype prediction
48 phenotype values
49 prediction
50 prediction task
51 presence
52 results
53 sequencing
54 signals
55 signatures
56 standard machine
57 standard machine learning methods
58 study
59 subsampling
60 subtle signals
61 task
62 topological connections
63 topological data analysis
64 topological data analysis approach
65 topological signatures
66 useful indicator
67 values
68 work
69 schema:name A Topological Data Analysis Approach on Predicting Phenotypes from Gene Expression Data
70 schema:pagination 178-187
71 schema:productId Nabd71242919b4f398a3e2008764770ff
72 Nadc3cb9069e04330a74aab1e94d25107
73 schema:publisher N0c41548b07c3480ca13355a3c6c5ac01
74 schema:sameAs https://app.dimensions.ai/details/publication/pub.1125828979
75 https://doi.org/10.1007/978-3-030-42266-0_14
76 schema:sdDatePublished 2022-09-02T16:10
77 schema:sdLicense https://scigraph.springernature.com/explorer/license/
78 schema:sdPublisher N161b88815aea4e4380bf8541d3715396
79 schema:url https://doi.org/10.1007/978-3-030-42266-0_14
80 sgo:license sg:explorer/license/
81 sgo:sdDataset chapters
82 rdf:type schema:Chapter
83 N03953e4ddb1b45cda217243e4d3393c7 rdf:first sg:person.01336557015.68
84 rdf:rest rdf:nil
85 N0c41548b07c3480ca13355a3c6c5ac01 schema:name Springer Nature
86 rdf:type schema:Organisation
87 N1401ff71e59e42de8b785dc89cb9c63e schema:familyName Martín-Vide
88 schema:givenName Carlos
89 rdf:type schema:Person
90 N161b88815aea4e4380bf8541d3715396 schema:name Springer Nature - SN SciGraph project
91 rdf:type schema:Organization
92 N200df302641a4a1a86ada5ec9c5b6e86 schema:familyName Wheeler
93 schema:givenName Travis
94 rdf:type schema:Person
95 N2e8892749b85436fa653eb99e85a8350 rdf:first sg:person.013033776043.37
96 rdf:rest N03953e4ddb1b45cda217243e4d3393c7
97 N3056c227b8324afb9948a10a03ba7dc6 rdf:first sg:person.0746114007.76
98 rdf:rest N2e8892749b85436fa653eb99e85a8350
99 N4a031437b50347fa9df0e3dad324f54f rdf:first Nbe210b41d494409cb0d1e844cdb8f20b
100 rdf:rest N953125ce18c54d53837084ae1e711bfa
101 N752bc4e4f26a4f97aa19d810ead6feb0 rdf:first sg:person.015720005324.51
102 rdf:rest N3056c227b8324afb9948a10a03ba7dc6
103 N953125ce18c54d53837084ae1e711bfa rdf:first N200df302641a4a1a86ada5ec9c5b6e86
104 rdf:rest rdf:nil
105 Nabd71242919b4f398a3e2008764770ff schema:name doi
106 schema:value 10.1007/978-3-030-42266-0_14
107 rdf:type schema:PropertyValue
108 Nadc3cb9069e04330a74aab1e94d25107 schema:name dimensions_id
109 schema:value pub.1125828979
110 rdf:type schema:PropertyValue
111 Nb432f5431b1d46e696991b0a8472693a rdf:first sg:person.010555173053.78
112 rdf:rest N752bc4e4f26a4f97aa19d810ead6feb0
113 Nbcb947280b164f1e9faf7415c3d22b59 rdf:first N1401ff71e59e42de8b785dc89cb9c63e
114 rdf:rest N4a031437b50347fa9df0e3dad324f54f
115 Nbe210b41d494409cb0d1e844cdb8f20b schema:familyName Vega-Rodríguez
116 schema:givenName Miguel A.
117 rdf:type schema:Person
118 Nfee4ecb2fbe94c3f9195a718b51575aa schema:isbn 978-3-030-42265-3
119 978-3-030-42266-0
120 schema:name Algorithms for Computational Biology
121 rdf:type schema:Book
122 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
123 schema:name Mathematical Sciences
124 rdf:type schema:DefinedTerm
125 anzsrc-for:0104 schema:inDefinedTermSet anzsrc-for:
126 schema:name Statistics
127 rdf:type schema:DefinedTerm
128 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
129 schema:name Biological Sciences
130 rdf:type schema:DefinedTerm
131 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
132 schema:name Genetics
133 rdf:type schema:DefinedTerm
134 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
135 schema:name Information and Computing Sciences
136 rdf:type schema:DefinedTerm
137 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
138 schema:name Artificial Intelligence and Image Processing
139 rdf:type schema:DefinedTerm
140 sg:person.010555173053.78 schema:affiliation grid-institutes:grid.261331.4
141 schema:familyName Mandal
142 schema:givenName Sayan
143 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010555173053.78
144 rdf:type schema:Person
145 sg:person.013033776043.37 schema:affiliation grid-institutes:grid.169077.e
146 schema:familyName Basu
147 schema:givenName Saugata
148 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013033776043.37
149 rdf:type schema:Person
150 sg:person.01336557015.68 schema:affiliation grid-institutes:grid.481554.9
151 schema:familyName Parida
152 schema:givenName Laxmi
153 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01336557015.68
154 rdf:type schema:Person
155 sg:person.015720005324.51 schema:affiliation grid-institutes:grid.481554.9
156 schema:familyName Guzmán-Sáenz
157 schema:givenName Aldo
158 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015720005324.51
159 rdf:type schema:Person
160 sg:person.0746114007.76 schema:affiliation grid-institutes:grid.481554.9
161 schema:familyName Haiminen
162 schema:givenName Niina
163 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0746114007.76
164 rdf:type schema:Person
165 grid-institutes:grid.169077.e schema:alternateName Purdue University, West Lafayette, IN, USA
166 schema:name Purdue University, West Lafayette, IN, USA
167 rdf:type schema:Organization
168 grid-institutes:grid.261331.4 schema:alternateName The Ohio State University, Columbus, OH, USA
169 schema:name The Ohio State University, Columbus, OH, USA
170 rdf:type schema:Organization
171 grid-institutes:grid.481554.9 schema:alternateName IBM Research, T. J. Watson Research Center, Yorktown Heights, NY, USA
172 schema:name IBM Research, T. J. Watson Research Center, Yorktown Heights, NY, USA
173 rdf:type schema:Organization
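
In the table above, author order is not stored as an array: schema:author points at a blank node, and the rdf:first / rdf:rest triples chain the authors together until rdf:nil. The following Python sketch walks that chain using values copied from the table rows. The dictionary layout and the walk_rdf_list helper are assumptions made for illustration, not part of the SciGraph data model.

```python
RDF_NIL = "rdf:nil"

# Blank node that schema:author points to (row 7 of the table).
AUTHOR_LIST_HEAD = "Nb432f5431b1d46e696991b0a8472693a"

# node -> (rdf:first, rdf:rest), copied from the table's list-node rows.
AUTHOR_LIST_NODES = {
    "Nb432f5431b1d46e696991b0a8472693a": ("sg:person.010555173053.78", "N752bc4e4f26a4f97aa19d810ead6feb0"),
    "N752bc4e4f26a4f97aa19d810ead6feb0": ("sg:person.015720005324.51", "N3056c227b8324afb9948a10a03ba7dc6"),
    "N3056c227b8324afb9948a10a03ba7dc6": ("sg:person.0746114007.76", "N2e8892749b85436fa653eb99e85a8350"),
    "N2e8892749b85436fa653eb99e85a8350": ("sg:person.013033776043.37", "N03953e4ddb1b45cda217243e4d3393c7"),
    "N03953e4ddb1b45cda217243e4d3393c7": ("sg:person.01336557015.68", RDF_NIL),
}

# schema:familyName for each person, copied from the table.
FAMILY_NAMES = {
    "sg:person.010555173053.78": "Mandal",
    "sg:person.015720005324.51": "Guzmán-Sáenz",
    "sg:person.0746114007.76": "Haiminen",
    "sg:person.013033776043.37": "Basu",
    "sg:person.01336557015.68": "Parida",
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest links from head, collecting each rdf:first value in order."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


ordered_authors = [FAMILY_NAMES[p] for p in walk_rdf_list(AUTHOR_LIST_HEAD, AUTHOR_LIST_NODES)]
```

The resulting order matches the author array in the JSON-LD record, which is how the linked-list encoding preserves ordering that plain RDF triples would otherwise lose.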
 



