Automated Enzyme Classification by Formal Concept Analysis


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2014

AUTHORS

François Coste , Gaëlle Garet , Agnès Groisillier , Jacques Nicolas , Thierry Tonon

ABSTRACT

Enzymes are macro-molecules (linear sequences of linked molecules) with a catalytic activity that makes them essential for any biochemical reaction. High-throughput genomic techniques give access to the sequences of new enzymes found in living organisms. Guessing an enzyme's functional activity from its sequence is a crucial task that can be approached by comparing new sequences with those of already known enzymes labeled with a family class. This task is difficult because the activity depends on a combination of small sequence patterns and because sequences have evolved greatly over time. This paper presents a classifier based on identifying common subsequence blocks between known and new enzymes and on searching for formal concepts built on the cross product of blocks and sequences for each class. Since new enzyme families may emerge, it is important to propose a first classification of enzymes that cannot be assigned to a known family. Formal Concept Analysis (FCA) offers a nice framework for setting the task as an optimization problem on the set of concepts. The classifier has been tested successfully on a particular set of enzymes present in a large variety of species, the haloacid dehalogenase superfamily.
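
To make the idea of formal concepts over blocks and sequences concrete, here is a minimal sketch in Python. It is not the authors' classifier: the context (`seq1`, `b1`, and so on) is invented toy data, and the brute-force enumeration is only practical for very small inputs. It simply shows what a formal concept is when the objects are sequences and the attributes are subsequence blocks.

```python
from itertools import combinations

# Hypothetical toy context: which subsequence blocks occur in which sequences.
# All sequence and block names are invented for illustration only.
context = {
    "seq1": {"b1", "b2"},
    "seq2": {"b1", "b2", "b3"},
    "seq3": {"b3"},
}

def common_blocks(sequences):
    """Blocks shared by every sequence in the given collection (the intent)."""
    sets = [context[s] for s in sequences]
    if not sets:
        # By convention, the empty set of sequences shares every block.
        return set().union(*context.values())
    return set.intersection(*sets)

def matching_sequences(blocks):
    """Sequences that contain every block in the given set (the extent)."""
    return {s for s, present in context.items() if blocks <= present}

def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every subset of sequences."""
    concepts = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_blocks(subset)
            extent = matching_sequences(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

concepts = formal_concepts()
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

Each printed pair is a maximal group of sequences together with exactly the blocks they all share. How the chapter scores or selects concepts for classification is not described in this record, so that step is left out.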

PAGES

235-250

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-319-07248-7_17

DOI

http://dx.doi.org/10.1007/978-3-319-07248-7_17

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1042590986



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0601", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biochemistry and Cell Biology", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Irisa / Inria Rennes, Campus de Beaulieu, 35042, Rennes cedex, France", 
          "id": "http://www.grid.ac/institutes/grid.410368.8", 
          "name": [
            "Irisa / Inria Rennes, Campus de Beaulieu, 35042, Rennes cedex, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Coste", 
        "givenName": "Fran\u00e7ois", 
        "id": "sg:person.012322041531.28", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012322041531.28"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Irisa / Inria Rennes, Campus de Beaulieu, 35042, Rennes cedex, France", 
          "id": "http://www.grid.ac/institutes/grid.410368.8", 
          "name": [
            "Irisa / Inria Rennes, Campus de Beaulieu, 35042, Rennes cedex, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Garet", 
        "givenName": "Ga\u00eblle", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Integrative Biology of Marine Models, Sorbonne Universit\u00e9s, UPMC Univ Paris 06, UMR 8227, and CNRS, UMR 8227, Station Biologique de Roscoff, CS 90074, F-29688, Roscoff cedex, France", 
          "id": "http://www.grid.ac/institutes/grid.464101.6", 
          "name": [
            "Integrative Biology of Marine Models, Sorbonne Universit\u00e9s, UPMC Univ Paris 06, UMR 8227, and CNRS, UMR 8227, Station Biologique de Roscoff, CS 90074, F-29688, Roscoff cedex, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Groisillier", 
        "givenName": "Agn\u00e8s", 
        "id": "sg:person.01321331757.66", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01321331757.66"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Irisa / Inria Rennes, Campus de Beaulieu, 35042, Rennes cedex, France", 
          "id": "http://www.grid.ac/institutes/grid.410368.8", 
          "name": [
            "Irisa / Inria Rennes, Campus de Beaulieu, 35042, Rennes cedex, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Nicolas", 
        "givenName": "Jacques", 
        "id": "sg:person.01143715001.20", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01143715001.20"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Integrative Biology of Marine Models, Sorbonne Universit\u00e9s, UPMC Univ Paris 06, UMR 8227, and CNRS, UMR 8227, Station Biologique de Roscoff, CS 90074, F-29688, Roscoff cedex, France", 
          "id": "http://www.grid.ac/institutes/grid.464101.6", 
          "name": [
            "Integrative Biology of Marine Models, Sorbonne Universit\u00e9s, UPMC Univ Paris 06, UMR 8227, and CNRS, UMR 8227, Station Biologique de Roscoff, CS 90074, F-29688, Roscoff cedex, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Tonon", 
        "givenName": "Thierry", 
        "id": "sg:person.01166672346.33", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01166672346.33"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2014", 
    "datePublishedReg": "2014-01-01", 
    "description": "Enzymes are macro-molecules (linear sequences of linked molecules) with a catalytic activity that make them essential for any biochemical reaction. High throughput genomic techniques give access to the sequence of new enzymes found in living organisms. Guessing the enzyme\u2019s functional activity from its sequence is a crucial task that can be approached by comparing the new sequences with those of already known enzymes labeled by a family class. This task is difficult because the activity is based on a combination of small sequence patterns and sequences greatly evolved over time. This paper presents a classifier based on the identification of common subsequence blocks between known and new enzymes and the search of formal concepts built on the cross product of blocks and sequences for each class. Since new enzyme families may emerge, it is important to propose a first classification of enzymes that cannot be assigned to a known family. FCA offers a nice framework to set the task as an optimization problem on the set of concepts. The classifier has been tested with success on a particular set of enzymes present in a large variety of species, the haloacid dehalogenase superfamily.", 
    "editor": [
      {
        "familyName": "Glodeanu", 
        "givenName": "Cynthia Vera", 
        "type": "Person"
      }, 
      {
        "familyName": "Kaytoue", 
        "givenName": "Mehdi", 
        "type": "Person"
      }, 
      {
        "familyName": "Sacarea", 
        "givenName": "Christian", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-319-07248-7_17", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-319-07247-0", 
        "978-3-319-07248-7"
      ], 
      "name": "Formal Concept Analysis", 
      "type": "Book"
    }, 
    "keywords": [
      "High throughput genomic techniques", 
      "new enzyme", 
      "new enzyme family", 
      "enzyme functional activity", 
      "genomic techniques", 
      "haloacid dehalogenase", 
      "enzyme family", 
      "enzyme classification", 
      "functional activity", 
      "enzyme", 
      "new sequences", 
      "sequence patterns", 
      "biochemical reactions", 
      "sequence", 
      "family", 
      "family classes", 
      "organisms", 
      "dehalogenase", 
      "species", 
      "activity", 
      "catalytic activity", 
      "large variety", 
      "identification", 
      "patterns", 
      "first classification", 
      "particular set", 
      "variety", 
      "class", 
      "products", 
      "FCA", 
      "analysis", 
      "set", 
      "success", 
      "combination", 
      "block", 
      "classification", 
      "reaction", 
      "search", 
      "time", 
      "crucial task", 
      "concept", 
      "technique", 
      "set of concepts", 
      "access", 
      "nice framework", 
      "framework", 
      "formal concepts", 
      "classifier", 
      "problem", 
      "task", 
      "cross product", 
      "paper", 
      "concept analysis", 
      "Formal Concept Analysis", 
      "optimization problem", 
      "throughput genomic techniques", 
      "small sequence patterns", 
      "common subsequence blocks", 
      "subsequence blocks"
    ], 
    "name": "Automated Enzyme Classification by Formal Concept Analysis", 
    "pagination": "235-250", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1042590986"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-319-07248-7_17"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-319-07248-7_17", 
      "https://app.dimensions.ai/details/publication/pub.1042590986"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2021-11-01T18:54", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20211101/entities/gbq_results/chapter/chapter_292.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-319-07248-7_17"
  }
]
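
As a rough illustration of working with this record, the sketch below parses a trimmed excerpt of the JSON-LD above with Python's standard `json` module and pulls out the author names and the DOI. The excerpt keeps only the fields it uses, and `product_id` is a hypothetical helper name, not part of any SciGraph tooling.

```python
import json

# Trimmed excerpt of the record above; only the fields used here are kept.
record_text = r"""
[
  {
    "name": "Automated Enzyme Classification by Formal Concept Analysis",
    "pagination": "235-250",
    "author": [
      {"familyName": "Coste", "givenName": "Fran\u00e7ois"},
      {"familyName": "Garet", "givenName": "Ga\u00eblle"}
    ],
    "productId": [
      {"name": "dimensions_id", "value": ["pub.1042590986"]},
      {"name": "doi", "value": ["10.1007/978-3-319-07248-7_17"]}
    ]
  }
]
"""

# The record is a JSON array holding a single object.
record = json.loads(record_text)[0]

def product_id(record, name):
    """Return the first value of the productId entry with this name, or None."""
    for entry in record.get("productId", []):
        if entry.get("name") == name and entry.get("value"):
            return entry["value"][0]
    return None

# json.loads decodes the \u escapes, so the names come back with accents.
authors = [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]]
doi = product_id(record, "doi")
print(authors, doi)
```

Because the identifiers live in a list of name/value pairs rather than in dedicated keys, a small lookup helper like this tends to be easier to reuse than indexing by position.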
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-07248-7_17'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-07248-7_17'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-07248-7_17'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-07248-7_17'
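
The same content negotiation can be sketched in Python with the standard library. The block below only builds the requests and does not send them, since whether the endpoint still responds is not something this sketch can confirm. `ACCEPT_TYPES` and `build_request` are illustrative names; the media types and URL are taken from the curl commands above.

```python
from urllib.request import Request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-319-07248-7_17"

# Media types copied from the curl commands above; the short keys are made up here.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}

def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for one RDF serialization."""
    try:
        accept = ACCEPT_TYPES[fmt]
    except KeyError:
        raise ValueError(f"unknown format: {fmt!r}") from None
    return Request(url, headers={"Accept": accept})

req = build_request("turtle")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

To actually fetch the data, the request object could be passed to `urllib.request.urlopen(req)` and the response body read from the result.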


 

This table displays all metadata directly associated with this object as RDF triples.

159 TRIPLES      23 PREDICATES      85 URIs      78 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-319-07248-7_17 schema:about anzsrc-for:06
2 anzsrc-for:0601
3 schema:author N6aafd2dc45a1487cbced15f3e437701c
4 schema:datePublished 2014
5 schema:datePublishedReg 2014-01-01
6 schema:description Enzymes are macro-molecules (linear sequences of linked molecules) with a catalytic activity that make them essential for any biochemical reaction. High throughput genomic techniques give access to the sequence of new enzymes found in living organisms. Guessing the enzyme’s functional activity from its sequence is a crucial task that can be approached by comparing the new sequences with those of already known enzymes labeled by a family class. This task is difficult because the activity is based on a combination of small sequence patterns and sequences greatly evolved over time. This paper presents a classifier based on the identification of common subsequence blocks between known and new enzymes and the search of formal concepts built on the cross product of blocks and sequences for each class. Since new enzyme families may emerge, it is important to propose a first classification of enzymes that cannot be assigned to a known family. FCA offers a nice framework to set the task as an optimization problem on the set of concepts. The classifier has been tested with success on a particular set of enzymes present in a large variety of species, the haloacid dehalogenase superfamily.
7 schema:editor N8a08823c13a344cd80a8362a8aa5bb96
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree true
11 schema:isPartOf Nf564101e8e554d6a84646af10840d9f7
12 schema:keywords FCA
13 Formal Concept Analysis
14 High throughput genomic techniques
15 access
16 activity
17 analysis
18 biochemical reactions
19 block
20 catalytic activity
21 class
22 classification
23 classifier
24 combination
25 common subsequence blocks
26 concept
27 concept analysis
28 cross product
29 crucial task
30 dehalogenase
31 enzyme
32 enzyme classification
33 enzyme family
34 enzyme functional activity
35 family
36 family classes
37 first classification
38 formal concepts
39 framework
40 functional activity
41 genomic techniques
42 haloacid dehalogenase
43 identification
44 large variety
45 new enzyme
46 new enzyme family
47 new sequences
48 nice framework
49 optimization problem
50 organisms
51 paper
52 particular set
53 patterns
54 problem
55 products
56 reaction
57 search
58 sequence
59 sequence patterns
60 set
61 set of concepts
62 small sequence patterns
63 species
64 subsequence blocks
65 success
66 task
67 technique
68 throughput genomic techniques
69 time
70 variety
71 schema:name Automated Enzyme Classification by Formal Concept Analysis
72 schema:pagination 235-250
73 schema:productId N2d3393117b364a1280002ea42888b359
74 N6537f7b73d07420e845f2ae1749044db
75 schema:publisher N5ad98ad36b0845b6a24cd4a0e19416dc
76 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042590986
77 https://doi.org/10.1007/978-3-319-07248-7_17
78 schema:sdDatePublished 2021-11-01T18:54
79 schema:sdLicense https://scigraph.springernature.com/explorer/license/
80 schema:sdPublisher N52ab1036a730426eadf027fa885abe23
81 schema:url https://doi.org/10.1007/978-3-319-07248-7_17
82 sgo:license sg:explorer/license/
83 sgo:sdDataset chapters
84 rdf:type schema:Chapter
85 N28572f868fff40ffa4a5aecd497de931 schema:familyName Kaytoue
86 schema:givenName Mehdi
87 rdf:type schema:Person
88 N2d0a4c8f24774c6cb518e0b91aba8c88 rdf:first sg:person.01166672346.33
89 rdf:rest rdf:nil
90 N2d3393117b364a1280002ea42888b359 schema:name dimensions_id
91 schema:value pub.1042590986
92 rdf:type schema:PropertyValue
93 N33a8325c17a3402b94bb4a9b3c5f88b8 rdf:first N78681f19aa5147ca875899c2b1879a99
94 rdf:rest rdf:nil
95 N4776fb5aab3940f0b38bdcd74fb896d4 rdf:first sg:person.01143715001.20
96 rdf:rest N2d0a4c8f24774c6cb518e0b91aba8c88
97 N52ab1036a730426eadf027fa885abe23 schema:name Springer Nature - SN SciGraph project
98 rdf:type schema:Organization
99 N5ad98ad36b0845b6a24cd4a0e19416dc schema:name Springer Nature
100 rdf:type schema:Organisation
101 N6537f7b73d07420e845f2ae1749044db schema:name doi
102 schema:value 10.1007/978-3-319-07248-7_17
103 rdf:type schema:PropertyValue
104 N69d861cb68824d7585bdf998371eeac5 schema:familyName Glodeanu
105 schema:givenName Cynthia Vera
106 rdf:type schema:Person
107 N6aafd2dc45a1487cbced15f3e437701c rdf:first sg:person.012322041531.28
108 rdf:rest Na9d3bdfc0edc4648926f5e80de04bae3
109 N78681f19aa5147ca875899c2b1879a99 schema:familyName Sacarea
110 schema:givenName Christian
111 rdf:type schema:Person
112 N8475a4e784cc48cdbb3e4dd7e9d45b06 rdf:first N28572f868fff40ffa4a5aecd497de931
113 rdf:rest N33a8325c17a3402b94bb4a9b3c5f88b8
114 N8a08823c13a344cd80a8362a8aa5bb96 rdf:first N69d861cb68824d7585bdf998371eeac5
115 rdf:rest N8475a4e784cc48cdbb3e4dd7e9d45b06
116 Na9d3bdfc0edc4648926f5e80de04bae3 rdf:first Nbb89810cc2044d5284a11193fa71d8eb
117 rdf:rest Nf7d593928fc94f06bac29800039648b3
118 Nbb89810cc2044d5284a11193fa71d8eb schema:affiliation grid-institutes:grid.410368.8
119 schema:familyName Garet
120 schema:givenName Gaëlle
121 rdf:type schema:Person
122 Nf564101e8e554d6a84646af10840d9f7 schema:isbn 978-3-319-07247-0
123 978-3-319-07248-7
124 schema:name Formal Concept Analysis
125 rdf:type schema:Book
126 Nf7d593928fc94f06bac29800039648b3 rdf:first sg:person.01321331757.66
127 rdf:rest N4776fb5aab3940f0b38bdcd74fb896d4
128 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
129 schema:name Biological Sciences
130 rdf:type schema:DefinedTerm
131 anzsrc-for:0601 schema:inDefinedTermSet anzsrc-for:
132 schema:name Biochemistry and Cell Biology
133 rdf:type schema:DefinedTerm
134 sg:person.01143715001.20 schema:affiliation grid-institutes:grid.410368.8
135 schema:familyName Nicolas
136 schema:givenName Jacques
137 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01143715001.20
138 rdf:type schema:Person
139 sg:person.01166672346.33 schema:affiliation grid-institutes:grid.464101.6
140 schema:familyName Tonon
141 schema:givenName Thierry
142 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01166672346.33
143 rdf:type schema:Person
144 sg:person.012322041531.28 schema:affiliation grid-institutes:grid.410368.8
145 schema:familyName Coste
146 schema:givenName François
147 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012322041531.28
148 rdf:type schema:Person
149 sg:person.01321331757.66 schema:affiliation grid-institutes:grid.464101.6
150 schema:familyName Groisillier
151 schema:givenName Agnès
152 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01321331757.66
153 rdf:type schema:Person
154 grid-institutes:grid.410368.8 schema:alternateName Irisa / Inria Rennes, Campus de Beaulieu, 35042, Rennes cedex, France
155 schema:name Irisa / Inria Rennes, Campus de Beaulieu, 35042, Rennes cedex, France
156 rdf:type schema:Organization
157 grid-institutes:grid.464101.6 schema:alternateName Integrative Biology of Marine Models, Sorbonne Universités, UPMC Univ Paris 06, UMR 8227, and CNRS, UMR 8227, Station Biologique de Roscoff, CS 90074, F-29688, Roscoff cedex, France
158 schema:name Integrative Biology of Marine Models, Sorbonne Universités, UPMC Univ Paris 06, UMR 8227, and CNRS, UMR 8227, Station Biologique de Roscoff, CS 90074, F-29688, Roscoff cedex, France
159 rdf:type schema:Organization
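
The author and editor orderings in the table are stored as RDF collections: each blank node carries an `rdf:first` item and an `rdf:rest` pointer to the next node, ending at `rdf:nil`. The sketch below walks the author list, using node identifiers and family names copied from the rows above. The dictionary layout and the `walk_rdf_list` helper are illustrative choices, not a SciGraph API.

```python
RDF_NIL = "rdf:nil"

# node -> (rdf:first, rdf:rest), copied from the triple table above.
list_nodes = {
    "N6aafd2dc45a1487cbced15f3e437701c": ("sg:person.012322041531.28", "Na9d3bdfc0edc4648926f5e80de04bae3"),
    "Na9d3bdfc0edc4648926f5e80de04bae3": ("Nbb89810cc2044d5284a11193fa71d8eb", "Nf7d593928fc94f06bac29800039648b3"),
    "Nf7d593928fc94f06bac29800039648b3": ("sg:person.01321331757.66", "N4776fb5aab3940f0b38bdcd74fb896d4"),
    "N4776fb5aab3940f0b38bdcd74fb896d4": ("sg:person.01143715001.20", "N2d0a4c8f24774c6cb518e0b91aba8c88"),
    "N2d0a4c8f24774c6cb518e0b91aba8c88": ("sg:person.01166672346.33", RDF_NIL),
}

# schema:familyName values for each list item, also from the table.
family_names = {
    "sg:person.012322041531.28": "Coste",
    "Nbb89810cc2044d5284a11193fa71d8eb": "Garet",
    "sg:person.01321331757.66": "Groisillier",
    "sg:person.01143715001.20": "Nicolas",
    "sg:person.01166672346.33": "Tonon",
}

def walk_rdf_list(head, nodes):
    """Follow rdf:rest pointers from head until rdf:nil, collecting rdf:first items."""
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items

# The head node is the object of the schema:author triple (row 3 of the table).
author_order = walk_rdf_list("N6aafd2dc45a1487cbced15f3e437701c", list_nodes)
print([family_names[item] for item in author_order])
```

The recovered order matches the author list at the top of the record. The editor list hanging off `schema:editor` can be walked the same way.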
 



