Sequence Motifs: Highly Predictive Features of Protein Function


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2006

AUTHORS

Asa Ben-Hur, Douglas Brutlag

ABSTRACT

Protein function prediction, i.e., classification of proteins according to their biological function, is an important task in bioinformatics. In this chapter, we illustrate that the presence of sequence motifs (elements that are conserved across different proteins) is a highly discriminative feature for predicting the function of a protein. This is in agreement with the biological thinking that considers motifs to be the building blocks of protein sequences. We focus on proteins annotated as enzymes and show that, although motif composition is a very high-dimensional representation of a sequence, most classes of enzymes can be classified using a handful of motifs, yielding accurate and interpretable classifiers. The enzyme data falls into a large number of classes; we find that the one-against-the-rest multi-class method works better than the one-against-one method on this data.

PAGES

625-645

Book

TITLE

Feature Extraction

ISBN

978-3-540-35487-1
978-3-540-35488-8

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-540-35488-8_32

DOI

http://dx.doi.org/10.1007/978-3-540-35488-8_32

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1052630781



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0601", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biochemistry and Cell Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Colorado State University", 
          "id": "https://www.grid.ac/institutes/grid.47894.36", 
          "name": [
            "Department of Computer Science, Colorado State University, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Ben-Hur", 
        "givenName": "Asa", 
        "id": "sg:person.01242755504.30", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01242755504.30"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Stanford University", 
          "id": "https://www.grid.ac/institutes/grid.168010.e", 
          "name": [
            "Department of Biochemistry, Stanford University, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brutlag", 
        "givenName": "Douglas", 
        "id": "sg:person.01310464336.05", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01310464336.05"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1126/science.1058040", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001517867"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/17.suppl_1.s316", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1002708319"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/26.1.320", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004479252"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.95.11.5865", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008099366"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/15.6.471", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008571793"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btg1002", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013915575"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gki060", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1020293550"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.84.13.4355", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1024077784"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0022-2836(81)90087-5", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1024589839"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bib/3.3.275", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033734956"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/30.1.239", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035868373"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-1-4615-0907-3", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037183810", 
          "https://doi.org/10.1007/978-1-4615-0907-3"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/29.1.202", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1040462057"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/35057062", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1042854081", 
          "https://doi.org/10.1038/35057062"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/25.17.3389", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1047265454"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/a:1012487302797", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048573168", 
          "https://doi.org/10.1023/a:1012487302797"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/30.1.235", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048836665"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.97.1.262", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048892448"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/640075.640114", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1049511032"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://app.dimensions.ai/details/publication/pub.1076835775", 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2006", 
    "datePublishedReg": "2006-01-01", 
    "description": "Protein function prediction, i.e. classification of proteins according to their biological function, is an important task in bioinformatics. In this chapter, we illustrate that the presence of sequence motifs \u2014 elements that are conserved across different proteins \u2014 are highly discriminative features for predicting the function of a protein. This is in agreement with the biological thinking that considers motifs to be the building blocks of protein sequences. We focus on proteins annotated as enzymes, and show that despite the fact that motif composition is a very high dimensional representation of a sequence, that most classes of enzymes can be classified using a handful of motifs, yielding accurate and interpretable classifiers. The enzyme data falls into a large number of classes; we find that the one-against-the-rest multi-class method works better than the one-against-one method on this data.", 
    "editor": [
      {
        "familyName": "Guyon", 
        "givenName": "Isabelle", 
        "type": "Person"
      }, 
      {
        "familyName": "Nikravesh", 
        "givenName": "Masoud", 
        "type": "Person"
      }, 
      {
        "familyName": "Gunn", 
        "givenName": "Steve", 
        "type": "Person"
      }, 
      {
        "familyName": "Zadeh", 
        "givenName": "Lotfi A.", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-540-35488-8_32", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-540-35487-1", 
        "978-3-540-35488-8"
      ], 
      "name": "Feature Extraction", 
      "type": "Book"
    }, 
    "name": "Sequence Motifs: Highly Predictive Features of Protein Function", 
    "pagination": "625-645", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-540-35488-8_32"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "fab27579f1a04cb33671c037c064cd3b7bdb9afe9554c9cab3e155f5441b139e"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1052630781"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-540-35488-8_32", 
      "https://app.dimensions.ai/details/publication/pub.1052630781"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T06:08", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000350_0000000350/records_77544_00000001.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F978-3-540-35488-8_32"
  }
]
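
A record like the one above can be read with any JSON parser. The following is a minimal Python sketch, assuming the record is available as a string; the `RECORD` excerpt is abridged from the listing above, and the helper name `summarize` is hypothetical rather than part of any SciGraph tooling.

```python
import json

# Abridged excerpt of the record above (only the fields used below).
RECORD = """
[
  {
    "author": [
      {"familyName": "Ben-Hur", "givenName": "Asa"},
      {"familyName": "Brutlag", "givenName": "Douglas"}
    ],
    "datePublished": "2006",
    "name": "Sequence Motifs: Highly Predictive Features of Protein Function",
    "pagination": "625-645",
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/978-3-540-35488-8_32"]},
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1052630781"]}
    ]
  }
]
"""


def summarize(record_text):
    """Return title, author names, year, and DOI from a SciGraph-style record."""
    # The top level is a JSON array holding a single publication object.
    pub = json.loads(record_text)[0]
    authors = [f"{a['givenName']} {a['familyName']}" for a in pub["author"]]
    # Identifiers are stored as name/value pairs under "productId".
    ids = {p["name"]: p["value"][0] for p in pub["productId"]}
    return {
        "title": pub["name"],
        "authors": authors,
        "year": pub["datePublished"],
        "doi": ids.get("doi"),
    }


summary = summarize(RECORD)
print(summary["authors"], summary["doi"])
```

Plain `json` is enough here because the sketch only reads fields by name; resolving the `@context` into full IRIs would need a JSON-LD processor, which this example does not attempt.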
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-35488-8_32'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-35488-8_32'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-35488-8_32'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-35488-8_32'
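
The four curl commands above differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. This is an illustration under assumptions: the helper names `build_request` and `fetch` and the short format keys are hypothetical, and the page does not confirm whether the endpoint still responds, so the example only builds a request and leaves the network call to the reader.

```python
import urllib.request

BASE = "https://scigraph.springernature.com/"

# Accept headers taken from the curl commands above.
MIME_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt="json-ld"):
    """Build (but do not send) a GET request asking for one RDF serialization."""
    return urllib.request.Request(
        BASE + record_id, headers={"Accept": MIME_TYPES[fmt]}
    )


def fetch(record_id, fmt="json-ld", timeout=30):
    """Send the request and return the response body as text."""
    req = build_request(record_id, fmt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Inspect the request without touching the network.
req = build_request("pub.10.1007/978-3-540-35488-8_32", "turtle")
print(req.full_url, req.get_header("Accept"))
```

Calling `fetch("pub.10.1007/978-3-540-35488-8_32", "nt")` would then correspond to the N-Triples curl command.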


 

This table displays all metadata directly associated with this object as RDF triples.

152 TRIPLES      23 PREDICATES      47 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-540-35488-8_32 schema:about anzsrc-for:06
2 anzsrc-for:0601
3 schema:author Nc9e9e1ed340349a9a563bef05c030ed2
4 schema:citation sg:pub.10.1007/978-1-4615-0907-3
5 sg:pub.10.1023/a:1012487302797
6 sg:pub.10.1038/35057062
7 https://app.dimensions.ai/details/publication/pub.1076835775
8 https://doi.org/10.1016/0022-2836(81)90087-5
9 https://doi.org/10.1073/pnas.84.13.4355
10 https://doi.org/10.1073/pnas.95.11.5865
11 https://doi.org/10.1073/pnas.97.1.262
12 https://doi.org/10.1093/bib/3.3.275
13 https://doi.org/10.1093/bioinformatics/15.6.471
14 https://doi.org/10.1093/bioinformatics/17.suppl_1.s316
15 https://doi.org/10.1093/bioinformatics/btg1002
16 https://doi.org/10.1093/nar/25.17.3389
17 https://doi.org/10.1093/nar/26.1.320
18 https://doi.org/10.1093/nar/29.1.202
19 https://doi.org/10.1093/nar/30.1.235
20 https://doi.org/10.1093/nar/30.1.239
21 https://doi.org/10.1093/nar/gki060
22 https://doi.org/10.1126/science.1058040
23 https://doi.org/10.1145/640075.640114
24 schema:datePublished 2006
25 schema:datePublishedReg 2006-01-01
26 schema:description Protein function prediction, i.e. classification of proteins according to their biological function, is an important task in bioinformatics. In this chapter, we illustrate that the presence of sequence motifs — elements that are conserved across different proteins — are highly discriminative features for predicting the function of a protein. This is in agreement with the biological thinking that considers motifs to be the building blocks of protein sequences. We focus on proteins annotated as enzymes, and show that despite the fact that motif composition is a very high dimensional representation of a sequence, that most classes of enzymes can be classified using a handful of motifs, yielding accurate and interpretable classifiers. The enzyme data falls into a large number of classes; we find that the one-against-the-rest multi-class method works better than the one-against-one method on this data.
27 schema:editor N9485679e7e864384b42f3e78ec1c6ae5
28 schema:genre chapter
29 schema:inLanguage en
30 schema:isAccessibleForFree true
31 schema:isPartOf N3f64481c450942778410df9c55ee2f8b
32 schema:name Sequence Motifs: Highly Predictive Features of Protein Function
33 schema:pagination 625-645
34 schema:productId Na4213298a1af4604b0a88a1869da8d9a
35 Nc9726c2088e14a97a52bd352b1f5c0c0
36 Ne0fcbf8860b041708f16f5ee1065375d
37 schema:publisher Ncf5b34f4e7934b928874bf44f7f24707
38 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052630781
39 https://doi.org/10.1007/978-3-540-35488-8_32
40 schema:sdDatePublished 2019-04-16T06:08
41 schema:sdLicense https://scigraph.springernature.com/explorer/license/
42 schema:sdPublisher Na89d11e46f7d4074a1d23161eb26fa06
43 schema:url https://link.springer.com/10.1007%2F978-3-540-35488-8_32
44 sgo:license sg:explorer/license/
45 sgo:sdDataset chapters
46 rdf:type schema:Chapter
47 N1b75112daf554c45b5b52fa2a6c69c5a rdf:first N87cfc96a47f04a39916af60431cac179
48 rdf:rest Na62d1723ccf2426eaaac452e3baf60ed
49 N3f64481c450942778410df9c55ee2f8b schema:isbn 978-3-540-35487-1
50 978-3-540-35488-8
51 schema:name Feature Extraction
52 rdf:type schema:Book
53 N4be85816d9cc4378a8ea705250fc37e5 schema:familyName Nikravesh
54 schema:givenName Masoud
55 rdf:type schema:Person
56 N4cfb89e77d2e47358530b85c3eb9085d rdf:first sg:person.01310464336.05
57 rdf:rest rdf:nil
58 N66bdc62d08d24e5e9d219bcd20308cd0 schema:familyName Guyon
59 schema:givenName Isabelle
60 rdf:type schema:Person
61 N87cfc96a47f04a39916af60431cac179 schema:familyName Gunn
62 schema:givenName Steve
63 rdf:type schema:Person
64 N9485679e7e864384b42f3e78ec1c6ae5 rdf:first N66bdc62d08d24e5e9d219bcd20308cd0
65 rdf:rest Nf1238af7f24f4f698c5242143dff4749
66 N9efe2c0b84c74a73a4cd9df9261e6a89 schema:familyName Zadeh
67 schema:givenName Lotfi A.
68 rdf:type schema:Person
69 Na4213298a1af4604b0a88a1869da8d9a schema:name doi
70 schema:value 10.1007/978-3-540-35488-8_32
71 rdf:type schema:PropertyValue
72 Na62d1723ccf2426eaaac452e3baf60ed rdf:first N9efe2c0b84c74a73a4cd9df9261e6a89
73 rdf:rest rdf:nil
74 Na89d11e46f7d4074a1d23161eb26fa06 schema:name Springer Nature - SN SciGraph project
75 rdf:type schema:Organization
76 Nc9726c2088e14a97a52bd352b1f5c0c0 schema:name dimensions_id
77 schema:value pub.1052630781
78 rdf:type schema:PropertyValue
79 Nc9e9e1ed340349a9a563bef05c030ed2 rdf:first sg:person.01242755504.30
80 rdf:rest N4cfb89e77d2e47358530b85c3eb9085d
81 Ncf5b34f4e7934b928874bf44f7f24707 schema:location Berlin, Heidelberg
82 schema:name Springer Berlin Heidelberg
83 rdf:type schema:Organisation
84 Ne0fcbf8860b041708f16f5ee1065375d schema:name readcube_id
85 schema:value fab27579f1a04cb33671c037c064cd3b7bdb9afe9554c9cab3e155f5441b139e
86 rdf:type schema:PropertyValue
87 Nf1238af7f24f4f698c5242143dff4749 rdf:first N4be85816d9cc4378a8ea705250fc37e5
88 rdf:rest N1b75112daf554c45b5b52fa2a6c69c5a
89 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
90 schema:name Biological Sciences
91 rdf:type schema:DefinedTerm
92 anzsrc-for:0601 schema:inDefinedTermSet anzsrc-for:
93 schema:name Biochemistry and Cell Biology
94 rdf:type schema:DefinedTerm
95 sg:person.01242755504.30 schema:affiliation https://www.grid.ac/institutes/grid.47894.36
96 schema:familyName Ben-Hur
97 schema:givenName Asa
98 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01242755504.30
99 rdf:type schema:Person
100 sg:person.01310464336.05 schema:affiliation https://www.grid.ac/institutes/grid.168010.e
101 schema:familyName Brutlag
102 schema:givenName Douglas
103 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01310464336.05
104 rdf:type schema:Person
105 sg:pub.10.1007/978-1-4615-0907-3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037183810
106 https://doi.org/10.1007/978-1-4615-0907-3
107 rdf:type schema:CreativeWork
108 sg:pub.10.1023/a:1012487302797 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048573168
109 https://doi.org/10.1023/a:1012487302797
110 rdf:type schema:CreativeWork
111 sg:pub.10.1038/35057062 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042854081
112 https://doi.org/10.1038/35057062
113 rdf:type schema:CreativeWork
114 https://app.dimensions.ai/details/publication/pub.1076835775 rdf:type schema:CreativeWork
115 https://doi.org/10.1016/0022-2836(81)90087-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024589839
116 rdf:type schema:CreativeWork
117 https://doi.org/10.1073/pnas.84.13.4355 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024077784
118 rdf:type schema:CreativeWork
119 https://doi.org/10.1073/pnas.95.11.5865 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008099366
120 rdf:type schema:CreativeWork
121 https://doi.org/10.1073/pnas.97.1.262 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048892448
122 rdf:type schema:CreativeWork
123 https://doi.org/10.1093/bib/3.3.275 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033734956
124 rdf:type schema:CreativeWork
125 https://doi.org/10.1093/bioinformatics/15.6.471 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008571793
126 rdf:type schema:CreativeWork
127 https://doi.org/10.1093/bioinformatics/17.suppl_1.s316 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002708319
128 rdf:type schema:CreativeWork
129 https://doi.org/10.1093/bioinformatics/btg1002 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013915575
130 rdf:type schema:CreativeWork
131 https://doi.org/10.1093/nar/25.17.3389 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047265454
132 rdf:type schema:CreativeWork
133 https://doi.org/10.1093/nar/26.1.320 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004479252
134 rdf:type schema:CreativeWork
135 https://doi.org/10.1093/nar/29.1.202 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040462057
136 rdf:type schema:CreativeWork
137 https://doi.org/10.1093/nar/30.1.235 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048836665
138 rdf:type schema:CreativeWork
139 https://doi.org/10.1093/nar/30.1.239 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035868373
140 rdf:type schema:CreativeWork
141 https://doi.org/10.1093/nar/gki060 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020293550
142 rdf:type schema:CreativeWork
143 https://doi.org/10.1126/science.1058040 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001517867
144 rdf:type schema:CreativeWork
145 https://doi.org/10.1145/640075.640114 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049511032
146 rdf:type schema:CreativeWork
147 https://www.grid.ac/institutes/grid.168010.e schema:alternateName Stanford University
148 schema:name Department of Biochemistry, Stanford University, USA
149 rdf:type schema:Organization
150 https://www.grid.ac/institutes/grid.47894.36 schema:alternateName Colorado State University
151 schema:name Department of Computer Science, Colorado State University, USA
152 rdf:type schema:Organization
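
In the table above, ordered values such as the editor list are stored as RDF collections: each blank node carries an `rdf:first` item and an `rdf:rest` pointer, and the chain ends at `rdf:nil`. The following Python sketch walks the editor chain to recover the order. The node identifiers and family names are copied from the table; the dictionary layout and the helper name `walk_list` are assumptions made for illustration.

```python
# Head of the editor list (the object of schema:editor in the table above).
HEAD = "N9485679e7e864384b42f3e78ec1c6ae5"

# node -> (rdf:first, rdf:rest), copied from the table above.
LIST_NODES = {
    "N9485679e7e864384b42f3e78ec1c6ae5": (
        "N66bdc62d08d24e5e9d219bcd20308cd0",
        "Nf1238af7f24f4f698c5242143dff4749",
    ),
    "Nf1238af7f24f4f698c5242143dff4749": (
        "N4be85816d9cc4378a8ea705250fc37e5",
        "N1b75112daf554c45b5b52fa2a6c69c5a",
    ),
    "N1b75112daf554c45b5b52fa2a6c69c5a": (
        "N87cfc96a47f04a39916af60431cac179",
        "Na62d1723ccf2426eaaac452e3baf60ed",
    ),
    "Na62d1723ccf2426eaaac452e3baf60ed": (
        "N9efe2c0b84c74a73a4cd9df9261e6a89",
        "rdf:nil",
    ),
}

# schema:familyName of each editor node, copied from the table above.
FAMILY_NAMES = {
    "N66bdc62d08d24e5e9d219bcd20308cd0": "Guyon",
    "N4be85816d9cc4378a8ea705250fc37e5": "Nikravesh",
    "N87cfc96a47f04a39916af60431cac179": "Gunn",
    "N9efe2c0b84c74a73a4cd9df9261e6a89": "Zadeh",
}


def walk_list(head, nodes):
    """Follow rdf:rest pointers from head, collecting each rdf:first item."""
    items = []
    node = head
    while node != "rdf:nil":
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


editors = [FAMILY_NAMES[n] for n in walk_list(HEAD, LIST_NODES)]
print(editors)  # ['Guyon', 'Nikravesh', 'Gunn', 'Zadeh']
```

The recovered order matches the `editor` array in the JSON-LD record.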
 



