Protein function prediction using domain families


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2013-02-28

AUTHORS

Robert Rentzsch, Christine A Orengo

ABSTRACT

Here we assessed the use of domain families for predicting the functions of whole proteins. These 'functional families' (FunFams) were derived using a protocol that combines sequence clustering with supervised cluster evaluation, relying on available high-quality Gene Ontology (GO) annotation data in the latter step. In essence, the protocol groups domain sequences belonging to the same superfamily into families based on the GO annotations of their parent proteins. An initial test based on enzyme sequences confirmed that the FunFams resemble enzyme (domain) families much better than do families produced by sequence clustering alone. For the CAFA 2011 experiment, we further associated the FunFams with GO terms probabilistically. All target proteins were first submitted to domain superfamily assignment, followed by FunFam assignment and, eventually, function assignment. The latter included an integration step for multi-domain target proteins. The CAFA results put our domain-based approach among the top ten of 31 competing groups and 56 prediction methods, confirming that it outperforms simple pairwise whole-protein sequence comparisons.
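
The abstract does not spell out how FunFams were associated with GO terms or how multi-domain predictions were integrated. As a rough illustration only, the toy sketch below assumes two simple choices that are NOT taken from the paper: a term's score for a family is the fraction of annotated members carrying it, and a multi-domain protein keeps the highest score per term across its domains. The function names and the example families are hypothetical; the GO identifiers serve only as placeholders.

```python
from collections import Counter


def go_term_frequencies(member_annotations):
    """Score each GO term by the fraction of annotated family members carrying it.

    Assumption for illustration; the paper's actual probabilistic model may differ.
    """
    annotated = [terms for terms in member_annotations if terms]
    if not annotated:
        return {}
    counts = Counter(term for terms in annotated for term in set(terms))
    return {term: n / len(annotated) for term, n in counts.items()}


def integrate_domains(per_domain_scores):
    """Combine per-domain scores for one protein by keeping the maximum per term.

    Assumption for illustration; the paper's integration step is not described here.
    """
    combined = {}
    for scores in per_domain_scores:
        for term, score in scores.items():
            combined[term] = max(score, combined.get(term, 0.0))
    return combined


# Hypothetical families: each entry is the GO term set of one member's parent protein.
family_a = [
    {"GO:0003824", "GO:0016787"},
    {"GO:0003824"},
    set(),  # unannotated member, ignored when computing frequencies
    {"GO:0003824", "GO:0016787"},
]
family_b = [{"GO:0005515"}, {"GO:0005515", "GO:0003824"}]

# A hypothetical two-domain target protein assigned to both families.
protein_scores = integrate_domains(
    [go_term_frequencies(family_a), go_term_frequencies(family_b)]
)
for term, score in sorted(protein_scores.items()):
    print(term, round(score, 3))
```

Taking the maximum is only one possible integration rule; a noisy-OR or a weighted sum would be equally plausible stand-ins.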

PAGES

s5-s5

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1471-2105-14-s3-s5

DOI

http://dx.doi.org/10.1186/1471-2105-14-s3-s5

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1037930554

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/23514456



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0601", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biochemistry and Cell Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Cluster Analysis", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Databases, Protein", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Molecular Sequence Annotation", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Protein Structure, Tertiary", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Proteins", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, Protein", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Vocabulary, Controlled", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Robert Koch Institut, Research Group Bioinformatics Ng4, Nordufer 20, 13353 Berlin, Germany", 
          "id": "http://www.grid.ac/institutes/grid.13652.33", 
          "name": [
            "Robert Koch Institut, Research Group Bioinformatics Ng4, Nordufer 20, 13353 Berlin, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Rentzsch", 
        "givenName": "Robert", 
        "id": "sg:person.0711664756.15", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0711664756.15"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Institute of Structural and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK", 
          "id": "http://www.grid.ac/institutes/grid.509978.a", 
          "name": [
            "Institute of Structural and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Orengo", 
        "givenName": "Christine A", 
        "id": "sg:person.01136244107.52", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01136244107.52"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/75556", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1044135237", 
          "https://doi.org/10.1038/75556"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-6-s1-s17", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051824257", 
          "https://doi.org/10.1186/1471-2105-6-s1-s17"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-5-178", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1018640803", 
          "https://doi.org/10.1186/1471-2105-5-178"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2013-02-28", 
    "datePublishedReg": "2013-02-28", 
    "description": "Here we assessed the use of domain families for predicting the functions of whole proteins. These 'functional families' (FunFams) were derived using a protocol that combines sequence clustering with supervised cluster evaluation, relying on available high-quality Gene Ontology (GO) annotation data in the latter step. In essence, the protocol groups domain sequences belonging to the same superfamily into families based on the GO annotations of their parent proteins. An initial test based on enzyme sequences confirmed that the FunFams resemble enzyme (domain) families much better than do families produced by sequence clustering alone. For the CAFA 2011 experiment, we further associated the FunFams with GO terms probabilistically. All target proteins were first submitted to domain superfamily assignment, followed by FunFam assignment and, eventually, function assignment. The latter included an integration step for multi-domain target proteins. The CAFA results put our domain-based approach among the top ten of 31 competing groups and 56 prediction methods, confirming that it outperforms simple pairwise whole-protein sequence comparisons.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/1471-2105-14-s3-s5", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.2783003", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "Suppl 3", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "14"
      }
    ], 
    "keywords": [
      "domain family", 
      "target proteins", 
      "Gene Ontology (GO) annotation data", 
      "protein function prediction", 
      "GO terms", 
      "GO annotations", 
      "sequence comparison", 
      "enzyme sequences", 
      "enzyme family", 
      "same superfamily", 
      "function prediction", 
      "function assignment", 
      "annotation data", 
      "parent protein", 
      "whole protein", 
      "functional families", 
      "protein", 
      "FunFams", 
      "sequence", 
      "family", 
      "domain-based approach", 
      "superfamily", 
      "latter step", 
      "annotation", 
      "step", 
      "function", 
      "assignment", 
      "prediction method", 
      "integration step", 
      "experiments", 
      "group", 
      "data", 
      "comparison", 
      "results", 
      "prediction", 
      "protocol", 
      "approach", 
      "use", 
      "cluster evaluation", 
      "initial test", 
      "method", 
      "terms", 
      "test", 
      "evaluation", 
      "essence", 
      "protocol group", 
      "supervised cluster evaluation", 
      "available high-quality Gene Ontology (GO) annotation data", 
      "high-quality Gene Ontology (GO) annotation data", 
      "Ontology (GO) annotation data", 
      "CAFA 2011 experiment", 
      "FunFam assignment", 
      "multi-domain target proteins", 
      "CAFA results", 
      "simple pairwise whole-protein sequence comparisons", 
      "pairwise whole-protein sequence comparisons", 
      "whole-protein sequence comparisons"
    ], 
    "name": "Protein function prediction using domain families", 
    "pagination": "s5-s5", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1037930554"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1471-2105-14-s3-s5"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "23514456"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1471-2105-14-s3-s5", 
      "https://app.dimensions.ai/details/publication/pub.1037930554"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2021-12-01T19:29", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20211201/entities/gbq_results/article/article_606.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/1471-2105-14-s3-s5"
  }
]
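
As a minimal sketch of how a record like the one above might be consumed, the following Python reads a trimmed copy of the JSON-LD (only a few of the fields shown above are kept) and pulls out the title, date, authors and identifiers. The `summarize` helper is a hypothetical name introduced for illustration, not part of any SciGraph tooling.

```python
import json

# Trimmed copy of the record above; field names and values are taken from it.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Rentzsch", "givenName": "Robert"},
      {"familyName": "Orengo", "givenName": "Christine A"}
    ],
    "datePublished": "2013-02-28",
    "name": "Protein function prediction using domain families",
    "productId": [
      {"name": "dimensions_id", "value": ["pub.1037930554"]},
      {"name": "doi", "value": ["10.1186/1471-2105-14-s3-s5"]},
      {"name": "pubmed_id", "value": ["23514456"]}
    ]
  }
]
"""


def summarize(record):
    """Collect a few human-readable fields from one JSON-LD record (hypothetical helper)."""
    # Each productId entry holds a name and a list of values; keep the first value.
    identifiers = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    return {
        "title": record.get("name"),
        "date": record.get("datePublished"),
        "authors": authors,
        "identifiers": identifiers,
    }


# The document is a JSON array containing a single record.
summary = summarize(json.loads(RECORD_JSON)[0])
print(summary["title"])
print(", ".join(summary["authors"]))
print(summary["identifiers"]["doi"])
```

Because JSON-LD is plain JSON, the standard `json` module is enough for simple field access; a dedicated RDF library would only be needed to resolve the `@context` or to work with the triples.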
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-14-s3-s5'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-14-s3-s5'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-14-s3-s5'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-14-s3-s5'
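
The four curl calls above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. This is an illustration under assumptions: `build_request`, `fetch`, and the `ACCEPT_TYPES` table are hypothetical names, and whether the endpoint still answers these requests has not been verified here.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Media types copied from the curl examples above.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt):
    """Build a GET request asking for one RDF serialization (hypothetical helper)."""
    return urllib.request.Request(
        BASE_URL + record_id,
        headers={"Accept": ACCEPT_TYPES[fmt]},
    )


def fetch(record_id, fmt, timeout=30):
    """Send the request and return the response body as text.

    Not called in this sketch, because it needs network access.
    """
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_request("pub.10.1186/1471-2105-14-s3-s5", "turtle")
print(request.full_url)
print(request.get_header("Accept"))
```

Calling `fetch("pub.10.1186/1471-2105-14-s3-s5", "json-ld")` would be the counterpart of the first curl command.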


 

This table displays all metadata directly associated with this object as RDF triples.

170 TRIPLES      22 PREDICATES      93 URIs      82 LITERALS      14 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/1471-2105-14-s3-s5 schema:about N2f9f5d7783e64ec38cf4656c7823131d
2 N888916dd6d754131864bfff6ef0f029a
3 N982ecf96279f4789b6614c96bfb43754
4 Nba785485c16d40e98a38d88612a7176a
5 Nc641020ed6d945c9be3fddc2d2b31014
6 Nc9b8bd17677345af9c326f366e83c04b
7 Ncbab0bf6f44142e9ba19e84250a054f1
8 anzsrc-for:06
9 anzsrc-for:0601
10 schema:author N79222fbb4f7c4483a086a70eae2e4537
11 schema:citation sg:pub.10.1038/75556
12 sg:pub.10.1186/1471-2105-5-178
13 sg:pub.10.1186/1471-2105-6-s1-s17
14 schema:datePublished 2013-02-28
15 schema:datePublishedReg 2013-02-28
16 schema:description Here we assessed the use of domain families for predicting the functions of whole proteins. These 'functional families' (FunFams) were derived using a protocol that combines sequence clustering with supervised cluster evaluation, relying on available high-quality Gene Ontology (GO) annotation data in the latter step. In essence, the protocol groups domain sequences belonging to the same superfamily into families based on the GO annotations of their parent proteins. An initial test based on enzyme sequences confirmed that the FunFams resemble enzyme (domain) families much better than do families produced by sequence clustering alone. For the CAFA 2011 experiment, we further associated the FunFams with GO terms probabilistically. All target proteins were first submitted to domain superfamily assignment, followed by FunFam assignment and, eventually, function assignment. The latter included an integration step for multi-domain target proteins. The CAFA results put our domain-based approach among the top ten of 31 competing groups and 56 prediction methods, confirming that it outperforms simple pairwise whole-protein sequence comparisons.
17 schema:genre article
18 schema:inLanguage en
19 schema:isAccessibleForFree true
20 schema:isPartOf N46f0e525993c4edc84984bf471b9ff53
21 Na660463344e74a1ab1cb936803d9dea8
22 sg:journal.1023786
23 schema:keywords CAFA 2011 experiment
24 CAFA results
25 FunFam assignment
26 FunFams
27 GO annotations
28 GO terms
29 Gene Ontology (GO) annotation data
30 Ontology (GO) annotation data
31 annotation
32 annotation data
33 approach
34 assignment
35 available high-quality Gene Ontology (GO) annotation data
36 cluster evaluation
37 comparison
38 data
39 domain family
40 domain-based approach
41 enzyme family
42 enzyme sequences
43 essence
44 evaluation
45 experiments
46 family
47 function
48 function assignment
49 function prediction
50 functional families
51 group
52 high-quality Gene Ontology (GO) annotation data
53 initial test
54 integration step
55 latter step
56 method
57 multi-domain target proteins
58 pairwise whole-protein sequence comparisons
59 parent protein
60 prediction
61 prediction method
62 protein
63 protein function prediction
64 protocol
65 protocol group
66 results
67 same superfamily
68 sequence
69 sequence comparison
70 simple pairwise whole-protein sequence comparisons
71 step
72 superfamily
73 supervised cluster evaluation
74 target proteins
75 terms
76 test
77 use
78 whole protein
79 whole-protein sequence comparisons
80 schema:name Protein function prediction using domain families
81 schema:pagination s5-s5
82 schema:productId N114ba1d9d90d452cbe56f649b23cf2be
83 N6d4e1f00469242e7a5ab62e023c4d73f
84 Ne7ba519788f7402ab9c8dd999f87be5d
85 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037930554
86 https://doi.org/10.1186/1471-2105-14-s3-s5
87 schema:sdDatePublished 2021-12-01T19:29
88 schema:sdLicense https://scigraph.springernature.com/explorer/license/
89 schema:sdPublisher N3774e44a00bc461987b3fdd637d98f94
90 schema:url https://doi.org/10.1186/1471-2105-14-s3-s5
91 sgo:license sg:explorer/license/
92 sgo:sdDataset articles
93 rdf:type schema:ScholarlyArticle
94 N114ba1d9d90d452cbe56f649b23cf2be schema:name doi
95 schema:value 10.1186/1471-2105-14-s3-s5
96 rdf:type schema:PropertyValue
97 N2f9f5d7783e64ec38cf4656c7823131d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
98 schema:name Cluster Analysis
99 rdf:type schema:DefinedTerm
100 N3774e44a00bc461987b3fdd637d98f94 schema:name Springer Nature - SN SciGraph project
101 rdf:type schema:Organization
102 N46f0e525993c4edc84984bf471b9ff53 schema:volumeNumber 14
103 rdf:type schema:PublicationVolume
104 N64f7fb52199a4d5aafb789c1efa2fdcd rdf:first sg:person.01136244107.52
105 rdf:rest rdf:nil
106 N6d4e1f00469242e7a5ab62e023c4d73f schema:name dimensions_id
107 schema:value pub.1037930554
108 rdf:type schema:PropertyValue
109 N79222fbb4f7c4483a086a70eae2e4537 rdf:first sg:person.0711664756.15
110 rdf:rest N64f7fb52199a4d5aafb789c1efa2fdcd
111 N888916dd6d754131864bfff6ef0f029a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
112 schema:name Databases, Protein
113 rdf:type schema:DefinedTerm
114 N982ecf96279f4789b6614c96bfb43754 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
115 schema:name Molecular Sequence Annotation
116 rdf:type schema:DefinedTerm
117 Na660463344e74a1ab1cb936803d9dea8 schema:issueNumber Suppl 3
118 rdf:type schema:PublicationIssue
119 Nba785485c16d40e98a38d88612a7176a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
120 schema:name Vocabulary, Controlled
121 rdf:type schema:DefinedTerm
122 Nc641020ed6d945c9be3fddc2d2b31014 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
123 schema:name Sequence Analysis, Protein
124 rdf:type schema:DefinedTerm
125 Nc9b8bd17677345af9c326f366e83c04b schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
126 schema:name Protein Structure, Tertiary
127 rdf:type schema:DefinedTerm
128 Ncbab0bf6f44142e9ba19e84250a054f1 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
129 schema:name Proteins
130 rdf:type schema:DefinedTerm
131 Ne7ba519788f7402ab9c8dd999f87be5d schema:name pubmed_id
132 schema:value 23514456
133 rdf:type schema:PropertyValue
134 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
135 schema:name Biological Sciences
136 rdf:type schema:DefinedTerm
137 anzsrc-for:0601 schema:inDefinedTermSet anzsrc-for:
138 schema:name Biochemistry and Cell Biology
139 rdf:type schema:DefinedTerm
140 sg:grant.2783003 http://pending.schema.org/fundedItem sg:pub.10.1186/1471-2105-14-s3-s5
141 rdf:type schema:MonetaryGrant
142 sg:journal.1023786 schema:issn 1471-2105
143 schema:name BMC Bioinformatics
144 schema:publisher Springer Nature
145 rdf:type schema:Periodical
146 sg:person.01136244107.52 schema:affiliation grid-institutes:grid.509978.a
147 schema:familyName Orengo
148 schema:givenName Christine A
149 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01136244107.52
150 rdf:type schema:Person
151 sg:person.0711664756.15 schema:affiliation grid-institutes:grid.13652.33
152 schema:familyName Rentzsch
153 schema:givenName Robert
154 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0711664756.15
155 rdf:type schema:Person
156 sg:pub.10.1038/75556 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044135237
157 https://doi.org/10.1038/75556
158 rdf:type schema:CreativeWork
159 sg:pub.10.1186/1471-2105-5-178 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018640803
160 https://doi.org/10.1186/1471-2105-5-178
161 rdf:type schema:CreativeWork
162 sg:pub.10.1186/1471-2105-6-s1-s17 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051824257
163 https://doi.org/10.1186/1471-2105-6-s1-s17
164 rdf:type schema:CreativeWork
165 grid-institutes:grid.13652.33 schema:alternateName Robert Koch Institut, Research Group Bioinformatics Ng4, Nordufer 20, 13353 Berlin, Germany
166 schema:name Robert Koch Institut, Research Group Bioinformatics Ng4, Nordufer 20, 13353 Berlin, Germany
167 rdf:type schema:Organization
168 grid-institutes:grid.509978.a schema:alternateName Institute of Structural and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
169 schema:name Institute of Structural and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
170 rdf:type schema:Organization
 



