Back-translation for discovering distant protein homologies in the presence of frameshift mutations


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2010-01-04

AUTHORS

Marta Gîrdea, Laurent Noé, Gregory Kucherov

ABSTRACT

Background: Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins' common origin. Moreover, when a large number of substitutions are additionally involved in the divergence, homology detection becomes difficult even at the DNA level.

Results: We developed a novel method to infer distant homology relations between two proteins that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences that have the best-scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. Our implementation is freely available at http://bioinfo.lifl.fr/path/.

Conclusions: Our approach makes it possible to uncover evolutionary information that is not captured by traditional alignment methods, which is confirmed by biologically significant examples.
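
To make "the complete set of putative DNA sequences of each protein" concrete, the following minimal sketch counts and enumerates the back-translations of a short protein under the standard genetic code. It is an illustration only, not the authors' algorithm or implementation: the function names are hypothetical, and the choice of the standard code (NCBI translation table 1) is an assumption. The count grows multiplicatively with protein length, which suggests why a compact graph representation is preferable to explicit enumeration.

```python
from itertools import product
from math import prod

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
# Assumption: the abstract does not say which genetic code is used.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid ('*' marks a stop codon) to the codons that encode it.
CODONS_FOR = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append(codon)


def count_back_translations(protein):
    """Number of distinct DNA sequences that translate to `protein`."""
    return prod(len(CODONS_FOR[aa]) for aa in protein)


def back_translations(protein):
    """Enumerate every putative DNA sequence (feasible only for short proteins)."""
    return ["".join(codons) for codons in product(*(CODONS_FOR[aa] for aa in protein))]


if __name__ == "__main__":
    print(count_back_translations("MLSR"))  # M has 1 codon; L, S and R have 6 each
    print(back_translations("MF"))
```

Enumeration is shown only to illustrate the size of the search space; a practical aligner would avoid materializing these sequences.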

PAGES

6

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1748-7188-5-6

DOI

http://dx.doi.org/10.1186/1748-7188-5-6

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1013342982

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/20047662



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Institut National de Recherche en Informatique et en Automatique, Centre de Recherche Lille - Nord Europe, France", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Laboratoire d'Informatique Fondamentale de Lille, (Centre National de la Recherche Scientifique, Universit\u00e9 Lille 1), Lille, France", 
            "Institut National de Recherche en Informatique et en Automatique, Centre de Recherche Lille - Nord Europe, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "G\u00eerdea", 
        "givenName": "Marta", 
        "id": "sg:person.01203454254.70", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01203454254.70"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Institut National de Recherche en Informatique et en Automatique, Centre de Recherche Lille - Nord Europe, France", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Laboratoire d'Informatique Fondamentale de Lille, (Centre National de la Recherche Scientifique, Universit\u00e9 Lille 1), Lille, France", 
            "Institut National de Recherche en Informatique et en Automatique, Centre de Recherche Lille - Nord Europe, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "No\u00e9", 
        "givenName": "Laurent", 
        "id": "sg:person.01233236632.20", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01233236632.20"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "French-Russian J-V Poncelet Laboratory, Moscow, Russia", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Laboratoire d'Informatique Fondamentale de Lille, (Centre National de la Recherche Scientifique, Universit\u00e9 Lille 1), Lille, France", 
            "Institut National de Recherche en Informatique et en Automatique, Centre de Recherche Lille - Nord Europe, France", 
            "French-Russian J-V Poncelet Laboratory, Moscow, Russia"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Kucherov", 
        "givenName": "Gregory", 
        "id": "sg:person.013163701366.40", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013163701366.40"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1186/1471-2105-6-156", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023271915", 
          "https://doi.org/10.1186/1471-2105-6-156"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-04241-6_10", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030953214", 
          "https://doi.org/10.1007/978-3-642-04241-6_10"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-6-134", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033106693", 
          "https://doi.org/10.1186/1471-2105-6-134"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s00239-004-0138-0", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1022873889", 
          "https://doi.org/10.1007/s00239-004-0138-0"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2164-8-371", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1050226563", 
          "https://doi.org/10.1186/1471-2164-8-371"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-04241-6_20", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1019087962", 
          "https://doi.org/10.1007/978-3-642-04241-6_20"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/3-540-63220-4_59", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1053403180", 
          "https://doi.org/10.1007/3-540-63220-4_59"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf00162968", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013701222", 
          "https://doi.org/10.1007/bf00162968"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2010-01-04", 
    "datePublishedReg": "2010-01-04", 
    "description": "BackgroundFrameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins' common origin. Moreover, when a large number of substitutions are additionally involved in the divergence, the homology detection becomes difficult even at the DNA level.ResultsWe developed a novel method to infer distant homology relations of two proteins, that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences which have the best scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. Our implementation is freely available at http://bioinfo.lifl.fr/path/.ConclusionsOur approach allows to uncover evolutionary information that is not captured by traditional alignment methods, which is confirmed by biologically significant examples.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/1748-7188-5-6", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1036449", 
        "issn": [
          "1748-7188"
        ], 
        "name": "Algorithms for Molecular Biology", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "5"
      }
    ], 
    "keywords": [
      "DNA sequences", 
      "protein-coding DNA sequences", 
      "dynamic programming alignment algorithms", 
      "alignment method", 
      "traditional alignment methods", 
      "common origin", 
      "protein homology", 
      "evolutionary information", 
      "protein sequences", 
      "alignment algorithm", 
      "evolutionary processes", 
      "graph representation", 
      "powerful scoring system", 
      "homology relations", 
      "homology detection", 
      "point mutations", 
      "frameshift mutation", 
      "mutations", 
      "sequence", 
      "ConclusionsOur approach", 
      "protein", 
      "DNA levels", 
      "best scoring", 
      "novel method", 
      "large number", 
      "homology", 
      "frameshift", 
      "algorithm", 
      "divergence", 
      "significant examples", 
      "implementation", 
      "drastic changes", 
      "complete set", 
      "representation", 
      "information", 
      "method", 
      "set", 
      "detection", 
      "substitution", 
      "system", 
      "goal", 
      "origin", 
      "example", 
      "scoring", 
      "presence", 
      "levels", 
      "changes", 
      "number", 
      "process", 
      "ResultsWe", 
      "approach", 
      "scoring system", 
      "relation"
    ], 
    "name": "Back-translation for discovering distant protein homologies in the presence of frameshift mutations", 
    "pagination": "6", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1013342982"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1748-7188-5-6"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "20047662"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1748-7188-5-6", 
      "https://app.dimensions.ai/details/publication/pub.1013342982"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-05-20T07:26", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220519/entities/gbq_results/article/article_510.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/1748-7188-5-6"
  }
]
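
As a minimal sketch of how a record of this shape can be consumed, the Python snippet below pulls the title, DOI, and author names out of the JSON-LD array. The embedded `RECORD` is a trimmed copy of the record above, and `summarize` is a hypothetical helper name, not part of any SciGraph tooling.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
# A raw string keeps the \uXXXX escapes intact so that json decodes them.
RECORD = r"""
[
  {
    "author": [
      {"familyName": "G\u00eerdea", "givenName": "Marta", "type": "Person"},
      {"familyName": "No\u00e9", "givenName": "Laurent", "type": "Person"},
      {"familyName": "Kucherov", "givenName": "Gregory", "type": "Person"}
    ],
    "name": "Back-translation for discovering distant protein homologies in the presence of frameshift mutations",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1013342982"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/1748-7188-5-6"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["20047662"]}
    ],
    "type": "ScholarlyArticle"
  }
]
"""


def summarize(record_text):
    """Return title, DOI, and author names from a SciGraph-style JSON-LD array."""
    article = json.loads(record_text)[0]  # the record is a one-element array
    # Identifiers live in "productId" as name/value pairs; each value is a list.
    doi = next(p["value"][0] for p in article["productId"] if p["name"] == "doi")
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in article["author"]]
    return {"title": article["name"], "doi": doi, "authors": authors}


if __name__ == "__main__":
    print(summarize(RECORD))
```

Because JSON-LD is plain JSON, the standard `json` module is enough here; a dedicated linked-data library would only be needed to expand the `@context` or to work with the data as triples.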
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1748-7188-5-6'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1748-7188-5-6'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1748-7188-5-6'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1748-7188-5-6'
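
The four curl commands above differ only in their Accept header, so the same content negotiation can be sketched in Python with the standard library. The names `MEDIA_TYPES`, `build_request`, and `fetch`, along with the short format labels, are hypothetical. Whether the endpoint still answers these requests is an assumption that has not been verified here.

```python
import urllib.request

# URL prefix and media types are taken from the curl commands above.
BASE_URL = "https://scigraph.springernature.com/pub."
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(doi, fmt="json-ld"):
    """Build a GET request that asks for the record in the chosen RDF format."""
    return urllib.request.Request(BASE_URL + doi, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(doi, fmt="json-ld", timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(doi, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    request = build_request("10.1186/1748-7188-5-6", "turtle")
    print(request.full_url, request.get_header("Accept"))
```

Separating `build_request` from `fetch` lets the header logic be checked without touching the network.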


 

This table displays all metadata directly associated with this object as RDF triples.

163 TRIPLES      22 PREDICATES      87 URIs      71 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/1748-7188-5-6 schema:about anzsrc-for:06
2 anzsrc-for:0604
3 schema:author Nc3f0d8b20b8f478e846043bb36ddb074
4 schema:citation sg:pub.10.1007/3-540-63220-4_59
5 sg:pub.10.1007/978-3-642-04241-6_10
6 sg:pub.10.1007/978-3-642-04241-6_20
7 sg:pub.10.1007/bf00162968
8 sg:pub.10.1007/s00239-004-0138-0
9 sg:pub.10.1186/1471-2105-6-134
10 sg:pub.10.1186/1471-2105-6-156
11 sg:pub.10.1186/1471-2164-8-371
12 schema:datePublished 2010-01-04
13 schema:datePublishedReg 2010-01-04
14 schema:description BackgroundFrameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins' common origin. Moreover, when a large number of substitutions are additionally involved in the divergence, the homology detection becomes difficult even at the DNA level.ResultsWe developed a novel method to infer distant homology relations of two proteins, that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences which have the best scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. Our implementation is freely available at http://bioinfo.lifl.fr/path/.ConclusionsOur approach allows to uncover evolutionary information that is not captured by traditional alignment methods, which is confirmed by biologically significant examples.
15 schema:genre article
16 schema:inLanguage en
17 schema:isAccessibleForFree true
18 schema:isPartOf N39f7fb4c38854cdfa8e2e7f870cc8d18
19 Nece83dae16ca481684b902a2cb875313
20 sg:journal.1036449
21 schema:keywords ConclusionsOur approach
22 DNA levels
23 DNA sequences
24 ResultsWe
25 algorithm
26 alignment algorithm
27 alignment method
28 approach
29 best scoring
30 changes
31 common origin
32 complete set
33 detection
34 divergence
35 drastic changes
36 dynamic programming alignment algorithms
37 evolutionary information
38 evolutionary processes
39 example
40 frameshift
41 frameshift mutation
42 goal
43 graph representation
44 homology
45 homology detection
46 homology relations
47 implementation
48 information
49 large number
50 levels
51 method
52 mutations
53 novel method
54 number
55 origin
56 point mutations
57 powerful scoring system
58 presence
59 process
60 protein
61 protein homology
62 protein sequences
63 protein-coding DNA sequences
64 relation
65 representation
66 scoring
67 scoring system
68 sequence
69 set
70 significant examples
71 substitution
72 system
73 traditional alignment methods
74 schema:name Back-translation for discovering distant protein homologies in the presence of frameshift mutations
75 schema:pagination 6
76 schema:productId N214c45c31796478495e83f41a6396a69
77 Nd8391e712bd5405c83b1913824ae7100
78 Neaa104a66e014999a5647cf8240c765f
79 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013342982
80 https://doi.org/10.1186/1748-7188-5-6
81 schema:sdDatePublished 2022-05-20T07:26
82 schema:sdLicense https://scigraph.springernature.com/explorer/license/
83 schema:sdPublisher N11b1b6700a684d4cabf508c51051bd3c
84 schema:url https://doi.org/10.1186/1748-7188-5-6
85 sgo:license sg:explorer/license/
86 sgo:sdDataset articles
87 rdf:type schema:ScholarlyArticle
88 N11b1b6700a684d4cabf508c51051bd3c schema:name Springer Nature - SN SciGraph project
89 rdf:type schema:Organization
90 N214c45c31796478495e83f41a6396a69 schema:name doi
91 schema:value 10.1186/1748-7188-5-6
92 rdf:type schema:PropertyValue
93 N39f7fb4c38854cdfa8e2e7f870cc8d18 schema:issueNumber 1
94 rdf:type schema:PublicationIssue
95 N751f555b3b4a4f399cc3373282b623fa rdf:first sg:person.01233236632.20
96 rdf:rest Nb7b5909abbc24baaa1e219432b1c5dff
97 Nb7b5909abbc24baaa1e219432b1c5dff rdf:first sg:person.013163701366.40
98 rdf:rest rdf:nil
99 Nc3f0d8b20b8f478e846043bb36ddb074 rdf:first sg:person.01203454254.70
100 rdf:rest N751f555b3b4a4f399cc3373282b623fa
101 Nd8391e712bd5405c83b1913824ae7100 schema:name dimensions_id
102 schema:value pub.1013342982
103 rdf:type schema:PropertyValue
104 Neaa104a66e014999a5647cf8240c765f schema:name pubmed_id
105 schema:value 20047662
106 rdf:type schema:PropertyValue
107 Nece83dae16ca481684b902a2cb875313 schema:volumeNumber 5
108 rdf:type schema:PublicationVolume
109 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
110 schema:name Biological Sciences
111 rdf:type schema:DefinedTerm
112 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
113 schema:name Genetics
114 rdf:type schema:DefinedTerm
115 sg:journal.1036449 schema:issn 1748-7188
116 schema:name Algorithms for Molecular Biology
117 schema:publisher Springer Nature
118 rdf:type schema:Periodical
119 sg:person.01203454254.70 schema:affiliation grid-institutes:None
120 schema:familyName Gîrdea
121 schema:givenName Marta
122 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01203454254.70
123 rdf:type schema:Person
124 sg:person.01233236632.20 schema:affiliation grid-institutes:None
125 schema:familyName Noé
126 schema:givenName Laurent
127 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01233236632.20
128 rdf:type schema:Person
129 sg:person.013163701366.40 schema:affiliation grid-institutes:None
130 schema:familyName Kucherov
131 schema:givenName Gregory
132 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013163701366.40
133 rdf:type schema:Person
134 sg:pub.10.1007/3-540-63220-4_59 schema:sameAs https://app.dimensions.ai/details/publication/pub.1053403180
135 https://doi.org/10.1007/3-540-63220-4_59
136 rdf:type schema:CreativeWork
137 sg:pub.10.1007/978-3-642-04241-6_10 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030953214
138 https://doi.org/10.1007/978-3-642-04241-6_10
139 rdf:type schema:CreativeWork
140 sg:pub.10.1007/978-3-642-04241-6_20 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019087962
141 https://doi.org/10.1007/978-3-642-04241-6_20
142 rdf:type schema:CreativeWork
143 sg:pub.10.1007/bf00162968 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013701222
144 https://doi.org/10.1007/bf00162968
145 rdf:type schema:CreativeWork
146 sg:pub.10.1007/s00239-004-0138-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022873889
147 https://doi.org/10.1007/s00239-004-0138-0
148 rdf:type schema:CreativeWork
149 sg:pub.10.1186/1471-2105-6-134 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033106693
150 https://doi.org/10.1186/1471-2105-6-134
151 rdf:type schema:CreativeWork
152 sg:pub.10.1186/1471-2105-6-156 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023271915
153 https://doi.org/10.1186/1471-2105-6-156
154 rdf:type schema:CreativeWork
155 sg:pub.10.1186/1471-2164-8-371 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050226563
156 https://doi.org/10.1186/1471-2164-8-371
157 rdf:type schema:CreativeWork
158 grid-institutes:None schema:alternateName French-Russian J-V Poncelet Laboratory, Moscow, Russia
159 Institut National de Recherche en Informatique et en Automatique, Centre de Recherche Lille - Nord Europe, France
160 schema:name French-Russian J-V Poncelet Laboratory, Moscow, Russia
161 Institut National de Recherche en Informatique et en Automatique, Centre de Recherche Lille - Nord Europe, France
162 Laboratoire d'Informatique Fondamentale de Lille, (Centre National de la Recherche Scientifique, Université Lille 1), Lille, France
163 rdf:type schema:Organization
 



