Back-Translation for Discovering Distant Protein Homologies


Ontology type: schema:Chapter
Open Access: True


Chapter Info

DATE

2009

AUTHORS

Marta Gîrdea , Laurent Noé , Gregory Kucherov

ABSTRACT

Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins’ common origin. Moreover, when a large number of substitutions are additionally involved in the divergence, the homology detection becomes difficult even at the DNA level. To cope with this situation, we propose a novel method to infer distant homology relations of two proteins, that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences which have the best scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. This allows us to uncover evolutionary information that is not captured by traditional alignment methods, which is confirmed by biologically significant examples.
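To make the phrase "complete set of putative DNA sequences" concrete, here is a minimal sketch that back-translates a protein under the standard genetic code and counts how many coding sequences could encode it. It is an illustration only and does not reproduce the chapter's graph representation, scoring system, or alignment algorithm; the names `codons_for` and `count_putative_dna` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1). Codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third base varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its amino acid ('*' marks a stop codon).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return the sorted list of codons that encode one amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_putative_dna(protein):
    """Count the DNA sequences that translate exactly to `protein`."""
    total = 1
    for aa in protein:
        total *= len(codons_for(aa))
    return total
```

For example, `count_putative_dna("MKL")` gives 12, since M has one codon, K has two, and L has six. The count grows multiplicatively with protein length, which suggests why the abstract stresses a memory-efficient representation rather than explicit enumeration.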

PAGES

108-120

Book

TITLE

Algorithms in Bioinformatics

ISBN

978-3-642-04240-9
978-3-642-04241-6

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-04241-6_10

DOI

http://dx.doi.org/10.1007/978-3-642-04241-6_10

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1030953214



JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service such as the JSON-LD Playground or Google SDTT.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "INRIA Lille - Nord Europe, LIFL/CNRS, Universit\u00e9 Lille 1, 59655, Villeneuve d\u2019Ascq, France", 
          "id": "http://www.grid.ac/institutes/grid.457352.2", 
          "name": [
            "INRIA Lille - Nord Europe, LIFL/CNRS, Universit\u00e9 Lille 1, 59655, Villeneuve d\u2019Ascq, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "G\u00eerdea", 
        "givenName": "Marta", 
        "id": "sg:person.01203454254.70", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01203454254.70"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "INRIA Lille - Nord Europe, LIFL/CNRS, Universit\u00e9 Lille 1, 59655, Villeneuve d\u2019Ascq, France", 
          "id": "http://www.grid.ac/institutes/grid.457352.2", 
          "name": [
            "INRIA Lille - Nord Europe, LIFL/CNRS, Universit\u00e9 Lille 1, 59655, Villeneuve d\u2019Ascq, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "No\u00e9", 
        "givenName": "Laurent", 
        "id": "sg:person.01233236632.20", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01233236632.20"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "INRIA Lille - Nord Europe, LIFL/CNRS, Universit\u00e9 Lille 1, 59655, Villeneuve d\u2019Ascq, France", 
          "id": "http://www.grid.ac/institutes/grid.457352.2", 
          "name": [
            "INRIA Lille - Nord Europe, LIFL/CNRS, Universit\u00e9 Lille 1, 59655, Villeneuve d\u2019Ascq, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Kucherov", 
        "givenName": "Gregory", 
        "id": "sg:person.013163701366.40", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013163701366.40"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2009", 
    "datePublishedReg": "2009-01-01", 
    "description": "Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins\u2019 common origin. Moreover, when a large number of substitutions are additionally involved in the divergence, the homology detection becomes difficult even at the DNA level. To cope with this situation, we propose a novel method to infer distant homology relations of two proteins, that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences which have the best scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. This allows us to uncover evolutionary information that is not captured by traditional alignment methods, which is confirmed by biologically significant examples.", 
    "editor": [
      {
        "familyName": "Salzberg", 
        "givenName": "Steven L.", 
        "type": "Person"
      }, 
      {
        "familyName": "Warnow", 
        "givenName": "Tandy", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-04241-6_10", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-642-04240-9", 
        "978-3-642-04241-6"
      ], 
      "name": "Algorithms in Bioinformatics", 
      "type": "Book"
    }, 
    "keywords": [
      "DNA sequences", 
      "protein-coding DNA sequences", 
      "dynamic programming alignment algorithms", 
      "alignment method", 
      "traditional alignment methods", 
      "protein homology", 
      "protein sequences", 
      "evolutionary information", 
      "evolutionary processes", 
      "homology relations", 
      "alignment algorithm", 
      "graph representation", 
      "point mutations", 
      "powerful scoring system", 
      "frameshift mutation", 
      "homology detection", 
      "protein", 
      "sequence", 
      "common origin", 
      "DNA levels", 
      "mutations", 
      "best scoring", 
      "novel method", 
      "homology", 
      "large number", 
      "frameshift", 
      "divergence", 
      "algorithm", 
      "significant examples", 
      "drastic changes", 
      "complete set", 
      "representation", 
      "information", 
      "set", 
      "method", 
      "substitution", 
      "detection", 
      "origin", 
      "system", 
      "goal", 
      "example", 
      "situation", 
      "levels", 
      "scoring", 
      "changes", 
      "number", 
      "process", 
      "back translation", 
      "relation", 
      "scoring system"
    ], 
    "name": "Back-Translation for Discovering Distant Protein Homologies", 
    "pagination": "108-120", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1030953214"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-04241-6_10"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-04241-6_10", 
      "https://app.dimensions.ai/details/publication/pub.1030953214"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-05-20T07:45", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220519/entities/gbq_results/chapter/chapter_296.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-642-04241-6_10"
  }
]
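Because JSON-LD is plain JSON, a record like the one above can be read with any JSON parser. The sketch below uses an abbreviated copy of the record (only the fields it needs) and two hypothetical helpers, `author_names` and `product_id`, to pull out the author list and the DOI. It assumes the field layout shown above and is not an official SciGraph client.

```python
import json

# Abbreviated copy of the record above; only the fields used here are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "G\u00eerdea", "givenName": "Marta", "type": "Person"},
      {"familyName": "No\u00e9", "givenName": "Laurent", "type": "Person"},
      {"familyName": "Kucherov", "givenName": "Gregory", "type": "Person"}
    ],
    "name": "Back-Translation for Discovering Distant Protein Homologies",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1030953214"]},
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/978-3-642-04241-6_10"]}
    ]
  }
]
"""


def author_names(record):
    """Return 'Given Family' strings in the order the record lists them."""
    return [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]


def product_id(record, name):
    """Return the first value of the productId entry called `name`, or None."""
    for entry in record.get("productId", []):
        if entry.get("name") == name:
            values = entry.get("value", [])
            return values[0] if values else None
    return None


# The top-level JSON value is a list holding a single record.
record = json.loads(RECORD_JSON)[0]
```

With this record, `author_names(record)` lists the three authors in publication order and `product_id(record, "doi")` returns the DOI string; a name with no matching entry yields `None`.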
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-04241-6_10'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-04241-6_10'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-04241-6_10'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-04241-6_10'
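The four curl commands differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The URL and media types are copied from the commands above; `build_request` and `fetch` are hypothetical helper names, and whether the endpoint currently answers is not verified here.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-04241-6_10"

# Short format name -> media type, as used in the curl commands above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request asking for the given RDF serialization.

    Raises KeyError if `fmt` is not one of the known format names.
    """
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch("turtle")` would perform the network request. Keeping `build_request` separate lets the header logic be checked without contacting the server.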


 

This table displays all metadata directly associated with this object as RDF triples.

129 TRIPLES      23 PREDICATES      76 URIs      69 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-04241-6_10 schema:about anzsrc-for:06
2 anzsrc-for:0604
3 schema:author Nc09a21259b7b48aa8e52f6f5e3bac3e3
4 schema:datePublished 2009
5 schema:datePublishedReg 2009-01-01
6 schema:description Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins’ common origin. Moreover, when a large number of substitutions are additionally involved in the divergence, the homology detection becomes difficult even at the DNA level. To cope with this situation, we propose a novel method to infer distant homology relations of two proteins, that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences which have the best scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. This allows us to uncover evolutionary information that is not captured by traditional alignment methods, which is confirmed by biologically significant examples.
7 schema:editor N9f8207c6dd4f48d795dadeb3addeaf05
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree true
11 schema:isPartOf N5505055443e543e5ad29dd709bdde4eb
12 schema:keywords DNA levels
13 DNA sequences
14 algorithm
15 alignment algorithm
16 alignment method
17 back translation
18 best scoring
19 changes
20 common origin
21 complete set
22 detection
23 divergence
24 drastic changes
25 dynamic programming alignment algorithms
26 evolutionary information
27 evolutionary processes
28 example
29 frameshift
30 frameshift mutation
31 goal
32 graph representation
33 homology
34 homology detection
35 homology relations
36 information
37 large number
38 levels
39 method
40 mutations
41 novel method
42 number
43 origin
44 point mutations
45 powerful scoring system
46 process
47 protein
48 protein homology
49 protein sequences
50 protein-coding DNA sequences
51 relation
52 representation
53 scoring
54 scoring system
55 sequence
56 set
57 significant examples
58 situation
59 substitution
60 system
61 traditional alignment methods
62 schema:name Back-Translation for Discovering Distant Protein Homologies
63 schema:pagination 108-120
64 schema:productId N304d288a5c04437c9a204efd2e4d44e9
65 N44431493c58149809b827149337c0bb3
66 schema:publisher N800980ee255e4cf287f734e8f3c397bb
67 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030953214
68 https://doi.org/10.1007/978-3-642-04241-6_10
69 schema:sdDatePublished 2022-05-20T07:45
70 schema:sdLicense https://scigraph.springernature.com/explorer/license/
71 schema:sdPublisher Nc4e2ee1b77584060a45bdf265466e09c
72 schema:url https://doi.org/10.1007/978-3-642-04241-6_10
73 sgo:license sg:explorer/license/
74 sgo:sdDataset chapters
75 rdf:type schema:Chapter
76 N304d288a5c04437c9a204efd2e4d44e9 schema:name doi
77 schema:value 10.1007/978-3-642-04241-6_10
78 rdf:type schema:PropertyValue
79 N44431493c58149809b827149337c0bb3 schema:name dimensions_id
80 schema:value pub.1030953214
81 rdf:type schema:PropertyValue
82 N5505055443e543e5ad29dd709bdde4eb schema:isbn 978-3-642-04240-9
83 978-3-642-04241-6
84 schema:name Algorithms in Bioinformatics
85 rdf:type schema:Book
86 N5cc1217b82204796a0d55a064345783e rdf:first sg:person.01233236632.20
87 rdf:rest Ne21f891e12c24786889a820f72eb668e
88 N800980ee255e4cf287f734e8f3c397bb schema:name Springer Nature
89 rdf:type schema:Organisation
90 N9f8207c6dd4f48d795dadeb3addeaf05 rdf:first Nce0cc72807e74900b58c1d9a1d4c335d
91 rdf:rest Nca1485af346e49cca9399c438eeafe77
92 Na3e7e8dd6e684ee581f9f7511850c2fd schema:familyName Warnow
93 schema:givenName Tandy
94 rdf:type schema:Person
95 Nc09a21259b7b48aa8e52f6f5e3bac3e3 rdf:first sg:person.01203454254.70
96 rdf:rest N5cc1217b82204796a0d55a064345783e
97 Nc4e2ee1b77584060a45bdf265466e09c schema:name Springer Nature - SN SciGraph project
98 rdf:type schema:Organization
99 Nca1485af346e49cca9399c438eeafe77 rdf:first Na3e7e8dd6e684ee581f9f7511850c2fd
100 rdf:rest rdf:nil
101 Nce0cc72807e74900b58c1d9a1d4c335d schema:familyName Salzberg
102 schema:givenName Steven L.
103 rdf:type schema:Person
104 Ne21f891e12c24786889a820f72eb668e rdf:first sg:person.013163701366.40
105 rdf:rest rdf:nil
106 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
107 schema:name Biological Sciences
108 rdf:type schema:DefinedTerm
109 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
110 schema:name Genetics
111 rdf:type schema:DefinedTerm
112 sg:person.01203454254.70 schema:affiliation grid-institutes:grid.457352.2
113 schema:familyName Gîrdea
114 schema:givenName Marta
115 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01203454254.70
116 rdf:type schema:Person
117 sg:person.01233236632.20 schema:affiliation grid-institutes:grid.457352.2
118 schema:familyName Noé
119 schema:givenName Laurent
120 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01233236632.20
121 rdf:type schema:Person
122 sg:person.013163701366.40 schema:affiliation grid-institutes:grid.457352.2
123 schema:familyName Kucherov
124 schema:givenName Gregory
125 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013163701366.40
126 rdf:type schema:Person
127 grid-institutes:grid.457352.2 schema:alternateName INRIA Lille - Nord Europe, LIFL/CNRS, Université Lille 1, 59655, Villeneuve d’Ascq, France
128 schema:name INRIA Lille - Nord Europe, LIFL/CNRS, Université Lille 1, 59655, Villeneuve d’Ascq, France
129 rdf:type schema:Organization
 



