Towards Automatic Detecting of Overlapping Genes - Clustered BLAST Analysis of Viral Genomes


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2010

AUTHORS

Klaus Neuhaus, Daniela Oelke, David Fürst, Siegfried Scherer, Daniel A. Keim

ABSTRACT

Overlapping genes (encoded on the same DNA locus but in different frames) are thought to be rare and, therefore, were largely neglected in the past. In a test set of 800 viruses, we found more than 350 potential overlapping open reading frames of >500 bp that generate BLAST hits, indicating a possible biological function. Interestingly, five overlaps of more than 2000 bp were found; the largest may even contain triple overlaps. In order to perform the vast number of BLAST searches required to test all detected open reading frames, we compared two clustering strategies (BLASTCLUST and k-means) and queried the database with only one representative per cluster. Our results show that this approach achieves a significant speed-up while retaining high-quality results (>99% precision compared to single queries) for both clustering methods. Future wet-lab experiments are needed to show whether the detected overlapping reading frames are biologically functional.
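The abstract describes the screening step only in outline. As a rough illustration, the Python sketch below scans all six reading frames of a nucleotide sequence for open reading frames and reports pairs that share nucleotides while lying in different frames. It is not the authors' pipeline: the ORF definition (ATG start, standard stop codons TAA/TAG/TGA), the frame labels, and the function names `find_orfs` and `overlapping_pairs` are assumptions made for this example, and the BLAST and clustering stages are not shown.

```python
from itertools import combinations

# Assumptions (not from the chapter): standard genetic code, ATG-only starts,
# and an ORF spans ATG through the first in-frame stop codon (stop included).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len):
    """Return (frame, start, end) for ORFs of at least min_len nt.

    Coordinates are 0-based, end-exclusive, on the forward strand; frames are
    labelled "+0".."+2" and "-0".."-2" (codon offset within each strand).
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            i = offset
            while i + 3 <= n:
                if s[i:i + 3] != "ATG":
                    i += 3
                    continue
                j = i
                while j + 3 <= n and s[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > n:
                    break  # no in-frame stop before the end of the sequence
                end = j + 3
                if end - i >= min_len:
                    if strand == "+":
                        orfs.append((f"+{offset}", i, end))
                    else:  # map reverse-strand coordinates back to the forward strand
                        orfs.append((f"-{offset}", n - end, n - i))
                i = end  # resume scanning after this ORF's stop codon
    return orfs


def overlapping_pairs(orfs, min_overlap):
    """Pairs of ORFs in different frames that share at least min_overlap nt."""
    pairs = []
    for a, b in combinations(orfs, 2):
        if a[0] == b[0]:
            continue  # same frame: not an overlap in the sense used here
        overlap = min(a[2], b[2]) - max(a[1], b[1])
        if overlap >= min_overlap:
            pairs.append((a, b, overlap))
    return pairs


if __name__ == "__main__":
    toy = "ATGCATGAAAAATAACTGA"  # toy sequence with two ORFs in frames +0 and +1
    orfs = find_orfs(toy, min_len=15)
    print(orfs)  # [('+0', 0, 15), ('+1', 4, 19)]
    print(overlapping_pairs(orfs, min_overlap=9))  # [(('+0', 0, 15), ('+1', 4, 19), 11)]
```

The abstract's >500 bp threshold could be passed as `min_overlap=500` or as `min_len=500`; the abstract's wording leaves open whether it refers to the overlap or to the ORFs. The all-pairs comparison is quadratic in the number of ORFs, which is tolerable for a single viral genome; sorting ORFs by start coordinate would scale better.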

PAGES

228-239

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-12211-8_20

DOI

http://dx.doi.org/10.1007/978-3-642-12211-8_20

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1052790687



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Chair of Microbial Ecology, Technische Universit\u00e4t M\u00fcnchen, Weihenstephaner Berg 3, 85354, Freising, Germany", 
          "id": "http://www.grid.ac/institutes/grid.6936.a", 
          "name": [
            "Chair of Microbial Ecology, Technische Universit\u00e4t M\u00fcnchen, Weihenstephaner Berg 3, 85354, Freising, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Neuhaus", 
        "givenName": "Klaus", 
        "id": "sg:person.0767764126.02", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0767764126.02"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Chair of Data Analysis and Visualization, Universit\u00e4t Konstanz, Universit\u00e4tsstr. 10, 78457, Konstanz, Germany", 
          "id": "http://www.grid.ac/institutes/grid.9811.1", 
          "name": [
            "Chair of Data Analysis and Visualization, Universit\u00e4t Konstanz, Universit\u00e4tsstr. 10, 78457, Konstanz, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Oelke", 
        "givenName": "Daniela", 
        "id": "sg:person.07667765141.23", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07667765141.23"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Chair of Data Management and Data Exploration, Rheinisch-Westf\u00e4lische, Technische Hochschule Aachen, Informatik 9, 52056, Aachen, Germany", 
          "id": "http://www.grid.ac/institutes/grid.1957.a", 
          "name": [
            "Chair of Data Management and Data Exploration, Rheinisch-Westf\u00e4lische, Technische Hochschule Aachen, Informatik 9, 52056, Aachen, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "F\u00fcrst", 
        "givenName": "David", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Chair of Microbial Ecology, Technische Universit\u00e4t M\u00fcnchen, Weihenstephaner Berg 3, 85354, Freising, Germany", 
          "id": "http://www.grid.ac/institutes/grid.6936.a", 
          "name": [
            "Chair of Microbial Ecology, Technische Universit\u00e4t M\u00fcnchen, Weihenstephaner Berg 3, 85354, Freising, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Scherer", 
        "givenName": "Siegfried", 
        "id": "sg:person.01167132061.21", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01167132061.21"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Chair of Data Analysis and Visualization, Universit\u00e4t Konstanz, Universit\u00e4tsstr. 10, 78457, Konstanz, Germany", 
          "id": "http://www.grid.ac/institutes/grid.9811.1", 
          "name": [
            "Chair of Data Analysis and Visualization, Universit\u00e4t Konstanz, Universit\u00e4tsstr. 10, 78457, Konstanz, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Keim", 
        "givenName": "Daniel A.", 
        "id": "sg:person.0635776571.01", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0635776571.01"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2010", 
    "datePublishedReg": "2010-01-01", 
    "description": "Overlapping genes (encoded on the same DNA locus but in different frames) are thought to be rare and, therefore, were largely neglected in the past. In a test set of 800 viruses we found more than 350 potential overlapping open reading frames of >500 bp which generate BLAST hits, indicating a possible biological function. Interestingly, five overlaps with more than 2000 bp were found, the largest may even contain triple overlaps. In order to perform the vast amount of BLAST searches required to test all detected open reading frames, we compared two clustering strategies (BLASTCLUST and k-means) and queried the database with one representative only. Our results show that this approach achieves a significant speed-up while retaining a high quality of the results (>99% precision compared to single queries) for both clustering methods. Future wet lab experiments are needed to show whether the detected overlapping reading frames are biologically functional.", 
    "editor": [
      {
        "familyName": "Pizzuti", 
        "givenName": "Clara", 
        "type": "Person"
      }, 
      {
        "familyName": "Ritchie", 
        "givenName": "Marylyn D.", 
        "type": "Person"
      }, 
      {
        "familyName": "Giacobini", 
        "givenName": "Mario", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-12211-8_20", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-642-12210-1", 
        "978-3-642-12211-8"
      ], 
      "name": "Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics", 
      "type": "Book"
    }, 
    "keywords": [
      "open reading frame", 
      "reading frame", 
      "possible biological functions", 
      "overlapping reading frame", 
      "future wet-lab experiments", 
      "BLAST hits", 
      "wet-lab experiments", 
      "BLAST search", 
      "biological functions", 
      "BLAST analysis", 
      "viral genome", 
      "BP", 
      "genome", 
      "genes", 
      "overlap", 
      "frame", 
      "hits", 
      "vast amount", 
      "virus", 
      "function", 
      "triple overlap", 
      "representatives", 
      "lab experiments", 
      "results", 
      "analysis", 
      "strategies", 
      "experiments", 
      "amount", 
      "database", 
      "test set", 
      "set", 
      "search", 
      "approach", 
      "high quality", 
      "past", 
      "order", 
      "quality", 
      "method", 
      "automatic detecting", 
      "detecting"
    ], 
    "name": "Towards Automatic Detecting of Overlapping Genes - Clustered BLAST Analysis of Viral Genomes", 
    "pagination": "228-239", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1052790687"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-12211-8_20"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-12211-8_20", 
      "https://app.dimensions.ai/details/publication/pub.1052790687"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-09-02T16:16", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220902/entities/gbq_results/chapter/chapter_406.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-642-12211-8_20"
  }
]
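Because the JSON-LD export above is plain JSON, its fields can be read with a standard JSON parser and no linked-data tooling. A minimal Python sketch, assuming the one-element top-level array and the field names shown in the record (`name`, `author`, `productId`, `pagination`); `summarize_record` is a hypothetical helper, and since it ignores the `@context` it does not perform real JSON-LD expansion.

```python
import json


def summarize_record(jsonld_text):
    """Extract title, ordered author names, DOI and pagination from a SciGraph JSON-LD export."""
    record = json.loads(jsonld_text)[0]  # the export is a one-element JSON array
    authors = [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]
    # productId holds several identifiers; pick the one named "doi", if present.
    doi = next(
        (p["value"][0] for p in record.get("productId", []) if p.get("name") == "doi"),
        None,
    )
    return {
        "title": record.get("name"),
        "authors": authors,
        "doi": doi,
        "pages": record.get("pagination"),
    }


# Trimmed excerpt of the record above (most fields omitted); "\\u00fc" stays a
# JSON escape inside the Python string and decodes to "ü".
SAMPLE = """[{
  "name": "Towards Automatic Detecting of Overlapping Genes - Clustered BLAST Analysis of Viral Genomes",
  "pagination": "228-239",
  "author": [
    {"familyName": "Neuhaus", "givenName": "Klaus", "type": "Person"},
    {"familyName": "F\\u00fcrst", "givenName": "David", "type": "Person"}
  ],
  "productId": [
    {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1052790687"]},
    {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-642-12211-8_20"]}
  ]
}]"""

if __name__ == "__main__":
    print(summarize_record(SAMPLE))
```

Reading `author` as a JSON array preserves the author order, which in the triple table below is instead encoded as an RDF list.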
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12211-8_20'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12211-8_20'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12211-8_20'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12211-8_20'
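The curl commands rely on HTTP content negotiation: the same URL returns a different serialization depending on the `Accept` header. The same requests can be made with Python's standard library, as sketched below. The media types are copied from the commands above; `build_request`, `fetch`, and `MEDIA_TYPES` are hypothetical names, and this page does not guarantee that the SciGraph endpoint still responds this way.

```python
from urllib.request import Request, urlopen

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-12211-8_20"

# Accept headers taken from the curl commands above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Prepare a GET request that asks the server for the given serialization."""
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30.0):
    """Perform the request and decode the body (network access required)."""
    with urlopen(build_request(fmt, url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


if __name__ == "__main__":
    print(fetch("turtle")[:500])  # first part of the Turtle serialization
```

Separating `build_request` from `fetch` keeps the header logic testable without network access.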


 

This table displays all metadata directly associated with this object as RDF triples.

142 TRIPLES      22 PREDICATES      65 URIs      58 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-12211-8_20 schema:about anzsrc-for:06
2 anzsrc-for:0604
3 schema:author N75ee882bb5b940ddb4dfe7d36b9132b9
4 schema:datePublished 2010
5 schema:datePublishedReg 2010-01-01
6 schema:description Overlapping genes (encoded on the same DNA locus but in different frames) are thought to be rare and, therefore, were largely neglected in the past. In a test set of 800 viruses we found more than 350 potential overlapping open reading frames of >500 bp which generate BLAST hits, indicating a possible biological function. Interestingly, five overlaps with more than 2000 bp were found, the largest may even contain triple overlaps. In order to perform the vast amount of BLAST searches required to test all detected open reading frames, we compared two clustering strategies (BLASTCLUST and k-means) and queried the database with one representative only. Our results show that this approach achieves a significant speed-up while retaining a high quality of the results (>99% precision compared to single queries) for both clustering methods. Future wet lab experiments are needed to show whether the detected overlapping reading frames are biologically functional.
7 schema:editor N365971d85c4f4a409f4cd09a25e39c35
8 schema:genre chapter
9 schema:isAccessibleForFree true
10 schema:isPartOf N9e9c42c884be4cd1b46277c96c853712
11 schema:keywords BLAST analysis
12 BLAST hits
13 BLAST search
14 BP
15 amount
16 analysis
17 approach
18 automatic detecting
19 biological functions
20 database
21 detecting
22 experiments
23 frame
24 function
25 future wet-lab experiments
26 genes
27 genome
28 high quality
29 hits
30 lab experiments
31 method
32 open reading frame
33 order
34 overlap
35 overlapping reading frame
36 past
37 possible biological functions
38 quality
39 reading frame
40 representatives
41 results
42 search
43 set
44 strategies
45 test set
46 triple overlap
47 vast amount
48 viral genome
49 virus
50 wet-lab experiments
51 schema:name Towards Automatic Detecting of Overlapping Genes - Clustered BLAST Analysis of Viral Genomes
52 schema:pagination 228-239
53 schema:productId N25e19ad57b52430ca5b7631a815b55c9
54 Nff139e73c7524c6c873b165bc2b9a19e
55 schema:publisher N460efac28eae4b428f1a7d611c0282c9
56 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052790687
57 https://doi.org/10.1007/978-3-642-12211-8_20
58 schema:sdDatePublished 2022-09-02T16:16
59 schema:sdLicense https://scigraph.springernature.com/explorer/license/
60 schema:sdPublisher N53cae55f6f194995a058151f5dfc0b78
61 schema:url https://doi.org/10.1007/978-3-642-12211-8_20
62 sgo:license sg:explorer/license/
63 sgo:sdDataset chapters
64 rdf:type schema:Chapter
65 N07cb2d40322a490f9b6121d73c62ae09 schema:familyName Giacobini
66 schema:givenName Mario
67 rdf:type schema:Person
68 N25e19ad57b52430ca5b7631a815b55c9 schema:name dimensions_id
69 schema:value pub.1052790687
70 rdf:type schema:PropertyValue
71 N27a5c37d6b434152af88e273cf4350e4 rdf:first Na05b04302ac547daba3760f89a376822
72 rdf:rest Nc0007794b0664361a5add248e72ec2cd
73 N365971d85c4f4a409f4cd09a25e39c35 rdf:first Nd20a046d07614f40b070f514ac29fbc4
74 rdf:rest Neede9174b49b4e83ba9e5005d40fa074
75 N460efac28eae4b428f1a7d611c0282c9 schema:name Springer Nature
76 rdf:type schema:Organisation
77 N484e7b183cce404db352ca17f9482dc5 rdf:first N07cb2d40322a490f9b6121d73c62ae09
78 rdf:rest rdf:nil
79 N4ea8af127d0e4bbfa22c30139246609b schema:familyName Ritchie
80 schema:givenName Marylyn D.
81 rdf:type schema:Person
82 N539d4143c20942b8bf50591817f8fa15 rdf:first sg:person.0635776571.01
83 rdf:rest rdf:nil
84 N53cae55f6f194995a058151f5dfc0b78 schema:name Springer Nature - SN SciGraph project
85 rdf:type schema:Organization
86 N61d011c3ec4a449e85712ba400095dc2 rdf:first sg:person.07667765141.23
87 rdf:rest N27a5c37d6b434152af88e273cf4350e4
88 N75ee882bb5b940ddb4dfe7d36b9132b9 rdf:first sg:person.0767764126.02
89 rdf:rest N61d011c3ec4a449e85712ba400095dc2
90 N9e9c42c884be4cd1b46277c96c853712 schema:isbn 978-3-642-12210-1
91 978-3-642-12211-8
92 schema:name Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
93 rdf:type schema:Book
94 Na05b04302ac547daba3760f89a376822 schema:affiliation grid-institutes:grid.1957.a
95 schema:familyName Fürst
96 schema:givenName David
97 rdf:type schema:Person
98 Nc0007794b0664361a5add248e72ec2cd rdf:first sg:person.01167132061.21
99 rdf:rest N539d4143c20942b8bf50591817f8fa15
100 Nd20a046d07614f40b070f514ac29fbc4 schema:familyName Pizzuti
101 schema:givenName Clara
102 rdf:type schema:Person
103 Neede9174b49b4e83ba9e5005d40fa074 rdf:first N4ea8af127d0e4bbfa22c30139246609b
104 rdf:rest N484e7b183cce404db352ca17f9482dc5
105 Nff139e73c7524c6c873b165bc2b9a19e schema:name doi
106 schema:value 10.1007/978-3-642-12211-8_20
107 rdf:type schema:PropertyValue
108 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
109 schema:name Biological Sciences
110 rdf:type schema:DefinedTerm
111 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
112 schema:name Genetics
113 rdf:type schema:DefinedTerm
114 sg:person.01167132061.21 schema:affiliation grid-institutes:grid.6936.a
115 schema:familyName Scherer
116 schema:givenName Siegfried
117 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01167132061.21
118 rdf:type schema:Person
119 sg:person.0635776571.01 schema:affiliation grid-institutes:grid.9811.1
120 schema:familyName Keim
121 schema:givenName Daniel A.
122 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0635776571.01
123 rdf:type schema:Person
124 sg:person.07667765141.23 schema:affiliation grid-institutes:grid.9811.1
125 schema:familyName Oelke
126 schema:givenName Daniela
127 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07667765141.23
128 rdf:type schema:Person
129 sg:person.0767764126.02 schema:affiliation grid-institutes:grid.6936.a
130 schema:familyName Neuhaus
131 schema:givenName Klaus
132 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0767764126.02
133 rdf:type schema:Person
134 grid-institutes:grid.1957.a schema:alternateName Chair of Data Management and Data Exploration, Rheinisch-Westfälische, Technische Hochschule Aachen, Informatik 9, 52056, Aachen, Germany
135 schema:name Chair of Data Management and Data Exploration, Rheinisch-Westfälische, Technische Hochschule Aachen, Informatik 9, 52056, Aachen, Germany
136 rdf:type schema:Organization
137 grid-institutes:grid.6936.a schema:alternateName Chair of Microbial Ecology, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
138 schema:name Chair of Microbial Ecology, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
139 rdf:type schema:Organization
140 grid-institutes:grid.9811.1 schema:alternateName Chair of Data Analysis and Visualization, Universität Konstanz, Universitätsstr. 10, 78457, Konstanz, Germany
141 schema:name Chair of Data Analysis and Visualization, Universität Konstanz, Universitätsstr. 10, 78457, Konstanz, Germany
142 rdf:type schema:Organization
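In the triple table, author order is not stored on each person; `schema:author` points to a blank node that starts an RDF collection, a chain of `rdf:first` (current member) and `rdf:rest` (next node) links that ends at `rdf:nil`. The Python sketch below walks that chain, using triples copied from the table, to recover the ordered author list. The helper names are hypothetical, and indexing by (subject, predicate) assumes a single object per pair, which holds for these triples but not, for example, for `schema:keywords`.

```python
PUB = "sg:pub.10.1007/978-3-642-12211-8_20"

# Triples copied from the table above: the author-list head, its
# rdf:first / rdf:rest chain, and the familyName of each list member.
TRIPLES = [
    (PUB, "schema:author", "N75ee882bb5b940ddb4dfe7d36b9132b9"),
    ("N75ee882bb5b940ddb4dfe7d36b9132b9", "rdf:first", "sg:person.0767764126.02"),
    ("N75ee882bb5b940ddb4dfe7d36b9132b9", "rdf:rest", "N61d011c3ec4a449e85712ba400095dc2"),
    ("N61d011c3ec4a449e85712ba400095dc2", "rdf:first", "sg:person.07667765141.23"),
    ("N61d011c3ec4a449e85712ba400095dc2", "rdf:rest", "N27a5c37d6b434152af88e273cf4350e4"),
    ("N27a5c37d6b434152af88e273cf4350e4", "rdf:first", "Na05b04302ac547daba3760f89a376822"),
    ("N27a5c37d6b434152af88e273cf4350e4", "rdf:rest", "Nc0007794b0664361a5add248e72ec2cd"),
    ("Nc0007794b0664361a5add248e72ec2cd", "rdf:first", "sg:person.01167132061.21"),
    ("Nc0007794b0664361a5add248e72ec2cd", "rdf:rest", "N539d4143c20942b8bf50591817f8fa15"),
    ("N539d4143c20942b8bf50591817f8fa15", "rdf:first", "sg:person.0635776571.01"),
    ("N539d4143c20942b8bf50591817f8fa15", "rdf:rest", "rdf:nil"),
    ("sg:person.0767764126.02", "schema:familyName", "Neuhaus"),
    ("sg:person.07667765141.23", "schema:familyName", "Oelke"),
    ("Na05b04302ac547daba3760f89a376822", "schema:familyName", "Fürst"),
    ("sg:person.01167132061.21", "schema:familyName", "Scherer"),
    ("sg:person.0635776571.01", "schema:familyName", "Keim"),
]


def rdf_list_items(triples, head):
    """Follow an RDF collection from its head node and return its members in order."""
    index = {(s, p): o for s, p, o in triples}  # assumes one object per (s, p)
    items, node, seen = [], head, set()
    while node != "rdf:nil":
        if node in seen:
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        items.append(index[(node, "rdf:first")])
        node = index[(node, "rdf:rest")]
    return items


def ordered_author_names(triples, pub):
    """Family names of the publication's authors, in list order."""
    index = {(s, p): o for s, p, o in triples}
    head = index[(pub, "schema:author")]
    return [index[(m, "schema:familyName")] for m in rdf_list_items(triples, head)]


if __name__ == "__main__":
    print(ordered_author_names(TRIPLES, PUB))
```

The editor list (head node N365971d85c4f4a409f4cd09a25e39c35) follows the same pattern and could be walked the same way.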