To Detect and Analyze Sequence Repeats Whatever Be Their Origin


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2012-01-31

AUTHORS

Jacques Nicolas

ABSTRACT

The development of numerous programs for the identification of mobile elements raises the issue of the founding concepts that are shared in their design. This is necessary for at least three reasons. First, the cost of designing, developing, debugging, and maintaining software could present a danger of distracting biologists from their main bioanalysis tasks that require a lot of energy. Some key concepts on exact repeats are always underlying the search for genomic repeats and we recall the most important ones. All along the chapter, we try to select practical tools that may help the design of new identification pipelines. Second, the huge increase of sequence production capacities requires to use the most efficient data structures and algorithms to scale up tools in front of the data deluge. This paper provides an up-to-date glimpse on the art of string indexing and string matching. Third, there exists a growing knowledge on the architecture of mobile elements built from literature and the analysis of results generated by these pipelines. Besides data management which has led to the discovery of new families or new elements of a family, the community has an increasing need in knowledge management tools in order to compare, validate, or simply keep trace of mobile element models. We end the paper with first considerations on what could help the near future of such research on models.

PAGES

69-90

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-1-61779-603-6_4

DOI

http://dx.doi.org/10.1007/978-1-61779-603-6_4

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1005818254

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/22367866



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/03", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Chemical Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0399", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Other Chemical Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0601", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biochemistry and Cell Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Animals", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Base Sequence", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Computer Graphics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Humans", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Models, Genetic", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Molecular Sequence Annotation", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Repetitive Sequences, Nucleic Acid", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, DNA", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Homology, Nucleic Acid", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "IRISA, INRIA centre de recherche Rennes-Bretagne Atlantique, Campus Universitaire de Beaulieu, Rennes Cedex, France", 
          "id": "http://www.grid.ac/institutes/grid.420225.3", 
          "name": [
            "IRISA, INRIA centre de recherche Rennes-Bretagne Atlantique, Campus Universitaire de Beaulieu, Rennes Cedex, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Nicolas", 
        "givenName": "Jacques", 
        "id": "sg:person.01143715001.20", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01143715001.20"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2012-01-31", 
    "datePublishedReg": "2012-01-31", 
    "description": "The development of numerous programs for the identification of mobile elements raises the issue of the founding concepts that are shared in their design. This is necessary for at least three reasons. First, the cost of designing, developing, debugging, and maintaining software could present a danger of distracting biologists from their main bioanalysis tasks that require a lot of energy. Some key concepts on exact repeats are always underlying the search for genomic repeats and we recall the most important ones. All along the chapter, we try to select practical tools that may help the design of new identification pipelines. Second, the huge increase of sequence production capacities requires to use the most efficient data structures and algorithms to scale up tools in front of the data deluge. This paper provides an up-to-date glimpse on the art of string indexing and string matching. Third, there exists a growing knowledge on the architecture of mobile elements built from literature and the analysis of results generated by these pipelines. Besides data management which has led to the discovery of new families or new elements of a family, the community has an increasing need in knowledge management tools in order to compare, validate, or simply keep trace of mobile element models. We end the paper with first considerations on what could help the near future of such research on models.", 
    "editor": [
      {
        "familyName": "Bigot", 
        "givenName": "Yves", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-1-61779-603-6_4", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-1-61779-602-9", 
        "978-1-61779-603-6"
      ], 
      "name": "Mobile Genetic Elements", 
      "type": "Book"
    }, 
    "keywords": [
      "efficient data structure", 
      "knowledge management tools", 
      "data management", 
      "data deluge", 
      "data structure", 
      "string matching", 
      "string indexing", 
      "management tool", 
      "exact repeats", 
      "identification pipeline", 
      "huge increase", 
      "indexing", 
      "pipeline", 
      "tool", 
      "key concepts", 
      "practical tool", 
      "software", 
      "algorithm", 
      "architecture", 
      "deluge", 
      "genomic repeats", 
      "matching", 
      "important one", 
      "task", 
      "design", 
      "concept", 
      "new elements", 
      "analysis of results", 
      "near future", 
      "numerous programs", 
      "search", 
      "art", 
      "model", 
      "cost", 
      "traces", 
      "issues", 
      "mobile elements", 
      "biologists", 
      "elements", 
      "knowledge", 
      "discovery", 
      "order", 
      "management", 
      "need", 
      "first consideration", 
      "such research", 
      "one", 
      "research", 
      "program", 
      "identification", 
      "community", 
      "future", 
      "results", 
      "consideration", 
      "development", 
      "chapter", 
      "new family", 
      "reasons", 
      "literature", 
      "front", 
      "structure", 
      "analysis", 
      "danger", 
      "glimpse", 
      "production capacity", 
      "capacity", 
      "energy", 
      "element model", 
      "family", 
      "increase", 
      "repeats", 
      "origin", 
      "sequence repeats", 
      "paper", 
      "main bioanalysis tasks", 
      "bioanalysis tasks", 
      "new identification pipelines", 
      "sequence production capacities", 
      "date glimpse", 
      "mobile element models"
    ], 
    "name": "To Detect and Analyze Sequence Repeats Whatever Be Their Origin", 
    "pagination": "69-90", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1005818254"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-1-61779-603-6_4"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "22367866"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-1-61779-603-6_4", 
      "https://app.dimensions.ai/details/publication/pub.1005818254"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2021-12-01T20:04", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20211201/entities/gbq_results/chapter/chapter_319.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-1-61779-603-6_4"
  }
]
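
As an illustration of how the JSON-LD record above can be consumed, the following is a minimal Python sketch that pulls a few bibliographic fields out of it. The embedded `RECORD_JSON` string is a trimmed copy of the record, and the `summarize` helper and the keys of the dictionary it returns are hypothetical names chosen for this example, not part of SciGraph.

```python
import json

# Trimmed copy of the record above; only a few fields are kept for illustration.
RECORD_JSON = """
[
  {
    "name": "To Detect and Analyze Sequence Repeats Whatever Be Their Origin",
    "author": [{"familyName": "Nicolas", "givenName": "Jacques", "type": "Person"}],
    "datePublished": "2012-01-31",
    "pagination": "69-90",
    "isPartOf": {"name": "Mobile Genetic Elements", "type": "Book"},
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1005818254"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-1-61779-603-6_4"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["22367866"]}
    ]
  }
]
"""


def summarize(record):
    """Return a flat summary of one record (hypothetical helper)."""
    # productId is a list of name/value pairs; index it by name.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    return {
        "title": record["name"],
        "authors": authors,
        "book": record["isPartOf"]["name"],
        "year": int(record["datePublished"][:4]),
        "pages": record["pagination"],
        "doi": ids.get("doi"),
        "pubmed_id": ids.get("pubmed_id"),
    }


# The top-level JSON value is a list holding a single record.
records = json.loads(RECORD_JSON)
summary = summarize(records[0])
print(summary)
```

The sketch treats the record as plain JSON and ignores the `@context`, which is sufficient for reading literal fields but does not resolve property names to full IRIs.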
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-1-61779-603-6_4'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-1-61779-603-6_4'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-1-61779-603-6_4'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-1-61779-603-6_4'
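
The four curl commands above differ only in their Accept header. A rough Python equivalent, using only the standard library, might look like the sketch below. The short format keys (`json-ld`, `n-triples`, `turtle`, `rdf-xml`) and the helper names `build_request` and `fetch` are assumptions made for this example; the URL and media types are copied from the commands above. Whether the endpoint still responds is not something this sketch can guarantee, so the network call is defined but not executed.

```python
import urllib.request

# URL taken from the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-1-61779-603-6_4"

# Hypothetical short names mapped to the media types used in the curl examples.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request asking for the given serialization via content negotiation."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, timeout=30):
    """Send the request and return the response body as text (performs network I/O)."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Only build and inspect the request here; call fetch("turtle") to actually download.
request = build_request("turtle")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```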


 

This table displays all metadata directly associated with this object as RDF triples.

192 TRIPLES      23 PREDICATES      117 URIs      108 LITERALS      18 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-1-61779-603-6_4 schema:about N089e54e5ba1641c1abed2d461d3acb98
2 N30f22f9d73a8412e970610715475a3a1
3 N3796fc72be924f61a868c778b5dbd567
4 N422b8236c21744019f270edaf9ce8d63
5 N460ae75fdf49462e9ba1837e9b326d50
6 N4ed9f1753bee48d0bd3655f38f6ff70a
7 N5db2f44097b44c308f42822a4877a537
8 N9d487717571b46b3aa17a347adba49b4
9 Nbedb0833d5284231b4d8b0a270ede6e4
10 Nc91de5d18a6446049f3a2063718a743a
11 anzsrc-for:03
12 anzsrc-for:0399
13 anzsrc-for:06
14 anzsrc-for:0601
15 schema:author N66e4207ed93d4eccb27cd8e2db2989cd
16 schema:datePublished 2012-01-31
17 schema:datePublishedReg 2012-01-31
18 schema:description The development of numerous programs for the identification of mobile elements raises the issue of the founding concepts that are shared in their design. This is necessary for at least three reasons. First, the cost of designing, developing, debugging, and maintaining software could present a danger of distracting biologists from their main bioanalysis tasks that require a lot of energy. Some key concepts on exact repeats are always underlying the search for genomic repeats and we recall the most important ones. All along the chapter, we try to select practical tools that may help the design of new identification pipelines. Second, the huge increase of sequence production capacities requires to use the most efficient data structures and algorithms to scale up tools in front of the data deluge. This paper provides an up-to-date glimpse on the art of string indexing and string matching. Third, there exists a growing knowledge on the architecture of mobile elements built from literature and the analysis of results generated by these pipelines. Besides data management which has led to the discovery of new families or new elements of a family, the community has an increasing need in knowledge management tools in order to compare, validate, or simply keep trace of mobile element models. We end the paper with first considerations on what could help the near future of such research on models.
19 schema:editor N36a4f1487056416ab254e62bbfe7c8af
20 schema:genre chapter
21 schema:inLanguage en
22 schema:isAccessibleForFree true
23 schema:isPartOf N080e5127da72421c83c942f8c90ffd23
24 schema:keywords algorithm
25 analysis
26 analysis of results
27 architecture
28 art
29 bioanalysis tasks
30 biologists
31 capacity
32 chapter
33 community
34 concept
35 consideration
36 cost
37 danger
38 data deluge
39 data management
40 data structure
41 date glimpse
42 deluge
43 design
44 development
45 discovery
46 efficient data structure
47 element model
48 elements
49 energy
50 exact repeats
51 family
52 first consideration
53 front
54 future
55 genomic repeats
56 glimpse
57 huge increase
58 identification
59 identification pipeline
60 important one
61 increase
62 indexing
63 issues
64 key concepts
65 knowledge
66 knowledge management tools
67 literature
68 main bioanalysis tasks
69 management
70 management tool
71 matching
72 mobile element models
73 mobile elements
74 model
75 near future
76 need
77 new elements
78 new family
79 new identification pipelines
80 numerous programs
81 one
82 order
83 origin
84 paper
85 pipeline
86 practical tool
87 production capacity
88 program
89 reasons
90 repeats
91 research
92 results
93 search
94 sequence production capacities
95 sequence repeats
96 software
97 string indexing
98 string matching
99 structure
100 such research
101 task
102 tool
103 traces
104 schema:name To Detect and Analyze Sequence Repeats Whatever Be Their Origin
105 schema:pagination 69-90
106 schema:productId N148ed33bfc7f46b8b9cda5c537a55360
107 N9cc8f8c2dcb4421480010994e533c8dc
108 Nbcd6dc6340c04544b171366f8fe2b962
109 schema:publisher Nbb5f50348f854951b3173fd1e64b5e05
110 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005818254
111 https://doi.org/10.1007/978-1-61779-603-6_4
112 schema:sdDatePublished 2021-12-01T20:04
113 schema:sdLicense https://scigraph.springernature.com/explorer/license/
114 schema:sdPublisher N17c1008fbb02404bb6f88574dd082b5c
115 schema:url https://doi.org/10.1007/978-1-61779-603-6_4
116 sgo:license sg:explorer/license/
117 sgo:sdDataset chapters
118 rdf:type schema:Chapter
119 N080e5127da72421c83c942f8c90ffd23 schema:isbn 978-1-61779-602-9
120 978-1-61779-603-6
121 schema:name Mobile Genetic Elements
122 rdf:type schema:Book
123 N089e54e5ba1641c1abed2d461d3acb98 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
124 schema:name Sequence Homology, Nucleic Acid
125 rdf:type schema:DefinedTerm
126 N148ed33bfc7f46b8b9cda5c537a55360 schema:name pubmed_id
127 schema:value 22367866
128 rdf:type schema:PropertyValue
129 N17c1008fbb02404bb6f88574dd082b5c schema:name Springer Nature - SN SciGraph project
130 rdf:type schema:Organization
131 N30f22f9d73a8412e970610715475a3a1 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
132 schema:name Base Sequence
133 rdf:type schema:DefinedTerm
134 N36a4f1487056416ab254e62bbfe7c8af rdf:first Nec4d4d37ff4b4429b42b605ad327a265
135 rdf:rest rdf:nil
136 N3796fc72be924f61a868c778b5dbd567 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
137 schema:name Humans
138 rdf:type schema:DefinedTerm
139 N422b8236c21744019f270edaf9ce8d63 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
140 schema:name Computer Graphics
141 rdf:type schema:DefinedTerm
142 N460ae75fdf49462e9ba1837e9b326d50 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
143 schema:name Software
144 rdf:type schema:DefinedTerm
145 N4ed9f1753bee48d0bd3655f38f6ff70a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
146 schema:name Molecular Sequence Annotation
147 rdf:type schema:DefinedTerm
148 N5db2f44097b44c308f42822a4877a537 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
149 schema:name Repetitive Sequences, Nucleic Acid
150 rdf:type schema:DefinedTerm
151 N66e4207ed93d4eccb27cd8e2db2989cd rdf:first sg:person.01143715001.20
152 rdf:rest rdf:nil
153 N9cc8f8c2dcb4421480010994e533c8dc schema:name dimensions_id
154 schema:value pub.1005818254
155 rdf:type schema:PropertyValue
156 N9d487717571b46b3aa17a347adba49b4 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
157 schema:name Animals
158 rdf:type schema:DefinedTerm
159 Nbb5f50348f854951b3173fd1e64b5e05 schema:name Springer Nature
160 rdf:type schema:Organisation
161 Nbcd6dc6340c04544b171366f8fe2b962 schema:name doi
162 schema:value 10.1007/978-1-61779-603-6_4
163 rdf:type schema:PropertyValue
164 Nbedb0833d5284231b4d8b0a270ede6e4 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
165 schema:name Models, Genetic
166 rdf:type schema:DefinedTerm
167 Nc91de5d18a6446049f3a2063718a743a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
168 schema:name Sequence Analysis, DNA
169 rdf:type schema:DefinedTerm
170 Nec4d4d37ff4b4429b42b605ad327a265 schema:familyName Bigot
171 schema:givenName Yves
172 rdf:type schema:Person
173 anzsrc-for:03 schema:inDefinedTermSet anzsrc-for:
174 schema:name Chemical Sciences
175 rdf:type schema:DefinedTerm
176 anzsrc-for:0399 schema:inDefinedTermSet anzsrc-for:
177 schema:name Other Chemical Sciences
178 rdf:type schema:DefinedTerm
179 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
180 schema:name Biological Sciences
181 rdf:type schema:DefinedTerm
182 anzsrc-for:0601 schema:inDefinedTermSet anzsrc-for:
183 schema:name Biochemistry and Cell Biology
184 rdf:type schema:DefinedTerm
185 sg:person.01143715001.20 schema:affiliation grid-institutes:grid.420225.3
186 schema:familyName Nicolas
187 schema:givenName Jacques
188 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01143715001.20
189 rdf:type schema:Person
190 grid-institutes:grid.420225.3 schema:alternateName IRISA, INRIA centre de recherche Rennes-Bretagne Atlantique, Campus Universitaire de Beaulieu, Rennes Cedex, France
191 schema:name IRISA, INRIA centre de recherche Rennes-Bretagne Atlantique, Campus Universitaire de Beaulieu, Rennes Cedex, France
192 rdf:type schema:Organization
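
To see how a nested JSON object relates to the subject, predicate, object rows above, here is a simplified Python sketch that flattens a small fragment of the record into triples, minting a blank-node label for each nested object that has no `id`. This is an illustration under stated assumptions, not the JSON-LD expansion algorithm: it keeps the JSON keys as predicate names instead of resolving them through the `@context`, and it does not build the `rdf:first`/`rdf:rest` lists that appear in the table for authors and editors. The `to_triples` function and the `_:b0` label scheme are hypothetical.

```python
from itertools import count


def to_triples(node, subject, fresh):
    """Yield (subject, predicate, object) tuples for a nested dict."""
    for key, value in node.items():
        if key == "id":
            continue  # the id names the subject; it is not a triple of its own
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Nested objects without an id get a fresh blank-node label.
                child = item.get("id") or f"_:b{next(fresh)}"
                yield (subject, key, child)
                yield from to_triples(item, child, fresh)
            else:
                yield (subject, key, item)


# Fragment of the record shown earlier on this page.
record = {
    "id": "sg:pub.10.1007/978-1-61779-603-6_4",
    "pagination": "69-90",
    "isPartOf": {
        "isbn": ["978-1-61779-602-9", "978-1-61779-603-6"],
        "name": "Mobile Genetic Elements",
        "type": "Book",
    },
}

triples = list(to_triples(record, record["id"], count()))
for triple in triples:
    print(triple)
```

The book object has no `id` in the JSON, which is why it shows up in the table under a generated identifier with its own `schema:isbn`, `schema:name`, and `rdf:type` rows.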
 



