Finding and Characterizing Repeats in Plant Genomes


Ontology type: schema:Chapter
Open Access: True


Chapter Info

DATE

2016

AUTHORS

Jacques Nicolas, Pierre Peterlongo, Sébastien Tempel

ABSTRACT

Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter proposes a guided tour of available software that can help biologists to look for these repeats and check some hypothetical models intended to characterize their structures. Since transposable elements are a major source of repeats in plants, many methods have been used or developed for this large class of sequences. They are representative of the range of tools available for other classes of repeats and we have provided a whole section on this topic as well as a selection of the main existing software. In order to better understand how they work and how repeats may be efficiently found in genomes, it is necessary to look at the technical issues involved in the large-scale search of these structures. Indeed, it may be hard to keep up with the profusion of proposals in this dynamic field and the rest of the chapter is devoted to the foundations of the search for repeats and more complex patterns. The second section introduces the key concepts that are useful for understanding the current state of the art in playing with words, applied to genomic sequences. This can be seen as the first stage of a very general approach called linguistic analysis that is interested in the analysis of natural or artificial texts. Words, the lexical level, correspond to simple repeated entities in texts or strings. In fact, biologists need to represent more complex entities where a repeat family is built on more abstract structures, including direct or inverted small repeats, motifs, composition constraints as well as ordering and distance constraints between these elementary blocks. In terms of linguistics, this corresponds to the syntactic level of a language. The last section introduces concepts and practical tools that can be used to reach this syntactic level in biological sequence analysis.

PAGES

293-337

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-1-4939-3167-5_17

DOI

http://dx.doi.org/10.1007/978-1-4939-3167-5_17

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1006392235

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/26519414



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/03", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Chemical Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0399", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Other Chemical Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0601", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biochemistry and Cell Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Computational Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "DNA Transposable Elements", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome, Plant", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genomics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Plants", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Repetitive Sequences, Nucleic Acid", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Dyliss Team, Irisa/Inria Centre de Rennes Bretagne Atlantique, Campus de Beaulieu, 35510, Rennes cedex, France", 
          "id": "http://www.grid.ac/institutes/grid.410368.8", 
          "name": [
            "Dyliss Team, Irisa/Inria Centre de Rennes Bretagne Atlantique, Campus de Beaulieu, 35510, Rennes cedex, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Nicolas", 
        "givenName": "Jacques", 
        "id": "sg:person.01143715001.20", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01143715001.20"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Irisa/Inria Centre de Rennes Bretagne Atlantique, Campus de Beaulieu, 35510, Rennes cedex, France", 
          "id": "http://www.grid.ac/institutes/grid.410368.8", 
          "name": [
            "Irisa/Inria Centre de Rennes Bretagne Atlantique, Campus de Beaulieu, 35510, Rennes cedex, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Peterlongo", 
        "givenName": "Pierre", 
        "id": "sg:person.01321640653.87", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01321640653.87"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "LCB, CNRS UMR 7283, 31 Chemin Joseph Aiguier, 13402, Marseille cedex 20, France", 
          "id": "http://www.grid.ac/institutes/grid.469471.9", 
          "name": [
            "LCB, CNRS UMR 7283, 31 Chemin Joseph Aiguier, 13402, Marseille cedex 20, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Tempel", 
        "givenName": "S\u00e9bastien", 
        "id": "sg:person.01112725215.45", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01112725215.45"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2016", 
    "datePublishedReg": "2016-01-01", 
    "description": "Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter proposes a guided tour of available software that can help biologists to look for these repeats and check some hypothetical models intended to characterize their structures. Since transposable elements are a major source of repeats in plants, many methods have been used or developed for this large class of sequences. They are representative of the range of tools available for other classes of repeats and we have provided a whole section on this topic as well as a selection of the main existing software. In order to better understand how they work and how repeats may be efficiently found in genomes, it is necessary to look at the technical issues involved in the large-scale search of these structures. Indeed, it may be hard to keep up with the profusion of proposals in this dynamic field and the rest of the chapter is devoted to the foundations of the search for repeats and more complex patterns. The second section introduces the key concepts that are useful for understanding the current state of the art in playing with words, applied to genomic sequences. This can be seen as the first stage of a very general approach called linguistic analysis that is interested in the analysis of natural or artificial texts. Words, the lexical level, correspond to simple repeated entities in texts or strings. In fact, biologists need to represent more complex entities where a repeat family is built on more abstract structures, including direct or inverted small repeats, motifs, composition constraints as well as ordering and distance constraints between these elementary blocks. In terms of linguistics, this corresponds to the syntactic level of a language. The last section introduces concepts and practical tools that can be used to reach this syntactic level in biological sequence analysis.", 
    "editor": [
      {
        "familyName": "Edwards", 
        "givenName": "David", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-1-4939-3167-5_17", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-1-4939-3166-8", 
        "978-1-4939-3167-5"
      ], 
      "name": "Plant Bioinformatics", 
      "type": "Book"
    }, 
    "keywords": [
      "more abstract structures", 
      "biological sequence analysis", 
      "syntactic level", 
      "large-scale search", 
      "artificial texts", 
      "available software", 
      "range of tools", 
      "abstract structure", 
      "software", 
      "composition constraints", 
      "technical issues", 
      "distance constraints", 
      "elementary blocks", 
      "complex entity", 
      "current state", 
      "constraints", 
      "general approach", 
      "large class", 
      "key concepts", 
      "search", 
      "practical tool", 
      "linguistic analysis", 
      "text", 
      "tool", 
      "first stage", 
      "entities", 
      "language", 
      "concept", 
      "lexical level", 
      "terms of linguistics", 
      "biologists", 
      "words", 
      "proposal", 
      "dynamic field", 
      "art", 
      "class", 
      "strings", 
      "tour", 
      "last section", 
      "issues", 
      "topic", 
      "selection", 
      "complex pattern", 
      "block", 
      "sequence", 
      "foundation", 
      "order", 
      "linguistics", 
      "method", 
      "model", 
      "chapter", 
      "terms", 
      "structure", 
      "analysis", 
      "genomic sequences", 
      "profusion", 
      "field", 
      "elements", 
      "ordering", 
      "fact", 
      "second section", 
      "state", 
      "source", 
      "types", 
      "patterns", 
      "stage", 
      "levels", 
      "classes of repeats", 
      "range", 
      "whole section", 
      "sections", 
      "rest", 
      "major source", 
      "hypothetical model", 
      "plant genomes", 
      "plants", 
      "family", 
      "small repeats", 
      "genome", 
      "sequence analysis", 
      "motif", 
      "transposable elements", 
      "repeats", 
      "proportion", 
      "repeat family", 
      "higher proportion", 
      "approach", 
      "profusion of proposals", 
      "Characterizing Repeats"
    ], 
    "name": "Finding and Characterizing Repeats in Plant Genomes", 
    "pagination": "293-337", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1006392235"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-1-4939-3167-5_17"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "26519414"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-1-4939-3167-5_17", 
      "https://app.dimensions.ai/details/publication/pub.1006392235"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2021-11-01T18:56", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20211101/entities/gbq_results/chapter/chapter_335.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-1-4939-3167-5_17"
  }
]
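
As a minimal sketch of how the JSON-LD record above can be read with an ordinary JSON parser, the Python below pulls out the title, the ordered author names, and the identifiers listed under "productId". The embedded string is a trimmed copy of the record (only the fields used are kept), and the helper name summarize is an assumption made for this illustration, not part of SciGraph.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "id": "sg:pub.10.1007/978-1-4939-3167-5_17",
    "name": "Finding and Characterizing Repeats in Plant Genomes",
    "pagination": "293-337",
    "author": [
      {"familyName": "Nicolas", "givenName": "Jacques"},
      {"familyName": "Peterlongo", "givenName": "Pierre"},
      {"familyName": "Tempel", "givenName": "Sébastien"}
    ],
    "productId": [
      {"name": "dimensions_id", "value": ["pub.1006392235"]},
      {"name": "doi", "value": ["10.1007/978-1-4939-3167-5_17"]},
      {"name": "pubmed_id", "value": ["26519414"]}
    ]
  }
]
"""


def summarize(record_json):
    """Return title, author names, and identifiers (hypothetical helper)."""
    record = json.loads(record_json)[0]  # the document is a one-element list
    # The JSON array keeps the authors in publication order.
    authors = [f"{a['givenName']} {a['familyName']}" for a in record["author"]]
    # productId is a list of name/value pairs; each value is itself a list.
    ids = {p["name"]: p["value"][0] for p in record["productId"]}
    return {"title": record["name"], "authors": authors, "ids": ids}


summary = summarize(RECORD_JSON)
print(summary["title"])
print(summary["authors"])
print(summary["ids"]["doi"])
```

Because JSON-LD is plain JSON, no RDF library is needed for this kind of lookup; a JSON-LD processor only becomes necessary when the "@context" has to be expanded into full URIs.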
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-1-4939-3167-5_17'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-1-4939-3167-5_17'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-1-4939-3167-5_17'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-1-4939-3167-5_17'
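
The four curl commands differ only in their Accept header, so the same content negotiation can be sketched in Python with the standard library. The URL and media types are taken from the commands above; the names BASE, ACCEPT, build_request, and fetch are assumptions for this illustration, and the sketch does not check whether the endpoint still responds.

```python
import urllib.request

BASE = "https://scigraph.springernature.com/"

# Accept headers taken from the curl commands above.
ACCEPT = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(record_id, fmt):
    """Build (but do not send) a GET request for one serialization."""
    return urllib.request.Request(BASE + record_id, headers={"Accept": ACCEPT[fmt]})


def fetch(record_id, fmt, timeout=30):
    """Send the request and return the body as text (requires network access)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request is offline; call fetch(...) to actually download.
req = build_request("pub.10.1007/978-1-4939-3167-5_17", "turtle")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Keeping request construction separate from the network call makes it easy to inspect the header being sent before any download is attempted.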


 

This table displays all metadata directly associated with this object as RDF triples.

208 triples, 23 predicates, 124 URIs, 115 literals, 15 blank nodes

Subject Predicate Object
1 sg:pub.10.1007/978-1-4939-3167-5_17 schema:about N33cdd494234d41af912ff4b046f12095
2 N478cb8c1e38443a5b776db2ebb32b63a
3 N56e40d9d7a304657a655b4bfeb898548
4 N5861ba680a154150b518f07010c531a5
5 N670249d11bde444fbfb122a3b5f61ede
6 Nad6a8f3a3dfb44c583ba9263405cd21d
7 Ndd56b0383e7340faabfb6babe0d94c23
8 anzsrc-for:03
9 anzsrc-for:0399
10 anzsrc-for:06
11 anzsrc-for:0601
12 schema:author Ne35379f6c4024d2a9b0ad92bb67b6c7c
13 schema:datePublished 2016
14 schema:datePublishedReg 2016-01-01
15 schema:description Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter proposes a guided tour of available software that can help biologists to look for these repeats and check some hypothetical models intended to characterize their structures. Since transposable elements are a major source of repeats in plants, many methods have been used or developed for this large class of sequences. They are representative of the range of tools available for other classes of repeats and we have provided a whole section on this topic as well as a selection of the main existing software. In order to better understand how they work and how repeats may be efficiently found in genomes, it is necessary to look at the technical issues involved in the large-scale search of these structures. Indeed, it may be hard to keep up with the profusion of proposals in this dynamic field and the rest of the chapter is devoted to the foundations of the search for repeats and more complex patterns. The second section introduces the key concepts that are useful for understanding the current state of the art in playing with words, applied to genomic sequences. This can be seen as the first stage of a very general approach called linguistic analysis that is interested in the analysis of natural or artificial texts. Words, the lexical level, correspond to simple repeated entities in texts or strings. In fact, biologists need to represent more complex entities where a repeat family is built on more abstract structures, including direct or inverted small repeats, motifs, composition constraints as well as ordering and distance constraints between these elementary blocks. In terms of linguistics, this corresponds to the syntactic level of a language. The last section introduces concepts and practical tools that can be used to reach this syntactic level in biological sequence analysis.
16 schema:editor Ne0d0aab005c14fe68987c7a515aef59d
17 schema:genre chapter
18 schema:inLanguage en
19 schema:isAccessibleForFree true
20 schema:isPartOf Nfba63c0b5cc246568ba99bab841beab6
21 schema:keywords Characterizing Repeats
22 abstract structure
23 analysis
24 approach
25 art
26 artificial texts
27 available software
28 biological sequence analysis
29 biologists
30 block
31 chapter
32 class
33 classes of repeats
34 complex entity
35 complex pattern
36 composition constraints
37 concept
38 constraints
39 current state
40 distance constraints
41 dynamic field
42 elementary blocks
43 elements
44 entities
45 fact
46 family
47 field
48 first stage
49 foundation
50 general approach
51 genome
52 genomic sequences
53 higher proportion
54 hypothetical model
55 issues
56 key concepts
57 language
58 large class
59 large-scale search
60 last section
61 levels
62 lexical level
63 linguistic analysis
64 linguistics
65 major source
66 method
67 model
68 more abstract structures
69 motif
70 order
71 ordering
72 patterns
73 plant genomes
74 plants
75 practical tool
76 profusion
77 profusion of proposals
78 proportion
79 proposal
80 range
81 range of tools
82 repeat family
83 repeats
84 rest
85 search
86 second section
87 sections
88 selection
89 sequence
90 sequence analysis
91 small repeats
92 software
93 source
94 stage
95 state
96 strings
97 structure
98 syntactic level
99 technical issues
100 terms
101 terms of linguistics
102 text
103 tool
104 topic
105 tour
106 transposable elements
107 types
108 whole section
109 words
110 schema:name Finding and Characterizing Repeats in Plant Genomes
111 schema:pagination 293-337
112 schema:productId N65c62cb46525499cab566e473f5d59c0
113 Nd5a17349aaf34eeda8d1d78dca965277
114 Ne359b793d9c24805be30fc84e44a3c5b
115 schema:publisher N5fa42132cf7f47a4a87a6bc5b2898a7f
116 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006392235
117 https://doi.org/10.1007/978-1-4939-3167-5_17
118 schema:sdDatePublished 2021-11-01T18:56
119 schema:sdLicense https://scigraph.springernature.com/explorer/license/
120 schema:sdPublisher N59fa45447f9543c3bca20652d32abe50
121 schema:url https://doi.org/10.1007/978-1-4939-3167-5_17
122 sgo:license sg:explorer/license/
123 sgo:sdDataset chapters
124 rdf:type schema:Chapter
125 N012debc7584e48e389ab845d6e8fea07 rdf:first sg:person.01112725215.45
126 rdf:rest rdf:nil
127 N0199ba068d0c403f87ad829bf169258d schema:familyName Edwards
128 schema:givenName David
129 rdf:type schema:Person
130 N33cdd494234d41af912ff4b046f12095 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
131 schema:name Genomics
132 rdf:type schema:DefinedTerm
133 N478cb8c1e38443a5b776db2ebb32b63a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
134 schema:name DNA Transposable Elements
135 rdf:type schema:DefinedTerm
136 N56e40d9d7a304657a655b4bfeb898548 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
137 schema:name Genome, Plant
138 rdf:type schema:DefinedTerm
139 N5861ba680a154150b518f07010c531a5 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
140 schema:name Software
141 rdf:type schema:DefinedTerm
142 N59fa45447f9543c3bca20652d32abe50 schema:name Springer Nature - SN SciGraph project
143 rdf:type schema:Organization
144 N5fa42132cf7f47a4a87a6bc5b2898a7f schema:name Springer Nature
145 rdf:type schema:Organisation
146 N65c62cb46525499cab566e473f5d59c0 schema:name dimensions_id
147 schema:value pub.1006392235
148 rdf:type schema:PropertyValue
149 N670249d11bde444fbfb122a3b5f61ede schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
150 schema:name Plants
151 rdf:type schema:DefinedTerm
152 Nad6a8f3a3dfb44c583ba9263405cd21d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
153 schema:name Computational Biology
154 rdf:type schema:DefinedTerm
155 Nb8ab9ff7063e4f07b26d8b3e4736381a rdf:first sg:person.01321640653.87
156 rdf:rest N012debc7584e48e389ab845d6e8fea07
157 Nd5a17349aaf34eeda8d1d78dca965277 schema:name pubmed_id
158 schema:value 26519414
159 rdf:type schema:PropertyValue
160 Ndd56b0383e7340faabfb6babe0d94c23 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
161 schema:name Repetitive Sequences, Nucleic Acid
162 rdf:type schema:DefinedTerm
163 Ne0d0aab005c14fe68987c7a515aef59d rdf:first N0199ba068d0c403f87ad829bf169258d
164 rdf:rest rdf:nil
165 Ne35379f6c4024d2a9b0ad92bb67b6c7c rdf:first sg:person.01143715001.20
166 rdf:rest Nb8ab9ff7063e4f07b26d8b3e4736381a
167 Ne359b793d9c24805be30fc84e44a3c5b schema:name doi
168 schema:value 10.1007/978-1-4939-3167-5_17
169 rdf:type schema:PropertyValue
170 Nfba63c0b5cc246568ba99bab841beab6 schema:isbn 978-1-4939-3166-8
171 978-1-4939-3167-5
172 schema:name Plant Bioinformatics
173 rdf:type schema:Book
174 anzsrc-for:03 schema:inDefinedTermSet anzsrc-for:
175 schema:name Chemical Sciences
176 rdf:type schema:DefinedTerm
177 anzsrc-for:0399 schema:inDefinedTermSet anzsrc-for:
178 schema:name Other Chemical Sciences
179 rdf:type schema:DefinedTerm
180 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
181 schema:name Biological Sciences
182 rdf:type schema:DefinedTerm
183 anzsrc-for:0601 schema:inDefinedTermSet anzsrc-for:
184 schema:name Biochemistry and Cell Biology
185 rdf:type schema:DefinedTerm
186 sg:person.01112725215.45 schema:affiliation grid-institutes:grid.469471.9
187 schema:familyName Tempel
188 schema:givenName Sébastien
189 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01112725215.45
190 rdf:type schema:Person
191 sg:person.01143715001.20 schema:affiliation grid-institutes:grid.410368.8
192 schema:familyName Nicolas
193 schema:givenName Jacques
194 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01143715001.20
195 rdf:type schema:Person
196 sg:person.01321640653.87 schema:affiliation grid-institutes:grid.410368.8
197 schema:familyName Peterlongo
198 schema:givenName Pierre
199 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01321640653.87
200 rdf:type schema:Person
201 grid-institutes:grid.410368.8 schema:alternateName Dyliss Team, Irisa/Inria Centre de Rennes Bretagne Atlantique, Campus de Beaulieu, 35510, Rennes cedex, France
202 Irisa/Inria Centre de Rennes Bretagne Atlantique, Campus de Beaulieu, 35510, Rennes cedex, France
203 schema:name Dyliss Team, Irisa/Inria Centre de Rennes Bretagne Atlantique, Campus de Beaulieu, 35510, Rennes cedex, France
204 Irisa/Inria Centre de Rennes Bretagne Atlantique, Campus de Beaulieu, 35510, Rennes cedex, France
205 rdf:type schema:Organization
206 grid-institutes:grid.469471.9 schema:alternateName LCB, CNRS UMR 7283, 31 Chemin Joseph Aiguier, 13402, Marseille cedex 20, France
207 schema:name LCB, CNRS UMR 7283, 31 Chemin Joseph Aiguier, 13402, Marseille cedex 20, France
208 rdf:type schema:Organization
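
In the table above, schema:author points to a blank node rather than directly to the three persons: author order is encoded as an RDF list, a chain of cells linked by rdf:first (the item) and rdf:rest (the next cell) that ends at rdf:nil. The Python below is a minimal sketch of walking that chain, using the node identifiers and family names copied from the table; the dictionary layout and the function name walk_list are assumptions for this illustration.

```python
RDF_NIL = "rdf:nil"

# List cells copied from the table: blank node -> (rdf:first, rdf:rest).
CELLS = {
    "Ne35379f6c4024d2a9b0ad92bb67b6c7c": (
        "sg:person.01143715001.20", "Nb8ab9ff7063e4f07b26d8b3e4736381a"),
    "Nb8ab9ff7063e4f07b26d8b3e4736381a": (
        "sg:person.01321640653.87", "N012debc7584e48e389ab845d6e8fea07"),
    "N012debc7584e48e389ab845d6e8fea07": (
        "sg:person.01112725215.45", RDF_NIL),
}

# schema:familyName values copied from the table.
FAMILY_NAME = {
    "sg:person.01143715001.20": "Nicolas",
    "sg:person.01321640653.87": "Peterlongo",
    "sg:person.01112725215.45": "Tempel",
}


def walk_list(head, cells):
    """Follow rdf:rest links from head and collect each rdf:first item."""
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = cells[node]
        items.append(first)
        node = rest
    return items


# The head node is the object of schema:author in the table.
authors = [FAMILY_NAME[p] for p in walk_list("Ne35379f6c4024d2a9b0ad92bb67b6c7c", CELLS)]
print(authors)
```

The same traversal applies to schema:editor, whose list has a single cell followed directly by rdf:nil.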
 



