Entropic Profiles, Maximal Motifs and the Discovery of Significant Repetitions in Genomic Sequences


Ontology type: schema:Chapter     


Chapter Info

DATE

2014

AUTHORS

Laxmi Parida , Cinzia Pizzi , Simona E. Rombo

ABSTRACT

The degree of predictability of a sequence can be measured by its entropy and it is closely related to its repetitiveness and compressibility. Entropic profiles are useful tools to study the under- and over-representation of subsequences, providing also information about the scale of each conserved DNA region. On the other hand, compact classes of repetitive motifs, such as maximal motifs, have been proved to be useful for the identification of significant repetitions and for the compression of biological sequences. In this paper we show that there is a relationship between entropic profiles and maximal motifs, and in particular we prove that the former are a subset of the latter. As a further contribution we propose a novel linear time linear space algorithm to compute the function Entropic Profile introduced by Vinga and Almeida in [18], and we present some preliminary results on real data, showing the speed up of our approach with respect to other existing techniques.
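The abstract names the Entropic Profile function of Vinga and Almeida without stating it. As we understand that definition, for a DNA sequence x of length n, a maximum resolution L and a weight φ, the unnormalized profile at position i is f_{L,φ}(x_i) = (1 + (1/n) Σ_{k=1}^{L} 4^k φ^k c(x[i-k+1..i])) / Σ_{k=0}^{L} φ^k, where c(w) counts the occurrences of w in x; the profile is then standardized by the mean and standard deviation of f over all positions. The Python sketch below is a deliberately naive transcription of that formula, not the linear-time, linear-space algorithm proposed in the chapter. Treating the sequence as circular (so every position has a window of each length k) and using the population standard deviation are our assumptions, and the function names raw_profile and entropic_profile are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import List


def raw_profile(x: str, L: int, phi: float) -> List[float]:
    """Unnormalized entropic profile f_{L,phi}(x_i) for every position i of x.

    Assumption: x is treated as circular, so the length-k window ending at
    position i wraps around to the end of the sequence when i < k - 1.
    """
    n = len(x)
    if n == 0 or not 1 <= L <= n:
        raise ValueError("need a non-empty sequence and 1 <= L <= len(x)")
    doubled = x + x[:L]  # lets circular windows be taken as plain slices
    # counts[k][w] = number of circular occurrences of the k-mer w in x
    counts = {k: Counter(doubled[j:j + k] for j in range(n)) for k in range(1, L + 1)}
    denom = sum(phi ** k for k in range(L + 1))
    profile = []
    for i in range(n):
        total = 0.0
        for k in range(1, L + 1):
            start = (i - k + 1) % n  # window x[i-k+1..i], wrapped circularly
            # (4 * phi) ** k: the 4 is the size of the DNA alphabet
            total += (4 * phi) ** k * counts[k][doubled[start:start + k]]
        profile.append((1 + total / n) / denom)
    return profile


def entropic_profile(x: str, L: int, phi: float) -> List[float]:
    """Standardized profile (f - mean(f)) / std(f), using the population std."""
    f = raw_profile(x, L, phi)
    m, s = mean(f), pstdev(f)
    if s == 0:
        return [0.0] * len(f)  # every position scores alike; nothing stands out
    return [(v - m) / s for v in f]
```

For example, entropic_profile("AAC", 1, 0.5) scores both A positions above the C position, since A is the more frequent symbol. The nested loop makes this roughly O(nL^2) once slicing is counted, which is exactly the cost a linear-time method is meant to avoid.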

PAGES

148-160

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-662-44753-6_12

DOI

http://dx.doi.org/10.1007/978-3-662-44753-6_12

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1016352059


JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "IBM T. J. Watson Research Center, USA", 
          "id": "http://www.grid.ac/institutes/grid.481554.9", 
          "name": [
            "IBM T. J. Watson Research Center, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Parida", 
        "givenName": "Laxmi", 
        "id": "sg:person.01336557015.68", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01336557015.68"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Information Engineering, University of Padova, Italy", 
          "id": "http://www.grid.ac/institutes/grid.5608.b", 
          "name": [
            "Department of Information Engineering, University of Padova, Italy"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Pizzi", 
        "givenName": "Cinzia", 
        "id": "sg:person.010544745135.35", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010544745135.35"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Mathematics and Computer Science, University of Palermo, Italy", 
          "id": "http://www.grid.ac/institutes/grid.10776.37", 
          "name": [
            "Department of Mathematics and Computer Science, University of Palermo, Italy"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Rombo", 
        "givenName": "Simona E.", 
        "id": "sg:person.013316136215.88", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013316136215.88"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2014", 
    "datePublishedReg": "2014-01-01", 
    "description": "The degree of predictability of a sequence can be measured by its entropy and it is closely related to its repetitiveness and compressibility. Entropic profiles are useful tools to study the under- and over-representation of subsequences, providing also information about the scale of each conserved DNA region. On the other hand, compact classes of repetitive motifs, such as maximal motifs, have been proved to be useful for the identification of significant repetitions and for the compression of biological sequences. In this paper we show that there is a relationship between entropic profiles and maximal motifs, and in particular we prove that the former are a subset of the latter. As a further contribution we propose a novel linear time linear space algorithm to compute the function Entropic Profile introduced by Vinga and Almeida in [18], and we present some preliminary results on real data, showing the speed up of our approach with respect to other existing techniques.", 
    "editor": [
      {
        "familyName": "Brown", 
        "givenName": "Dan", 
        "type": "Person"
      }, 
      {
        "familyName": "Morgenstern", 
        "givenName": "Burkhard", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-662-44753-6_12", 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-662-44752-9", 
        "978-3-662-44753-6"
      ], 
      "name": "Algorithms in Bioinformatics", 
      "type": "Book"
    }, 
    "keywords": [
      "entropic profiles", 
      "linear space algorithm", 
      "significant repetition", 
      "space algorithm", 
      "real data", 
      "degree of predictability", 
      "compact class", 
      "maximal motifs", 
      "biological sequences", 
      "representation", 
      "subsequences", 
      "class", 
      "further contribution", 
      "algorithm", 
      "sequence", 
      "compressibility", 
      "profile", 
      "useful tool", 
      "Vinga", 
      "preliminary results", 
      "speed", 
      "approach", 
      "respect", 
      "technique", 
      "predictability", 
      "tool", 
      "under", 
      "scale", 
      "subset", 
      "contribution", 
      "Almeida", 
      "results", 
      "degree", 
      "repetitiveness", 
      "information", 
      "region", 
      "hand", 
      "identification", 
      "compression", 
      "data", 
      "DNA regions", 
      "repetitive motifs", 
      "motif", 
      "repetition", 
      "relationship", 
      "discovery", 
      "genomic sequences", 
      "paper"
    ], 
    "name": "Entropic Profiles, Maximal Motifs and the Discovery of Significant Repetitions in Genomic Sequences", 
    "pagination": "148-160", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1016352059"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-662-44753-6_12"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-662-44753-6_12", 
      "https://app.dimensions.ai/details/publication/pub.1016352059"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-12-01T06:52", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221201/entities/gbq_results/chapter/chapter_390.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-662-44753-6_12"
  }
]
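Because the JSON-LD record is ordinary JSON, its citation fields can be read with a standard JSON parser. The Python sketch below assumes the layout shown above: a top-level array holding one chapter object, author names split into givenName and familyName, and identifiers stored as PropertyValue entries whose value is a one-element list. The helper summarize_record is hypothetical and not part of any SciGraph tooling.

```python
import json
from typing import Any, Dict


def summarize_record(jsonld_text: str) -> Dict[str, Any]:
    """Extract a few citation fields from a SciGraph-style JSON-LD chapter record."""
    data = json.loads(jsonld_text)
    record = data[0] if isinstance(data, list) else data  # the dump wraps the object in an array
    authors = [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]
    # productId holds PropertyValue objects; each "value" is itself a list
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    return {
        "title": record.get("name"),
        "authors": authors,
        "year": record.get("datePublished"),
        "book": record.get("isPartOf", {}).get("name"),
        "pages": record.get("pagination"),
        "doi": ids.get("doi"),
        "dimensions_id": ids.get("dimensions_id"),
    }
```

Applied to the record above, this should list the authors in the order Parida, Pizzi, Rombo and report the DOI 10.1007/978-3-662-44753-6_12. Reading fields with .get keeps the sketch tolerant of records that omit optional keys, while a malformed author entry still fails loudly.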
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-662-44753-6_12'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-662-44753-6_12'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-662-44753-6_12'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-662-44753-6_12'
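The four curl commands above differ only in their Accept header, that is, they rely on HTTP content negotiation. A minimal Python equivalent using only the standard library is sketched below. It assumes the SciGraph endpoint still answers these requests as described; the names MEDIA_TYPES, RECORD_URL, build_request and fetch are our own illustrations.

```python
import urllib.request

# Media types copied from the curl examples above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-662-44753-6_12"


def build_request(url: str, fmt: str) -> urllib.request.Request:
    """Build a GET request that asks the server for `fmt` via the Accept header."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(MEDIA_TYPES)}")
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(url: str, fmt: str = "json-ld", timeout: float = 30.0) -> str:
    """Send the request and return the body as text (needs network access)."""
    with urllib.request.urlopen(build_request(url, fmt), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

For instance, fetch(RECORD_URL, "turtle") should correspond to the third curl command. Separating build_request from fetch keeps the header logic testable without touching the network.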


 

This table displays all metadata directly associated with this object as RDF triples.

132 TRIPLES      22 PREDICATES      73 URIs      66 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-662-44753-6_12 schema:about anzsrc-for:06
2 anzsrc-for:0604
3 schema:author N96c2a7c601954fdb85ab703b1edc3364
4 schema:datePublished 2014
5 schema:datePublishedReg 2014-01-01
6 schema:description The degree of predictability of a sequence can be measured by its entropy and it is closely related to its repetitiveness and compressibility. Entropic profiles are useful tools to study the under- and over-representation of subsequences, providing also information about the scale of each conserved DNA region. On the other hand, compact classes of repetitive motifs, such as maximal motifs, have been proved to be useful for the identification of significant repetitions and for the compression of biological sequences. In this paper we show that there is a relationship between entropic profiles and maximal motifs, and in particular we prove that the former are a subset of the latter. As a further contribution we propose a novel linear time linear space algorithm to compute the function Entropic Profile introduced by Vinga and Almeida in [18], and we present some preliminary results on real data, showing the speed up of our approach with respect to other existing techniques.
7 schema:editor N502f33371d8a4fda8787fba57cfcee96
8 schema:genre chapter
9 schema:isAccessibleForFree false
10 schema:isPartOf Naf48d93d955d47ce81ae58e3ac655522
11 schema:keywords Almeida
12 DNA regions
13 Vinga
14 algorithm
15 approach
16 biological sequences
17 class
18 compact class
19 compressibility
20 compression
21 contribution
22 data
23 degree
24 degree of predictability
25 discovery
26 entropic profiles
27 further contribution
28 genomic sequences
29 hand
30 identification
31 information
32 linear space algorithm
33 maximal motifs
34 motif
35 paper
36 predictability
37 preliminary results
38 profile
39 real data
40 region
41 relationship
42 repetition
43 repetitive motifs
44 repetitiveness
45 representation
46 respect
47 results
48 scale
49 sequence
50 significant repetition
51 space algorithm
52 speed
53 subsequences
54 subset
55 technique
56 tool
57 under
58 useful tool
59 schema:name Entropic Profiles, Maximal Motifs and the Discovery of Significant Repetitions in Genomic Sequences
60 schema:pagination 148-160
61 schema:productId Nb563e12275654f8ea01f031a51426e5a
62 Nc56cb8dfd6584461bece045f216df324
63 schema:publisher Na69a885befcd4218970d83babd290420
64 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016352059
65 https://doi.org/10.1007/978-3-662-44753-6_12
66 schema:sdDatePublished 2022-12-01T06:52
67 schema:sdLicense https://scigraph.springernature.com/explorer/license/
68 schema:sdPublisher N19a3ea373747421fadcbdf0698c047d1
69 schema:url https://doi.org/10.1007/978-3-662-44753-6_12
70 sgo:license sg:explorer/license/
71 sgo:sdDataset chapters
72 rdf:type schema:Chapter
73 N19a3ea373747421fadcbdf0698c047d1 schema:name Springer Nature - SN SciGraph project
74 rdf:type schema:Organization
75 N447ae88281064742a9c2be6adc7af886 schema:familyName Brown
76 schema:givenName Dan
77 rdf:type schema:Person
78 N502f33371d8a4fda8787fba57cfcee96 rdf:first N447ae88281064742a9c2be6adc7af886
79 rdf:rest N6ca3fe51c6fd4c2db538301fc75ea7a5
80 N6ca3fe51c6fd4c2db538301fc75ea7a5 rdf:first Ne264cbc85f2a4efbbaa804ec6a0f96c6
81 rdf:rest rdf:nil
82 N94590fb7ecfd417d99346e2f0156e682 rdf:first sg:person.013316136215.88
83 rdf:rest rdf:nil
84 N96c2a7c601954fdb85ab703b1edc3364 rdf:first sg:person.01336557015.68
85 rdf:rest Nd1dd12f4c8f64fe69b6ae94fdd1dddf6
86 Na69a885befcd4218970d83babd290420 schema:name Springer Nature
87 rdf:type schema:Organisation
88 Naf48d93d955d47ce81ae58e3ac655522 schema:isbn 978-3-662-44752-9
89 978-3-662-44753-6
90 schema:name Algorithms in Bioinformatics
91 rdf:type schema:Book
92 Nb563e12275654f8ea01f031a51426e5a schema:name doi
93 schema:value 10.1007/978-3-662-44753-6_12
94 rdf:type schema:PropertyValue
95 Nc56cb8dfd6584461bece045f216df324 schema:name dimensions_id
96 schema:value pub.1016352059
97 rdf:type schema:PropertyValue
98 Nd1dd12f4c8f64fe69b6ae94fdd1dddf6 rdf:first sg:person.010544745135.35
99 rdf:rest N94590fb7ecfd417d99346e2f0156e682
100 Ne264cbc85f2a4efbbaa804ec6a0f96c6 schema:familyName Morgenstern
101 schema:givenName Burkhard
102 rdf:type schema:Person
103 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
104 schema:name Biological Sciences
105 rdf:type schema:DefinedTerm
106 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
107 schema:name Genetics
108 rdf:type schema:DefinedTerm
109 sg:person.010544745135.35 schema:affiliation grid-institutes:grid.5608.b
110 schema:familyName Pizzi
111 schema:givenName Cinzia
112 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010544745135.35
113 rdf:type schema:Person
114 sg:person.013316136215.88 schema:affiliation grid-institutes:grid.10776.37
115 schema:familyName Rombo
116 schema:givenName Simona E.
117 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013316136215.88
118 rdf:type schema:Person
119 sg:person.01336557015.68 schema:affiliation grid-institutes:grid.481554.9
120 schema:familyName Parida
121 schema:givenName Laxmi
122 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01336557015.68
123 rdf:type schema:Person
124 grid-institutes:grid.10776.37 schema:alternateName Department of Mathematics and Computer Science, University of Palermo, Italy
125 schema:name Department of Mathematics and Computer Science, University of Palermo, Italy
126 rdf:type schema:Organization
127 grid-institutes:grid.481554.9 schema:alternateName IBM T. J. Watson Research Center, USA
128 schema:name IBM T. J. Watson Research Center, USA
129 rdf:type schema:Organization
130 grid-institutes:grid.5608.b schema:alternateName Department of Information Engineering, University of Padova, Italy
131 schema:name Department of Information Engineering, University of Padova, Italy
132 rdf:type schema:Organization
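In the table above, the author and editor orderings are not stored as plain values: schema:author and schema:editor point to a blank node that starts an RDF collection, a linked list built from rdf:first (the current item) and rdf:rest (the next node, ending at rdf:nil). The Python sketch below walks such a list. It assumes the triples are given as (subject, predicate, object) string tuples using the prefixed names shown in the table; walk_rdf_list and AUTHOR_LIST are hypothetical names, and the blank-node labels are only meaningful within this particular dump.

```python
from typing import Dict, Iterable, List, Tuple

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def walk_rdf_list(triples: Iterable[Tuple[str, str, str]], head: str) -> List[str]:
    """Return the items of the RDF collection that starts at node `head`."""
    first: Dict[str, str] = {}
    rest: Dict[str, str] = {}
    for s, p, o in triples:
        if p == RDF_FIRST:
            first[s] = o
        elif p == RDF_REST:
            rest[s] = o
    items: List[str] = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:  # guard against malformed, cyclic lists
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


# The author-list triples from the table, copied verbatim.
AUTHOR_LIST = [
    ("N96c2a7c601954fdb85ab703b1edc3364", "rdf:first", "sg:person.01336557015.68"),
    ("N96c2a7c601954fdb85ab703b1edc3364", "rdf:rest", "Nd1dd12f4c8f64fe69b6ae94fdd1dddf6"),
    ("Nd1dd12f4c8f64fe69b6ae94fdd1dddf6", "rdf:first", "sg:person.010544745135.35"),
    ("Nd1dd12f4c8f64fe69b6ae94fdd1dddf6", "rdf:rest", "N94590fb7ecfd417d99346e2f0156e682"),
    ("N94590fb7ecfd417d99346e2f0156e682", "rdf:first", "sg:person.013316136215.88"),
    ("N94590fb7ecfd417d99346e2f0156e682", "rdf:rest", "rdf:nil"),
]
```

Walking AUTHOR_LIST from N96c2a7c601954fdb85ab703b1edc3364 yields sg:person.01336557015.68, sg:person.010544745135.35 and sg:person.013316136215.88, which the table identifies as Parida, Pizzi and Rombo, matching the author order of the JSON-LD record. The same function recovers the editor order (Brown, then Morgenstern) from N502f33371d8a4fda8787fba57cfcee96.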
 






...