Entropic Profiles, Maximal Motifs and the Discovery of Significant Repetitions in Genomic Sequences


Ontology type: schema:Chapter     


Chapter Info

DATE

2014

AUTHORS

Laxmi Parida, Cinzia Pizzi, Simona E. Rombo

ABSTRACT

The degree of predictability of a sequence can be measured by its entropy and it is closely related to its repetitiveness and compressibility. Entropic profiles are useful tools to study the under- and over-representation of subsequences, providing also information about the scale of each conserved DNA region. On the other hand, compact classes of repetitive motifs, such as maximal motifs, have been proved to be useful for the identification of significant repetitions and for the compression of biological sequences. In this paper we show that there is a relationship between entropic profiles and maximal motifs, and in particular we prove that the former are a subset of the latter. As a further contribution we propose a novel linear time linear space algorithm to compute the function Entropic Profile introduced by Vinga and Almeida in [18], and we present some preliminary results on real data, showing the speed up of our approach with respect to other existing techniques.
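
As a rough illustration of the abstract's first sentence, the sketch below computes the plain empirical Shannon entropy of the overlapping k-mer distribution of a sequence. This is an assumption-laden stand-in: it is not the Entropic Profile function of Vinga and Almeida, nor the linear-time algorithm proposed in the chapter, and the function name `kmer_entropy` is hypothetical.

```python
import math
from collections import Counter


def kmer_entropy(sequence: str, k: int) -> float:
    """Empirical Shannon entropy, in bits, of the overlapping k-mers of `sequence`.

    Illustrative only: this is ordinary Shannon entropy over k-mer
    frequencies, not the Entropic Profile function cited in the abstract.
    """
    if k <= 0 or k > len(sequence):
        raise ValueError("k must be between 1 and len(sequence)")
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the observed k-mers.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A fully repetitive sequence such as `"AAAAAAAA"` has zero entropy for `k = 1`, while a sequence that uses the four nucleotides equally often reaches the 2-bit maximum for single symbols.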

PAGES

148-160

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-662-44753-6_12

DOI

http://dx.doi.org/10.1007/978-3-662-44753-6_12

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1016352059



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "IBM T. J. Watson Research Center, USA", 
          "id": "http://www.grid.ac/institutes/grid.481554.9", 
          "name": [
            "IBM T. J. Watson Research Center, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Parida", 
        "givenName": "Laxmi", 
        "id": "sg:person.01336557015.68", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01336557015.68"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Information Engineering, University of Padova, Italy", 
          "id": "http://www.grid.ac/institutes/grid.5608.b", 
          "name": [
            "Department of Information Engineering, University of Padova, Italy"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Pizzi", 
        "givenName": "Cinzia", 
        "id": "sg:person.010544745135.35", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010544745135.35"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Mathematics and Computer Science, University of Palermo, Italy", 
          "id": "http://www.grid.ac/institutes/grid.10776.37", 
          "name": [
            "Department of Mathematics and Computer Science, University of Palermo, Italy"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Rombo", 
        "givenName": "Simona E.", 
        "id": "sg:person.013316136215.88", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013316136215.88"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2014", 
    "datePublishedReg": "2014-01-01", 
    "description": "The degree of predictability of a sequence can be measured by its entropy and it is closely related to its repetitiveness and compressibility. Entropic profiles are useful tools to study the under- and over-representation of subsequences, providing also information about the scale of each conserved DNA region. On the other hand, compact classes of repetitive motifs, such as maximal motifs, have been proved to be useful for the identification of significant repetitions and for the compression of biological sequences. In this paper we show that there is a relationship between entropic profiles and maximal motifs, and in particular we prove that the former are a subset of the latter. As a further contribution we propose a novel linear time linear space algorithm to compute the function Entropic Profile introduced by Vinga and Almeida in [18], and we present some preliminary results on real data, showing the speed up of our approach with respect to other existing techniques.", 
    "editor": [
      {
        "familyName": "Brown", 
        "givenName": "Dan", 
        "type": "Person"
      }, 
      {
        "familyName": "Morgenstern", 
        "givenName": "Burkhard", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-662-44753-6_12", 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-662-44752-9", 
        "978-3-662-44753-6"
      ], 
      "name": "Algorithms in Bioinformatics", 
      "type": "Book"
    }, 
    "keywords": [
      "entropic profiles", 
      "linear space algorithm", 
      "significant repetition", 
      "space algorithm", 
      "real data", 
      "degree of predictability", 
      "compact class", 
      "maximal motifs", 
      "biological sequences", 
      "representation", 
      "subsequences", 
      "class", 
      "further contribution", 
      "algorithm", 
      "sequence", 
      "compressibility", 
      "profile", 
      "useful tool", 
      "Vinga", 
      "preliminary results", 
      "speed", 
      "approach", 
      "respect", 
      "technique", 
      "predictability", 
      "tool", 
      "under", 
      "scale", 
      "subset", 
      "contribution", 
      "Almeida", 
      "results", 
      "degree", 
      "repetitiveness", 
      "information", 
      "region", 
      "hand", 
      "identification", 
      "compression", 
      "data", 
      "DNA regions", 
      "repetitive motifs", 
      "motif", 
      "repetition", 
      "relationship", 
      "discovery", 
      "genomic sequences", 
      "paper"
    ], 
    "name": "Entropic Profiles, Maximal Motifs and the Discovery of Significant Repetitions in Genomic Sequences", 
    "pagination": "148-160", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1016352059"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-662-44753-6_12"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-662-44753-6_12", 
      "https://app.dimensions.ai/details/publication/pub.1016352059"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-10-01T06:57", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221001/entities/gbq_results/chapter/chapter_349.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-662-44753-6_12"
  }
]
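
A record like the one above can be read with any JSON parser. The sketch below is a minimal example using Python's standard `json` module on a trimmed copy of the record (only a few of its fields are reproduced); the helper name `summarize` and the shape of its return value are assumptions made for illustration.

```python
import json

# Trimmed copy of the record above: only the fields used below are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Parida", "givenName": "Laxmi", "type": "Person"},
      {"familyName": "Pizzi", "givenName": "Cinzia", "type": "Person"},
      {"familyName": "Rombo", "givenName": "Simona E.", "type": "Person"}
    ],
    "name": "Entropic Profiles, Maximal Motifs and the Discovery of Significant Repetitions in Genomic Sequences",
    "pagination": "148-160",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1016352059"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-662-44753-6_12"]}
    ],
    "type": "Chapter"
  }
]
"""


def summarize(record_json: str) -> dict:
    """Return title, author names, DOI and pages from a SciGraph-style record."""
    record = json.loads(record_json)[0]  # the document is a one-element array
    authors = [f"{a['givenName']} {a['familyName']}" for a in record["author"]]
    # productId is a list of name/value pairs; index it by name.
    ids = {p["name"]: p["value"][0] for p in record["productId"]}
    return {
        "title": record["name"],
        "authors": authors,
        "doi": ids.get("doi"),
        "pages": record["pagination"],
    }


if __name__ == "__main__":
    print(summarize(RECORD_JSON))
```

Indexing `productId` by its `name` field avoids relying on the order of the entries in the array.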
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-662-44753-6_12'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-662-44753-6_12'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-662-44753-6_12'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-662-44753-6_12'
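
The same content negotiation can be sketched in Python with the standard library. The snippet below mirrors the four curl commands above by setting the `Accept` header; the short format labels (`json-ld`, `nt`, `turtle`, `xml`) and the function names are choices made for this sketch, and whether the endpoint still answers is not something the record itself guarantees.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-662-44753-6_12"

# Media types taken from the curl commands above; the keys are labels
# chosen for this sketch only.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build a GET request asking for the given serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 30.0) -> str:
    """Perform the request and return the response body as text."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Performs a real network call; assumes the service is reachable.
    print(fetch("turtle")[:500])
```

Keeping `build_request` separate from `fetch` makes it possible to inspect the headers without touching the network.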


 

This table displays all metadata directly associated with this object as RDF triples.

132 TRIPLES      22 PREDICATES      73 URIs      66 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-662-44753-6_12 schema:about anzsrc-for:06
2 anzsrc-for:0604
3 schema:author N70b2e695c8474a2b9a98bbf22ad1035e
4 schema:datePublished 2014
5 schema:datePublishedReg 2014-01-01
6 schema:description The degree of predictability of a sequence can be measured by its entropy and it is closely related to its repetitiveness and compressibility. Entropic profiles are useful tools to study the under- and over-representation of subsequences, providing also information about the scale of each conserved DNA region. On the other hand, compact classes of repetitive motifs, such as maximal motifs, have been proved to be useful for the identification of significant repetitions and for the compression of biological sequences. In this paper we show that there is a relationship between entropic profiles and maximal motifs, and in particular we prove that the former are a subset of the latter. As a further contribution we propose a novel linear time linear space algorithm to compute the function Entropic Profile introduced by Vinga and Almeida in [18], and we present some preliminary results on real data, showing the speed up of our approach with respect to other existing techniques.
7 schema:editor Nc4948468b95c4e97a6144b78175694f8
8 schema:genre chapter
9 schema:isAccessibleForFree false
10 schema:isPartOf Nf681b1ceae4941ac8a576fdcc9c60fe3
11 schema:keywords Almeida
12 DNA regions
13 Vinga
14 algorithm
15 approach
16 biological sequences
17 class
18 compact class
19 compressibility
20 compression
21 contribution
22 data
23 degree
24 degree of predictability
25 discovery
26 entropic profiles
27 further contribution
28 genomic sequences
29 hand
30 identification
31 information
32 linear space algorithm
33 maximal motifs
34 motif
35 paper
36 predictability
37 preliminary results
38 profile
39 real data
40 region
41 relationship
42 repetition
43 repetitive motifs
44 repetitiveness
45 representation
46 respect
47 results
48 scale
49 sequence
50 significant repetition
51 space algorithm
52 speed
53 subsequences
54 subset
55 technique
56 tool
57 under
58 useful tool
59 schema:name Entropic Profiles, Maximal Motifs and the Discovery of Significant Repetitions in Genomic Sequences
60 schema:pagination 148-160
61 schema:productId N03bd026b99e9431c9f1d07b9830f9d0b
62 N9677ac161cbb4d109135fff193a4fe49
63 schema:publisher N53678c1c0eb6443f9ddb45510bdb62a0
64 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016352059
65 https://doi.org/10.1007/978-3-662-44753-6_12
66 schema:sdDatePublished 2022-10-01T06:57
67 schema:sdLicense https://scigraph.springernature.com/explorer/license/
68 schema:sdPublisher N8d151a002cf64bc1914466fad638b291
69 schema:url https://doi.org/10.1007/978-3-662-44753-6_12
70 sgo:license sg:explorer/license/
71 sgo:sdDataset chapters
72 rdf:type schema:Chapter
73 N03bd026b99e9431c9f1d07b9830f9d0b schema:name doi
74 schema:value 10.1007/978-3-662-44753-6_12
75 rdf:type schema:PropertyValue
76 N43719a58f74b4fb88089fa862009953b rdf:first sg:person.013316136215.88
77 rdf:rest rdf:nil
78 N47fa0817d9b34c7794041b0c3d50900f rdf:first sg:person.010544745135.35
79 rdf:rest N43719a58f74b4fb88089fa862009953b
80 N53678c1c0eb6443f9ddb45510bdb62a0 schema:name Springer Nature
81 rdf:type schema:Organisation
82 N5be70804264446d19d872fc35e134a6f rdf:first N8dd4871a05484598ba384d56fc6a9455
83 rdf:rest rdf:nil
84 N70b2e695c8474a2b9a98bbf22ad1035e rdf:first sg:person.01336557015.68
85 rdf:rest N47fa0817d9b34c7794041b0c3d50900f
86 N8c74b9aacf5b432fb3b19c9322a72155 schema:familyName Brown
87 schema:givenName Dan
88 rdf:type schema:Person
89 N8d151a002cf64bc1914466fad638b291 schema:name Springer Nature - SN SciGraph project
90 rdf:type schema:Organization
91 N8dd4871a05484598ba384d56fc6a9455 schema:familyName Morgenstern
92 schema:givenName Burkhard
93 rdf:type schema:Person
94 N9677ac161cbb4d109135fff193a4fe49 schema:name dimensions_id
95 schema:value pub.1016352059
96 rdf:type schema:PropertyValue
97 Nc4948468b95c4e97a6144b78175694f8 rdf:first N8c74b9aacf5b432fb3b19c9322a72155
98 rdf:rest N5be70804264446d19d872fc35e134a6f
99 Nf681b1ceae4941ac8a576fdcc9c60fe3 schema:isbn 978-3-662-44752-9
100 978-3-662-44753-6
101 schema:name Algorithms in Bioinformatics
102 rdf:type schema:Book
103 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
104 schema:name Biological Sciences
105 rdf:type schema:DefinedTerm
106 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
107 schema:name Genetics
108 rdf:type schema:DefinedTerm
109 sg:person.010544745135.35 schema:affiliation grid-institutes:grid.5608.b
110 schema:familyName Pizzi
111 schema:givenName Cinzia
112 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010544745135.35
113 rdf:type schema:Person
114 sg:person.013316136215.88 schema:affiliation grid-institutes:grid.10776.37
115 schema:familyName Rombo
116 schema:givenName Simona E.
117 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013316136215.88
118 rdf:type schema:Person
119 sg:person.01336557015.68 schema:affiliation grid-institutes:grid.481554.9
120 schema:familyName Parida
121 schema:givenName Laxmi
122 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01336557015.68
123 rdf:type schema:Person
124 grid-institutes:grid.10776.37 schema:alternateName Department of Mathematics and Computer Science, University of Palermo, Italy
125 schema:name Department of Mathematics and Computer Science, University of Palermo, Italy
126 rdf:type schema:Organization
127 grid-institutes:grid.481554.9 schema:alternateName IBM T. J. Watson Research Center, USA
128 schema:name IBM T. J. Watson Research Center, USA
129 rdf:type schema:Organization
130 grid-institutes:grid.5608.b schema:alternateName Department of Information Engineering, University of Padova, Italy
131 schema:name Department of Information Engineering, University of Padova, Italy
132 rdf:type schema:Organization
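
In the table above, the ordered author list is encoded as an RDF collection: `schema:author` points at a blank node, and each blank node carries an `rdf:first` item and an `rdf:rest` pointer until `rdf:nil` is reached. The sketch below walks that chain using the node identifiers copied from the table; the dictionary layout and the function name `walk_rdf_list` are assumptions made for illustration, not part of any SciGraph tooling.

```python
RDF_NIL = "rdf:nil"

# rdf:first and rdf:rest triples for the author list, copied from the table.
FIRST = {
    "N70b2e695c8474a2b9a98bbf22ad1035e": "sg:person.01336557015.68",
    "N47fa0817d9b34c7794041b0c3d50900f": "sg:person.010544745135.35",
    "N43719a58f74b4fb88089fa862009953b": "sg:person.013316136215.88",
}
REST = {
    "N70b2e695c8474a2b9a98bbf22ad1035e": "N47fa0817d9b34c7794041b0c3d50900f",
    "N47fa0817d9b34c7794041b0c3d50900f": "N43719a58f74b4fb88089fa862009953b",
    "N43719a58f74b4fb88089fa862009953b": RDF_NIL,
}

# Object of the schema:author triple, i.e. the head of the list.
AUTHOR_LIST_HEAD = "N70b2e695c8474a2b9a98bbf22ad1035e"


def walk_rdf_list(head: str, first: dict, rest: dict) -> list:
    """Follow rdf:rest pointers from `head`, collecting each rdf:first item."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


if __name__ == "__main__":
    print(walk_rdf_list(AUTHOR_LIST_HEAD, FIRST, REST))
```

Walking the chain recovers the author order (Parida, Pizzi, Rombo), which plain unordered triples would not preserve on their own.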
 



