Characterization and Extraction of Irredundant Tandem Motifs


Ontology type: schema:Chapter     


Chapter Info

DATE

2012

AUTHORS

Laxmi Parida , Cinzia Pizzi , Simona E. Rombo

ABSTRACT

We address the problem of extracting pairs of subwords (m1,m2) from a text string s of length n, such that, given also an integer constant d in input, m1 and m2 occur in tandem within a maximum distance of d symbols in s. The main effort of this work is to eliminate the possible redundancy from the candidate set of the so found tandem motifs. To this aim, we first introduce the concept of maximality, characterized by four specific conditions, that we show to be not deducible by the corresponding notion of maximality already defined for “simple” (i.e., non tandem) motifs. Then, we further eliminate the remaining redundancy by defining the concept of irredundancy for tandem motifs. We prove that the number of non-overlapping irredundant tandems is O(d^2 n) which, considering d as a constant, leads to a linear number of tandems in the length of the input string. This is an order of magnitude less than previously developed compact indexes for tandem extraction. As a further contribution we show an algorithm to extract this compact irredundant index.
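To make the problem statement concrete, the following is a minimal brute-force sketch of the candidate set the abstract starts from. It is not the chapter's algorithm and does not implement maximality or irredundancy. The function name `tandem_pairs`, the fixed subword lengths `k1` and `k2`, the reading of "distance" as the gap between the end of m1 and the start of m2, and the `min_occ` threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def tandem_pairs(s, k1, k2, d, min_occ=2):
    """Naively list (m1, m2) pairs where m2 starts at most d symbols after m1 ends.

    Assumptions (not taken from the chapter): subwords have fixed lengths
    k1 and k2, the distance is the number of symbols between the end of m1
    and the start of m2, and a pair is reported only if it occurs at least
    min_occ times.
    """
    occurrences = defaultdict(list)
    n = len(s)
    for i in range(n - k1 + 1):
        m1 = s[i:i + k1]
        end_of_m1 = i + k1
        for gap in range(d + 1):
            j = end_of_m1 + gap
            if j + k2 > n:
                break  # m2 would run past the end of the text
            occurrences[(m1, s[j:j + k2])].append((i, j))
    # Keep only pairs that repeat; this set is still highly redundant.
    return {pair: occ for pair, occ in occurrences.items() if len(occ) >= min_occ}


if __name__ == "__main__":
    print(tandem_pairs("abxcdabycd", k1=2, k2=2, d=1))
```

Even on short inputs this enumeration reports many overlapping candidates, which is the redundancy the chapter's maximality and irredundancy notions are meant to remove.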

PAGES

385-397

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-34109-0_41

DOI

http://dx.doi.org/10.1007/978-3-642-34109-0_41

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1040355876



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Mathematical Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0101", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Pure Mathematics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "IBM T.J. Watson Research Center, USA", 
          "id": "http://www.grid.ac/institutes/grid.481554.9", 
          "name": [
            "IBM T.J. Watson Research Center, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Parida", 
        "givenName": "Laxmi", 
        "id": "sg:person.01336557015.68", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01336557015.68"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Information Engineering, University of Padova, Italy", 
          "id": "http://www.grid.ac/institutes/grid.5608.b", 
          "name": [
            "Department of Information Engineering, University of Padova, Italy"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Pizzi", 
        "givenName": "Cinzia", 
        "id": "sg:person.010544745135.35", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010544745135.35"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "ICAR-CNR of Cosenza & DEIS, Universit\u00e0 della Calabria, Italy", 
          "id": "http://www.grid.ac/institutes/grid.7778.f", 
          "name": [
            "ICAR-CNR of Cosenza & DEIS, Universit\u00e0 della Calabria, Italy"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Rombo", 
        "givenName": "Simona E.", 
        "id": "sg:person.013316136215.88", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013316136215.88"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2012", 
    "datePublishedReg": "2012-01-01", 
    "description": "We address the problem of extracting pairs of subwords (m1,m2) from a text string s of length n, such that, given also an integer constant d in input, m1 and m2 occur in tandem within a maximum distance of d symbols in s.The main effort of this work is to eliminate the possible redundancy from the candidate set of the so found tandem motifs. To this aim, we first introduce the concept of maximality, characterized by four specific conditions, that we show to be not deducible by the corresponding notion of maximality already defined for \u201csimple\u201d (i.e., non tandem) motifs. Then, we further eliminate the remaining redundancy by defining the concept of irredundancy for tandem motifs.We prove that the number of non-overlapping irredundant tandems is O(d2n) which, considering d as a constant, leads to a linear number of tandems in the length of the input string. This is an order of magnitude less than previously developed compact indexes for tandem extraction. As a further contribution we show an algorithm to extract this compact irredundant index.", 
    "editor": [
      {
        "familyName": "Calder\u00f3n-Benavides", 
        "givenName": "Liliana", 
        "type": "Person"
      }, 
      {
        "familyName": "Gonz\u00e1lez-Caro", 
        "givenName": "Cristina", 
        "type": "Person"
      }, 
      {
        "familyName": "Ch\u00e1vez", 
        "givenName": "Edgar", 
        "type": "Person"
      }, 
      {
        "familyName": "Ziviani", 
        "givenName": "Nivio", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-34109-0_41", 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-642-34108-3", 
        "978-3-642-34109-0"
      ], 
      "name": "String Processing and Information Retrieval", 
      "type": "Book"
    }, 
    "keywords": [
      "compact index", 
      "input string", 
      "concept of maximality", 
      "linear number", 
      "string S", 
      "redundancy", 
      "possible redundancy", 
      "main effort", 
      "corresponding notion", 
      "algorithm", 
      "subwords", 
      "length n", 
      "extraction", 
      "further contribution", 
      "maximum distance", 
      "concept", 
      "irredundancy", 
      "input", 
      "maximality", 
      "string", 
      "symbols", 
      "work", 
      "number", 
      "orders of magnitude", 
      "efforts", 
      "notion", 
      "order", 
      "distance", 
      "specific conditions", 
      "pairs", 
      "tandem", 
      "tandem motifs", 
      "contribution", 
      "candidates", 
      "index", 
      "aim", 
      "conditions", 
      "length", 
      "magnitude", 
      "motif", 
      "d symbols", 
      "characterization", 
      "problem", 
      "constants", 
      "M1", 
      "m2", 
      "tandem extraction"
    ], 
    "name": "Characterization and Extraction of Irredundant Tandem Motifs", 
    "pagination": "385-397", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1040355876"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-34109-0_41"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-34109-0_41", 
      "https://app.dimensions.ai/details/publication/pub.1040355876"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-09-02T16:15", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220902/entities/gbq_results/chapter/chapter_324.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-642-34109-0_41"
  }
]
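Because JSON-LD is plain JSON, a record like the one above can be read with a standard JSON parser. The sketch below is illustrative only: it embeds a trimmed copy of the record (a subset of the fields shown above), and the helper name `summarize` is hypothetical.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
record_text = """
[
  {
    "name": "Characterization and Extraction of Irredundant Tandem Motifs",
    "author": [
      {"familyName": "Parida", "givenName": "Laxmi"},
      {"familyName": "Pizzi", "givenName": "Cinzia"},
      {"familyName": "Rombo", "givenName": "Simona E."}
    ],
    "pagination": "385-397",
    "productId": [
      {"name": "dimensions_id", "value": ["pub.1040355876"]},
      {"name": "doi", "value": ["10.1007/978-3-642-34109-0_41"]}
    ]
  }
]
"""


def summarize(text):
    """Return title, author names, pages, and DOI from a SciGraph-style record."""
    record = json.loads(text)[0]  # the payload is a one-element array
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]]
    # productId is a list of name/value pairs; pick the entry named "doi".
    doi = next(
        value
        for prod in record["productId"]
        if prod["name"] == "doi"
        for value in prod["value"]
    )
    return {
        "title": record["name"],
        "authors": authors,
        "pages": record["pagination"],
        "doi": doi,
    }


if __name__ == "__main__":
    print(summarize(record_text))
```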
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-34109-0_41'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-34109-0_41'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-34109-0_41'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-34109-0_41'
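The same content negotiation can be sketched with Python's standard library. The URL and `Accept` values are copied from the curl commands above; the names `FORMATS`, `build_request`, and `fetch` are hypothetical, and whether the endpoint still responds is not verified here.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-34109-0_41"

# Media types taken from the curl examples above.
FORMATS = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=BASE_URL):
    """Build a GET request asking for the given RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})


def fetch(fmt, url=BASE_URL, timeout=30):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    request = build_request("turtle")
    print(request.full_url, request.get_header("Accept"))
```

Keeping `build_request` separate from `fetch` lets the header logic be checked without touching the network.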


 

This table displays all metadata directly associated with this object as RDF triples.

141 triples, 22 predicates, 72 URIs, 65 literals, 7 blank nodes

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-34109-0_41 schema:about anzsrc-for:01
2 anzsrc-for:0101
3 schema:author Ndeed55226ec440a78a31b59958a6f6b7
4 schema:datePublished 2012
5 schema:datePublishedReg 2012-01-01
6 schema:description We address the problem of extracting pairs of subwords (m1,m2) from a text string s of length n, such that, given also an integer constant d in input, m1 and m2 occur in tandem within a maximum distance of d symbols in s. The main effort of this work is to eliminate the possible redundancy from the candidate set of the so found tandem motifs. To this aim, we first introduce the concept of maximality, characterized by four specific conditions, that we show to be not deducible by the corresponding notion of maximality already defined for “simple” (i.e., non tandem) motifs. Then, we further eliminate the remaining redundancy by defining the concept of irredundancy for tandem motifs. We prove that the number of non-overlapping irredundant tandems is O(d^2 n) which, considering d as a constant, leads to a linear number of tandems in the length of the input string. This is an order of magnitude less than previously developed compact indexes for tandem extraction. As a further contribution we show an algorithm to extract this compact irredundant index.
7 schema:editor N35daf6284a494c819056efa004f764a3
8 schema:genre chapter
9 schema:isAccessibleForFree false
10 schema:isPartOf N7305be0f652641dd8b216a48bfa3a637
11 schema:keywords M1
12 aim
13 algorithm
14 candidates
15 characterization
16 compact index
17 concept
18 concept of maximality
19 conditions
20 constants
21 contribution
22 corresponding notion
23 d symbols
24 distance
25 efforts
26 extraction
27 further contribution
28 index
29 input
30 input string
31 irredundancy
32 length
33 length n
34 linear number
35 m2
36 magnitude
37 main effort
38 maximality
39 maximum distance
40 motif
41 notion
42 number
43 order
44 orders of magnitude
45 pairs
46 possible redundancy
47 problem
48 redundancy
49 specific conditions
50 string
51 string S
52 subwords
53 symbols
54 tandem
55 tandem extraction
56 tandem motifs
57 work
58 schema:name Characterization and Extraction of Irredundant Tandem Motifs
59 schema:pagination 385-397
60 schema:productId N709af3cbb0904ddcbaa05ab207cd5185
61 Nb7cf90e6ecfb499ea146a3b192d1072f
62 schema:publisher Ned1b7d4096f84c3da6ab2d791e7d1121
63 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040355876
64 https://doi.org/10.1007/978-3-642-34109-0_41
65 schema:sdDatePublished 2022-09-02T16:15
66 schema:sdLicense https://scigraph.springernature.com/explorer/license/
67 schema:sdPublisher Nd0147db2a3b54c8ca2eb2e1489188196
68 schema:url https://doi.org/10.1007/978-3-642-34109-0_41
69 sgo:license sg:explorer/license/
70 sgo:sdDataset chapters
71 rdf:type schema:Chapter
72 N03c291b67e4044868d609805bbbd9e61 schema:familyName González-Caro
73 schema:givenName Cristina
74 rdf:type schema:Person
75 N04286a23caeb4716933f2c56b8a4ec25 schema:familyName Chávez
76 schema:givenName Edgar
77 rdf:type schema:Person
78 N35daf6284a494c819056efa004f764a3 rdf:first N885861171a2f434eb94794bd7d1dd354
79 rdf:rest N6440ad4ba0a44bbd9dfdba41ee8bed89
80 N6440ad4ba0a44bbd9dfdba41ee8bed89 rdf:first N03c291b67e4044868d609805bbbd9e61
81 rdf:rest N6b0ce048f4624f088ef90b4ccde233e4
82 N6b0ce048f4624f088ef90b4ccde233e4 rdf:first N04286a23caeb4716933f2c56b8a4ec25
83 rdf:rest Nd263c5c06d7b41e1b76b9ac2c8b14113
84 N709af3cbb0904ddcbaa05ab207cd5185 schema:name dimensions_id
85 schema:value pub.1040355876
86 rdf:type schema:PropertyValue
87 N7305be0f652641dd8b216a48bfa3a637 schema:isbn 978-3-642-34108-3
88 978-3-642-34109-0
89 schema:name String Processing and Information Retrieval
90 rdf:type schema:Book
91 N885861171a2f434eb94794bd7d1dd354 schema:familyName Calderón-Benavides
92 schema:givenName Liliana
93 rdf:type schema:Person
94 Nb7cf90e6ecfb499ea146a3b192d1072f schema:name doi
95 schema:value 10.1007/978-3-642-34109-0_41
96 rdf:type schema:PropertyValue
97 Nb8657bbc81bb44ef8f46cdba3ffd7959 rdf:first sg:person.010544745135.35
98 rdf:rest Nbc4a200bb6104baa999d729c851d7c3b
99 Nbc4a200bb6104baa999d729c851d7c3b rdf:first sg:person.013316136215.88
100 rdf:rest rdf:nil
101 Nd0147db2a3b54c8ca2eb2e1489188196 schema:name Springer Nature - SN SciGraph project
102 rdf:type schema:Organization
103 Nd263c5c06d7b41e1b76b9ac2c8b14113 rdf:first Nefecee6c42d44b0aaa969b1d042801a9
104 rdf:rest rdf:nil
105 Ndeed55226ec440a78a31b59958a6f6b7 rdf:first sg:person.01336557015.68
106 rdf:rest Nb8657bbc81bb44ef8f46cdba3ffd7959
107 Ned1b7d4096f84c3da6ab2d791e7d1121 schema:name Springer Nature
108 rdf:type schema:Organisation
109 Nefecee6c42d44b0aaa969b1d042801a9 schema:familyName Ziviani
110 schema:givenName Nivio
111 rdf:type schema:Person
112 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
113 schema:name Mathematical Sciences
114 rdf:type schema:DefinedTerm
115 anzsrc-for:0101 schema:inDefinedTermSet anzsrc-for:
116 schema:name Pure Mathematics
117 rdf:type schema:DefinedTerm
118 sg:person.010544745135.35 schema:affiliation grid-institutes:grid.5608.b
119 schema:familyName Pizzi
120 schema:givenName Cinzia
121 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010544745135.35
122 rdf:type schema:Person
123 sg:person.013316136215.88 schema:affiliation grid-institutes:grid.7778.f
124 schema:familyName Rombo
125 schema:givenName Simona E.
126 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013316136215.88
127 rdf:type schema:Person
128 sg:person.01336557015.68 schema:affiliation grid-institutes:grid.481554.9
129 schema:familyName Parida
130 schema:givenName Laxmi
131 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01336557015.68
132 rdf:type schema:Person
133 grid-institutes:grid.481554.9 schema:alternateName IBM T.J. Watson Research Center, USA
134 schema:name IBM T.J. Watson Research Center, USA
135 rdf:type schema:Organization
136 grid-institutes:grid.5608.b schema:alternateName Department of Information Engineering, University of Padova, Italy
137 schema:name Department of Information Engineering, University of Padova, Italy
138 rdf:type schema:Organization
139 grid-institutes:grid.7778.f schema:alternateName ICAR-CNR of Cosenza & DEIS, Università della Calabria, Italy
140 schema:name ICAR-CNR of Cosenza & DEIS, Università della Calabria, Italy
141 rdf:type schema:Organization
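In the table, the ordered author list is encoded as an RDF collection: each blank node carries an `rdf:first` item and an `rdf:rest` pointer until `rdf:nil`. The sketch below walks that chain using the author-list triples copied from rows 3 and 97 to 106; the tuple representation and the function name `rdf_list` are assumptions for illustration, not part of SciGraph.

```python
# Author-list triples copied from the table (subject, predicate, object).
triples = [
    ("Ndeed55226ec440a78a31b59958a6f6b7", "rdf:first", "sg:person.01336557015.68"),
    ("Ndeed55226ec440a78a31b59958a6f6b7", "rdf:rest", "Nb8657bbc81bb44ef8f46cdba3ffd7959"),
    ("Nb8657bbc81bb44ef8f46cdba3ffd7959", "rdf:first", "sg:person.010544745135.35"),
    ("Nb8657bbc81bb44ef8f46cdba3ffd7959", "rdf:rest", "Nbc4a200bb6104baa999d729c851d7c3b"),
    ("Nbc4a200bb6104baa999d729c851d7c3b", "rdf:first", "sg:person.013316136215.88"),
    ("Nbc4a200bb6104baa999d729c851d7c3b", "rdf:rest", "rdf:nil"),
]


def rdf_list(head, triples):
    """Follow rdf:first / rdf:rest from head and return the items in order."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items = []
    node = head
    while node != "rdf:nil":
        items.append(first[node])
        node = rest[node]
    return items


if __name__ == "__main__":
    # The head node is the object of schema:author in row 3 of the table.
    for person in rdf_list("Ndeed55226ec440a78a31b59958a6f6b7", triples):
        print(person)
```

The same traversal applies to the `schema:editor` list, whose head is the blank node in row 7.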
 



