A really Simple Approximation of Smallest Grammar View Full Text


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2014

AUTHORS

Artur Jeż

ABSTRACT

We present a really simple linear-time algorithm constructing a context-free grammar of size \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal{O}(g log (N/g))$\end{document} for the input string, where N is the size of the input string and g the size of the optimal grammar generating this string. The algorithm works for arbitrary size alphabets, but the running time is linear when the alphabet Σ of the input string can be identified with numbers from {1,…, N}. Algorithms with such an approximation guarantee and running time are known, however all of them were non-trivial and their analyses involved. The here presented algorithm computes the LZ77 factorisation (of size l) and transforms it in phases to a grammar. In each phase it maintains an LZ77-like factorisation of the word with at most l factors as well as additional \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal{O}(l)$\end{document} letters. In one phase in a greedy way (by a left-to-right sweep) we choose a set of pairs of consecutive letters to be replaced with new symbols, i.e. nonterminals of the constructed grammar. We choose at least 2/3 of the letters in the word and there are \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal{O}(l)$\end{document} many different pairs among them. Hence there are \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal{O}(log N)$\end{document} phases, each introduces \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal{O}(l)$\end{document} nonterminals. A more precise analysis yields a bound \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal{O}(l log(N/l))$\end{document}. As l ≤ g, this yields \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal{O}(g log(N/g))$\end{document}. More... »

PAGES

182-191

Book

TITLE

Combinatorial Pattern Matching

ISBN

978-3-319-07565-5
978-3-319-07566-2

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-319-07566-2_19

DOI

http://dx.doi.org/10.1007/978-3-319-07566-2_19

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1035201050


Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
Incoming Citations Browse incoming citations for this publication using opencitations.net

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Mathematical Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0101", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Pure Mathematics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Institute of Computer Science, University of Wroc\u0142aw, Poland", 
          "id": "http://www.grid.ac/institutes/grid.8505.8", 
          "name": [
            "Max Planck Institute f\u00fcr Informatik, Saarbr\u00fccken, Germany", 
            "Institute of Computer Science, University of Wroc\u0142aw, Poland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Je\u017c", 
        "givenName": "Artur", 
        "id": "sg:person.015252371751.76", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015252371751.76"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2014", 
    "datePublishedReg": "2014-01-01", 
    "description": "We present a really simple linear-time algorithm constructing a context-free grammar of size \\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$\\mathcal{O}(g log (N/g))$\\end{document} for the input string, where N is the size of the input string and g the size of the optimal grammar generating this string. The algorithm works for arbitrary size alphabets, but the running time is linear when the alphabet \u03a3 of the input string can be identified with numbers from {1,\u2026, N}. Algorithms with such an approximation guarantee and running time are known, however all of them were non-trivial and their analyses involved. The here presented algorithm computes the LZ77 factorisation (of size l) and transforms it in phases to a grammar. In each phase it maintains an LZ77-like factorisation of the word with at most l factors as well as additional \\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$\\mathcal{O}(l)$\\end{document} letters. In one phase in a greedy way (by a left-to-right sweep) we choose a set of pairs of consecutive letters to be replaced with new symbols, i.e. nonterminals of the constructed grammar. We choose at least 2/3 of the letters in the word and there are \\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$\\mathcal{O}(l)$\\end{document} many different pairs among them. Hence there are \\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$\\mathcal{O}(log N)$\\end{document} phases, each introduces \\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$\\mathcal{O}(l)$\\end{document} nonterminals. A more precise analysis yields a bound \\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$\\mathcal{O}(l log(N/l))$\\end{document}. As l\u2009\u2264\u2009g, this yields \\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$\\mathcal{O}(g log(N/g))$\\end{document}.", 
    "editor": [
      {
        "familyName": "Kulikov", 
        "givenName": "Alexander S.", 
        "type": "Person"
      }, 
      {
        "familyName": "Kuznetsov", 
        "givenName": "Sergei O.", 
        "type": "Person"
      }, 
      {
        "familyName": "Pevzner", 
        "givenName": "Pavel", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-319-07566-2_19", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-319-07565-5", 
        "978-3-319-07566-2"
      ], 
      "name": "Combinatorial Pattern Matching", 
      "type": "Book"
    }, 
    "keywords": [
      "simple linear time algorithm", 
      "linear time algorithm", 
      "running time", 
      "simple approximation", 
      "algorithm", 
      "input string", 
      "strings", 
      "arbitrary size alphabets", 
      "size alphabet", 
      "approximation guarantee", 
      "algorithm computes", 
      "factorisation", 
      "greedy way", 
      "set of pairs", 
      "context-free grammars", 
      "optimal grammar", 
      "alphabet \u03a3", 
      "guarantees", 
      "compute", 
      "letter", 
      "set", 
      "different pairs", 
      "precise analysis", 
      "approximation", 
      "size", 
      "alphabet", 
      "time", 
      "number", 
      "analysis", 
      "phase", 
      "way", 
      "pairs", 
      "consecutive letters", 
      "new symbols", 
      "symbols", 
      "introduces", 
      "grammar", 
      "words", 
      "factors", 
      "nonterminals", 
      "yield"
    ], 
    "name": "A really Simple Approximation of Smallest Grammar", 
    "pagination": "182-191", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1035201050"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-319-07566-2_19"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-319-07566-2_19", 
      "https://app.dimensions.ai/details/publication/pub.1035201050"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-05-20T07:47", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220519/entities/gbq_results/chapter/chapter_398.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-319-07566-2_19"
  }
]
 

Download the RDF metadata as:  json-ld nt turtle xml License info

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-07566-2_19'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-07566-2_19'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-07566-2_19'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-07566-2_19'


 

This table displays all metadata directly associated to this object as RDF triples.

112 TRIPLES      23 PREDICATES      67 URIs      60 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-319-07566-2_19 schema:about anzsrc-for:01
2 anzsrc-for:0101
3 schema:author Nd70b6d5465a34a4a90cdf00b9897df6d
4 schema:datePublished 2014
5 schema:datePublishedReg 2014-01-01
6 schema:description We present a really simple linear-time algorithm constructing a context-free grammar of size \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal{O}(g log (N/g))$\end{document} for the input string, where N is the size of the input string and g the size of the optimal grammar generating this string. The algorithm works for arbitrary size alphabets, but the running time is linear when the alphabet Σ of the input string can be identified with numbers from {1,…, N}. Algorithms with such an approximation guarantee and running time are known, however all of them were non-trivial and their analyses involved. The here presented algorithm computes the LZ77 factorisation (of size l) and transforms it in phases to a grammar. In each phase it maintains an LZ77-like factorisation of the word with at most l factors as well as additional \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal{O}(l)$\end{document} letters. In one phase in a greedy way (by a left-to-right sweep) we choose a set of pairs of consecutive letters to be replaced with new symbols, i.e. nonterminals of the constructed grammar. We choose at least 2/3 of the letters in the word and there are \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal{O}(l)$\end{document} many different pairs among them. Hence there are \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal{O}(log N)$\end{document} phases, each introduces \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal{O}(l)$\end{document} nonterminals. A more precise analysis yields a bound \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal{O}(l log(N/l))$\end{document}. As l ≤ g, this yields \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal{O}(g log(N/g))$\end{document}.
7 schema:editor Nfe886c773dc84dcaa15ba62dff17c704
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree true
11 schema:isPartOf Nc3468335339340d8964f457bee3c1bae
12 schema:keywords algorithm
13 algorithm computes
14 alphabet
15 alphabet Σ
16 analysis
17 approximation
18 approximation guarantee
19 arbitrary size alphabets
20 compute
21 consecutive letters
22 context-free grammars
23 different pairs
24 factorisation
25 factors
26 grammar
27 greedy way
28 guarantees
29 input string
30 introduces
31 letter
32 linear time algorithm
33 new symbols
34 nonterminals
35 number
36 optimal grammar
37 pairs
38 phase
39 precise analysis
40 running time
41 set
42 set of pairs
43 simple approximation
44 simple linear time algorithm
45 size
46 size alphabet
47 strings
48 symbols
49 time
50 way
51 words
52 yield
53 schema:name A really Simple Approximation of Smallest Grammar
54 schema:pagination 182-191
55 schema:productId Nc8892c7d1e13462c963f60e4057c581a
56 Ne642874e868b474a9a2e67657e1c6912
57 schema:publisher N290d44784a344bb2ad7c1ac430a0b053
58 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035201050
59 https://doi.org/10.1007/978-3-319-07566-2_19
60 schema:sdDatePublished 2022-05-20T07:47
61 schema:sdLicense https://scigraph.springernature.com/explorer/license/
62 schema:sdPublisher Nb2cbacaea3844599b1b2e326f0159006
63 schema:url https://doi.org/10.1007/978-3-319-07566-2_19
64 sgo:license sg:explorer/license/
65 sgo:sdDataset chapters
66 rdf:type schema:Chapter
67 N290d44784a344bb2ad7c1ac430a0b053 schema:name Springer Nature
68 rdf:type schema:Organisation
69 N2d71594941244c99b2d535145e91d5ab rdf:first N7f80db3c0faa4324a64896b56505a7ff
70 rdf:rest N3b8c1f820c504c4b934bb5e5caeeae68
71 N3b8c1f820c504c4b934bb5e5caeeae68 rdf:first N51e38fa30cd747ae9668dfcc58040a5e
72 rdf:rest rdf:nil
73 N51e38fa30cd747ae9668dfcc58040a5e schema:familyName Pevzner
74 schema:givenName Pavel
75 rdf:type schema:Person
76 N69412d0282134f6582e3c98643763631 schema:familyName Kulikov
77 schema:givenName Alexander S.
78 rdf:type schema:Person
79 N7f80db3c0faa4324a64896b56505a7ff schema:familyName Kuznetsov
80 schema:givenName Sergei O.
81 rdf:type schema:Person
82 Nb2cbacaea3844599b1b2e326f0159006 schema:name Springer Nature - SN SciGraph project
83 rdf:type schema:Organization
84 Nc3468335339340d8964f457bee3c1bae schema:isbn 978-3-319-07565-5
85 978-3-319-07566-2
86 schema:name Combinatorial Pattern Matching
87 rdf:type schema:Book
88 Nc8892c7d1e13462c963f60e4057c581a schema:name doi
89 schema:value 10.1007/978-3-319-07566-2_19
90 rdf:type schema:PropertyValue
91 Nd70b6d5465a34a4a90cdf00b9897df6d rdf:first sg:person.015252371751.76
92 rdf:rest rdf:nil
93 Ne642874e868b474a9a2e67657e1c6912 schema:name dimensions_id
94 schema:value pub.1035201050
95 rdf:type schema:PropertyValue
96 Nfe886c773dc84dcaa15ba62dff17c704 rdf:first N69412d0282134f6582e3c98643763631
97 rdf:rest N2d71594941244c99b2d535145e91d5ab
98 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
99 schema:name Mathematical Sciences
100 rdf:type schema:DefinedTerm
101 anzsrc-for:0101 schema:inDefinedTermSet anzsrc-for:
102 schema:name Pure Mathematics
103 rdf:type schema:DefinedTerm
104 sg:person.015252371751.76 schema:affiliation grid-institutes:grid.8505.8
105 schema:familyName Jeż
106 schema:givenName Artur
107 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015252371751.76
108 rdf:type schema:Person
109 grid-institutes:grid.8505.8 schema:alternateName Institute of Computer Science, University of Wrocław, Poland
110 schema:name Institute of Computer Science, University of Wrocław, Poland
111 Max Planck Institute für Informatik, Saarbrücken, Germany
112 rdf:type schema:Organization
 




Preview window. Press ESC to close (or click here)


...