A Polynomial Time Approximation Scheme for the Closest Substring Problem View Full Text


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2002-11-07

AUTHORS

Bin Ma

ABSTRACT

In this paper we study the following problem: Given n strings s1, s2,..., sn, each of length m, find a substring ti of length L for each si, and a string s of length L, such that maxi = 1nd(s, ti) is minimized, where d(·, ·) is the Hamming distance. The problem was raised in [6] in an application of genetic drug target search and is a key open problem in many applications [7]. The authors of [6] showed that it is NP-hard and can be trivially approximated within ratio 2. A non-trivial approximation algorithm with ratio better than 2 was found in [7]. A major open question in this area is whether there exists a polynomial time approximation scheme (PTAS) for this problem. In this paper, we answer this question positively. We also apply our method to two related problems. More... »

PAGES

99-107

References to SciGraph publications

  • 1997. Banishing bias from consensus sequences in COMBINATORIAL PATTERN MATCHING
  • 1997-04. On covering problems of codes in THEORY OF COMPUTING SYSTEMS
  • Book

    TITLE

    Combinatorial Pattern Matching

    ISBN

    978-3-540-67633-1
    978-3-540-45123-5

    Author Affiliations

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/3-540-45123-4_10

    DOI

    http://dx.doi.org/10.1007/3-540-45123-4_10

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1007323304


    Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
    Incoming Citations Browse incoming citations for this publication using opencitations.net

    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0802", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Computation Theory and Mathematics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "University of Waterloo", 
              "id": "https://www.grid.ac/institutes/grid.46078.3d", 
              "name": [
                "Department of Computer Science, University of Waterloo, N2L3G1, Waterloo, ON, Canada"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Ma", 
            "givenName": "Bin", 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1145/301250.301376", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1002920963"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/bf02679443", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037262625", 
              "https://doi.org/10.1007/bf02679443"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/bf02679443", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037262625", 
              "https://doi.org/10.1007/bf02679443"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/3-540-63220-4_63", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1040283480", 
              "https://doi.org/10.1007/3-540-63220-4_63"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1017/cbo9780511814075", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1098701235"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2002-11-07", 
        "datePublishedReg": "2002-11-07", 
        "description": "In this paper we study the following problem: Given n strings s1, s2,..., sn, each of length m, find a substring ti of length L for each si, and a string s of length L, such that maxi = 1nd(s, ti) is minimized, where d(\u00b7, \u00b7) is the Hamming distance. The problem was raised in [6] in an application of genetic drug target search and is a key open problem in many applications [7]. The authors of [6] showed that it is NP-hard and can be trivially approximated within ratio 2. A non-trivial approximation algorithm with ratio better than 2 was found in [7]. A major open question in this area is whether there exists a polynomial time approximation scheme (PTAS) for this problem. In this paper, we answer this question positively. We also apply our method to two related problems.", 
        "editor": [
          {
            "familyName": "Giancarlo", 
            "givenName": "Raffaele", 
            "type": "Person"
          }, 
          {
            "familyName": "Sankoff", 
            "givenName": "David", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/3-540-45123-4_10", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": {
          "isbn": [
            "978-3-540-67633-1", 
            "978-3-540-45123-5"
          ], 
          "name": "Combinatorial Pattern Matching", 
          "type": "Book"
        }, 
        "name": "A Polynomial Time Approximation Scheme for the Closest Substring Problem", 
        "pagination": "99-107", 
        "productId": [
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/3-540-45123-4_10"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "22e230fd0b49b62462c5bad0fdecc1f37a6ee1169e7dd77a74acc59b2f8a57c0"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1007323304"
            ]
          }
        ], 
        "publisher": {
          "location": "Berlin, Heidelberg", 
          "name": "Springer Berlin Heidelberg", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/3-540-45123-4_10", 
          "https://app.dimensions.ai/details/publication/pub.1007323304"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-16T05:47", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000347_0000000347/records_89814_00000000.jsonl", 
        "type": "Chapter", 
        "url": "https://link.springer.com/10.1007%2F3-540-45123-4_10"
      }
    ]
     

    Download the RDF metadata as:  json-ld nt turtle xml License info

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/3-540-45123-4_10'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/3-540-45123-4_10'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/3-540-45123-4_10'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/3-540-45123-4_10'


     

    This table displays all metadata directly associated to this object as RDF triples.

    83 TRIPLES      23 PREDICATES      30 URIs      19 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/3-540-45123-4_10 schema:about anzsrc-for:08
    2 anzsrc-for:0802
    3 schema:author Nd7928911cd534cc28f789aaa1d27eb1a
    4 schema:citation sg:pub.10.1007/3-540-63220-4_63
    5 sg:pub.10.1007/bf02679443
    6 https://doi.org/10.1017/cbo9780511814075
    7 https://doi.org/10.1145/301250.301376
    8 schema:datePublished 2002-11-07
    9 schema:datePublishedReg 2002-11-07
    10 schema:description In this paper we study the following problem: Given n strings s1, s2,..., sn, each of length m, find a substring ti of length L for each si, and a string s of length L, such that maxi = 1nd(s, ti) is minimized, where d(·, ·) is the Hamming distance. The problem was raised in [6] in an application of genetic drug target search and is a key open problem in many applications [7]. The authors of [6] showed that it is NP-hard and can be trivially approximated within ratio 2. A non-trivial approximation algorithm with ratio better than 2 was found in [7]. A major open question in this area is whether there exists a polynomial time approximation scheme (PTAS) for this problem. In this paper, we answer this question positively. We also apply our method to two related problems.
    11 schema:editor N50092f25dab5490e9e1e8d3a37e20bd2
    12 schema:genre chapter
    13 schema:inLanguage en
    14 schema:isAccessibleForFree true
    15 schema:isPartOf N816b97f39d3b4f42b8b822c65e0f8f5d
    16 schema:name A Polynomial Time Approximation Scheme for the Closest Substring Problem
    17 schema:pagination 99-107
    18 schema:productId N310273dca8d34df2afacddc038c908f1
    19 N86e0f5d105e44d3381ef27580011e6bc
    20 Na88307bcc6454d3e88cea9dda33cf046
    21 schema:publisher Nc060a64e14b847e7952d6fe736e71392
    22 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007323304
    23 https://doi.org/10.1007/3-540-45123-4_10
    24 schema:sdDatePublished 2019-04-16T05:47
    25 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    26 schema:sdPublisher N89d235e5b97a49abb70c65790d40b2f7
    27 schema:url https://link.springer.com/10.1007%2F3-540-45123-4_10
    28 sgo:license sg:explorer/license/
    29 sgo:sdDataset chapters
    30 rdf:type schema:Chapter
    31 N1a926e68f4284620a1c21e68ca6899e7 schema:familyName Sankoff
    32 schema:givenName David
    33 rdf:type schema:Person
    34 N2a1295f015a7477783a30d6ac4ca0f8a schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
    35 schema:familyName Ma
    36 schema:givenName Bin
    37 rdf:type schema:Person
    38 N310273dca8d34df2afacddc038c908f1 schema:name readcube_id
    39 schema:value 22e230fd0b49b62462c5bad0fdecc1f37a6ee1169e7dd77a74acc59b2f8a57c0
    40 rdf:type schema:PropertyValue
    41 N50092f25dab5490e9e1e8d3a37e20bd2 rdf:first N7b0d6e488a8c4c2698e6ada101448d17
    42 rdf:rest N66f33b0f51ca43b892a011c76868fd81
    43 N66f33b0f51ca43b892a011c76868fd81 rdf:first N1a926e68f4284620a1c21e68ca6899e7
    44 rdf:rest rdf:nil
    45 N7b0d6e488a8c4c2698e6ada101448d17 schema:familyName Giancarlo
    46 schema:givenName Raffaele
    47 rdf:type schema:Person
    48 N816b97f39d3b4f42b8b822c65e0f8f5d schema:isbn 978-3-540-45123-5
    49 978-3-540-67633-1
    50 schema:name Combinatorial Pattern Matching
    51 rdf:type schema:Book
    52 N86e0f5d105e44d3381ef27580011e6bc schema:name dimensions_id
    53 schema:value pub.1007323304
    54 rdf:type schema:PropertyValue
    55 N89d235e5b97a49abb70c65790d40b2f7 schema:name Springer Nature - SN SciGraph project
    56 rdf:type schema:Organization
    57 Na88307bcc6454d3e88cea9dda33cf046 schema:name doi
    58 schema:value 10.1007/3-540-45123-4_10
    59 rdf:type schema:PropertyValue
    60 Nc060a64e14b847e7952d6fe736e71392 schema:location Berlin, Heidelberg
    61 schema:name Springer Berlin Heidelberg
    62 rdf:type schema:Organisation
    63 Nd7928911cd534cc28f789aaa1d27eb1a rdf:first N2a1295f015a7477783a30d6ac4ca0f8a
    64 rdf:rest rdf:nil
    65 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    66 schema:name Information and Computing Sciences
    67 rdf:type schema:DefinedTerm
    68 anzsrc-for:0802 schema:inDefinedTermSet anzsrc-for:
    69 schema:name Computation Theory and Mathematics
    70 rdf:type schema:DefinedTerm
    71 sg:pub.10.1007/3-540-63220-4_63 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040283480
    72 https://doi.org/10.1007/3-540-63220-4_63
    73 rdf:type schema:CreativeWork
    74 sg:pub.10.1007/bf02679443 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037262625
    75 https://doi.org/10.1007/bf02679443
    76 rdf:type schema:CreativeWork
    77 https://doi.org/10.1017/cbo9780511814075 schema:sameAs https://app.dimensions.ai/details/publication/pub.1098701235
    78 rdf:type schema:CreativeWork
    79 https://doi.org/10.1145/301250.301376 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002920963
    80 rdf:type schema:CreativeWork
    81 https://www.grid.ac/institutes/grid.46078.3d schema:alternateName University of Waterloo
    82 schema:name Department of Computer Science, University of Waterloo, N2L3G1, Waterloo, ON, Canada
    83 rdf:type schema:Organization
     




    Preview window. Press ESC to close (or click here)


    ...