New Bounds for Motif Finding in Strong Instances View Full Text


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2006

AUTHORS

Broňa Brejová , Daniel G. Brown , Ian M. Harrower , Tomáš Vinař

ABSTRACT

Many algorithms for motif finding that are commonly used in bioinformatics start by sampling r potential motif occurrences from n input sequences. The motif is derived from these samples and evaluated on all sequences. This approach works extremely well in practice, and is implemented by several programs. Li, Ma and Wang have shown that a simple algorithm of this sort is a polynomial-time approximation scheme. However, in 2005, we showed specific instances of the motif finding problem for which the approximation ratio of a slight variation of this scheme converges to one very slowly as a function of the sample size r, which seemingly contradicts the high performance of sample-based algorithms. Here, we account for the difference by showing that, for a variety of different definitions of “strong” binary motifs, the approximation ratio of sample-based algorithms converges to one exponentially fast in r. We also describe “very strong” motifs, for which the simple sample-based approach always identifies the correct motif, even for modest values of r. More... »

PAGES

94-105

References to SciGraph publications

  • 1998. Concentration in PROBABILISTIC METHODS FOR ALGORITHMIC DISCRETE MATHEMATICS
  • 2005. Sharper Upper and Lower Bounds for an Approximation Scheme for Consensus-Pattern in COMBINATORIAL PATTERN MATCHING
  • Book

    TITLE

    Combinatorial Pattern Matching

    ISBN

    978-3-540-35455-0
    978-3-540-35461-1

    Author Affiliations

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/11780441_10

    DOI

    http://dx.doi.org/10.1007/11780441_10

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1018752012


    Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
    Incoming Citations Browse incoming citations for this publication using opencitations.net

    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0802", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Computation Theory and Mathematics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "University of Waterloo", 
              "id": "https://www.grid.ac/institutes/grid.46078.3d", 
              "name": [
                "David R. Cheriton School of Computer Science, University of Waterloo"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Brejov\u00e1", 
            "givenName": "Bro\u0148a", 
            "id": "sg:person.0642141060.90", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642141060.90"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Waterloo", 
              "id": "https://www.grid.ac/institutes/grid.46078.3d", 
              "name": [
                "David R. Cheriton School of Computer Science, University of Waterloo"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Brown", 
            "givenName": "Daniel G.", 
            "id": "sg:person.0642727740.54", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Waterloo", 
              "id": "https://www.grid.ac/institutes/grid.46078.3d", 
              "name": [
                "David R. Cheriton School of Computer Science, University of Waterloo"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Harrower", 
            "givenName": "Ian M.", 
            "id": "sg:person.01203526716.95", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01203526716.95"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Waterloo", 
              "id": "https://www.grid.ac/institutes/grid.46078.3d", 
              "name": [
                "David R. Cheriton School of Computer Science, University of Waterloo"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Vina\u0159", 
            "givenName": "Tom\u00e1\u0161", 
            "id": "sg:person.01041305251.67", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01041305251.67"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/978-3-662-12788-9_6", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1008304466", 
              "https://doi.org/10.1007/978-3-662-12788-9_6"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1006/jcss.2002.1823", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1024001192"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/bioinformatics/15.7.563", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1030639875"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/11496656_1", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031528905", 
              "https://doi.org/10.1007/11496656_1"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/11496656_1", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031528905", 
              "https://doi.org/10.1007/11496656_1"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1137/s0097539793250767", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1062879833"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2006", 
        "datePublishedReg": "2006-01-01", 
        "description": "Many algorithms for motif finding that are commonly used in bioinformatics start by sampling r potential motif occurrences from n input sequences. The motif is derived from these samples and evaluated on all sequences. This approach works extremely well in practice, and is implemented by several programs. Li, Ma and Wang have shown that a simple algorithm of this sort is a polynomial-time approximation scheme. However, in 2005, we showed specific instances of the motif finding problem for which the approximation ratio of a slight variation of this scheme converges to one very slowly as a function of the sample size r, which seemingly contradicts the high performance of sample-based algorithms. Here, we account for the difference by showing that, for a variety of different definitions of \u201cstrong\u201d binary motifs, the approximation ratio of sample-based algorithms converges to one exponentially fast in r. We also describe \u201cvery strong\u201d motifs, for which the simple sample-based approach always identifies the correct motif, even for modest values of r.", 
        "editor": [
          {
            "familyName": "Lewenstein", 
            "givenName": "Moshe", 
            "type": "Person"
          }, 
          {
            "familyName": "Valiente", 
            "givenName": "Gabriel", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/11780441_10", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": {
          "isbn": [
            "978-3-540-35455-0", 
            "978-3-540-35461-1"
          ], 
          "name": "Combinatorial Pattern Matching", 
          "type": "Book"
        }, 
        "name": "New Bounds for Motif Finding in Strong Instances", 
        "pagination": "94-105", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1018752012"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/11780441_10"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "5958c8f01f66e4586a5377a22c513ccdfdff18d84965807309110982a6d49f4a"
            ]
          }
        ], 
        "publisher": {
          "location": "Berlin, Heidelberg", 
          "name": "Springer Berlin Heidelberg", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/11780441_10", 
          "https://app.dimensions.ai/details/publication/pub.1018752012"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-16T07:30", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000356_0000000356/records_57874_00000000.jsonl", 
        "type": "Chapter", 
        "url": "https://link.springer.com/10.1007%2F11780441_10"
      }
    ]
     

    Download the RDF metadata as:  json-ld nt turtle xml License info

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/11780441_10'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/11780441_10'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/11780441_10'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/11780441_10'


     

    This table displays all metadata directly associated to this object as RDF triples.

    108 TRIPLES      23 PREDICATES      32 URIs      20 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/11780441_10 schema:about anzsrc-for:08
    2 anzsrc-for:0802
    3 schema:author Nb0001351df054f2ca699345a922aada2
    4 schema:citation sg:pub.10.1007/11496656_1
    5 sg:pub.10.1007/978-3-662-12788-9_6
    6 https://doi.org/10.1006/jcss.2002.1823
    7 https://doi.org/10.1093/bioinformatics/15.7.563
    8 https://doi.org/10.1137/s0097539793250767
    9 schema:datePublished 2006
    10 schema:datePublishedReg 2006-01-01
    11 schema:description Many algorithms for motif finding that are commonly used in bioinformatics start by sampling r potential motif occurrences from n input sequences. The motif is derived from these samples and evaluated on all sequences. This approach works extremely well in practice, and is implemented by several programs. Li, Ma and Wang have shown that a simple algorithm of this sort is a polynomial-time approximation scheme. However, in 2005, we showed specific instances of the motif finding problem for which the approximation ratio of a slight variation of this scheme converges to one very slowly as a function of the sample size r, which seemingly contradicts the high performance of sample-based algorithms. Here, we account for the difference by showing that, for a variety of different definitions of “strong” binary motifs, the approximation ratio of sample-based algorithms converges to one exponentially fast in r. We also describe “very strong” motifs, for which the simple sample-based approach always identifies the correct motif, even for modest values of r.
    12 schema:editor N6bb9713c202a42eda2186e98cfdf610f
    13 schema:genre chapter
    14 schema:inLanguage en
    15 schema:isAccessibleForFree true
    16 schema:isPartOf N241cac2285a0411aa4865614eb20f90a
    17 schema:name New Bounds for Motif Finding in Strong Instances
    18 schema:pagination 94-105
    19 schema:productId N3825de9e8e6e4f789537253ecf24bab1
    20 Na6aca072c57a49f997836515b4611bcc
    21 Nbffcb5604231452c8cd39cb20fefaf1a
    22 schema:publisher N7740fb9b70054977b08fc9496ea307e9
    23 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018752012
    24 https://doi.org/10.1007/11780441_10
    25 schema:sdDatePublished 2019-04-16T07:30
    26 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    27 schema:sdPublisher N24b564576b6a465b90513b6042078077
    28 schema:url https://link.springer.com/10.1007%2F11780441_10
    29 sgo:license sg:explorer/license/
    30 sgo:sdDataset chapters
    31 rdf:type schema:Chapter
    32 N241cac2285a0411aa4865614eb20f90a schema:isbn 978-3-540-35455-0
    33 978-3-540-35461-1
    34 schema:name Combinatorial Pattern Matching
    35 rdf:type schema:Book
    36 N24b564576b6a465b90513b6042078077 schema:name Springer Nature - SN SciGraph project
    37 rdf:type schema:Organization
    38 N27ff1f53e0c14eba99128192490d5d8a schema:familyName Valiente
    39 schema:givenName Gabriel
    40 rdf:type schema:Person
    41 N3825de9e8e6e4f789537253ecf24bab1 schema:name readcube_id
    42 schema:value 5958c8f01f66e4586a5377a22c513ccdfdff18d84965807309110982a6d49f4a
    43 rdf:type schema:PropertyValue
    44 N3c62bee3a9a743e8be5e72b44f2bc6f9 rdf:first sg:person.0642727740.54
    45 rdf:rest Nda835f43729b4fe98c047b67afcd2482
    46 N5e6c69a6851a4f9798a8e263c013a0fa rdf:first N27ff1f53e0c14eba99128192490d5d8a
    47 rdf:rest rdf:nil
    48 N6bb9713c202a42eda2186e98cfdf610f rdf:first Nec40d03a8bcd49918a34195d75bfaa15
    49 rdf:rest N5e6c69a6851a4f9798a8e263c013a0fa
    50 N6f648ffdadff4c9eb97666d157a48222 rdf:first sg:person.01041305251.67
    51 rdf:rest rdf:nil
    52 N7740fb9b70054977b08fc9496ea307e9 schema:location Berlin, Heidelberg
    53 schema:name Springer Berlin Heidelberg
    54 rdf:type schema:Organisation
    55 Na6aca072c57a49f997836515b4611bcc schema:name dimensions_id
    56 schema:value pub.1018752012
    57 rdf:type schema:PropertyValue
    58 Nb0001351df054f2ca699345a922aada2 rdf:first sg:person.0642141060.90
    59 rdf:rest N3c62bee3a9a743e8be5e72b44f2bc6f9
    60 Nbffcb5604231452c8cd39cb20fefaf1a schema:name doi
    61 schema:value 10.1007/11780441_10
    62 rdf:type schema:PropertyValue
    63 Nda835f43729b4fe98c047b67afcd2482 rdf:first sg:person.01203526716.95
    64 rdf:rest N6f648ffdadff4c9eb97666d157a48222
    65 Nec40d03a8bcd49918a34195d75bfaa15 schema:familyName Lewenstein
    66 schema:givenName Moshe
    67 rdf:type schema:Person
    68 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    69 schema:name Information and Computing Sciences
    70 rdf:type schema:DefinedTerm
    71 anzsrc-for:0802 schema:inDefinedTermSet anzsrc-for:
    72 schema:name Computation Theory and Mathematics
    73 rdf:type schema:DefinedTerm
    74 sg:person.01041305251.67 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
    75 schema:familyName Vinař
    76 schema:givenName Tomáš
    77 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01041305251.67
    78 rdf:type schema:Person
    79 sg:person.01203526716.95 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
    80 schema:familyName Harrower
    81 schema:givenName Ian M.
    82 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01203526716.95
    83 rdf:type schema:Person
    84 sg:person.0642141060.90 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
    85 schema:familyName Brejová
    86 schema:givenName Broňa
    87 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642141060.90
    88 rdf:type schema:Person
    89 sg:person.0642727740.54 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
    90 schema:familyName Brown
    91 schema:givenName Daniel G.
    92 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54
    93 rdf:type schema:Person
    94 sg:pub.10.1007/11496656_1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031528905
    95 https://doi.org/10.1007/11496656_1
    96 rdf:type schema:CreativeWork
    97 sg:pub.10.1007/978-3-662-12788-9_6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008304466
    98 https://doi.org/10.1007/978-3-662-12788-9_6
    99 rdf:type schema:CreativeWork
    100 https://doi.org/10.1006/jcss.2002.1823 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024001192
    101 rdf:type schema:CreativeWork
    102 https://doi.org/10.1093/bioinformatics/15.7.563 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030639875
    103 rdf:type schema:CreativeWork
    104 https://doi.org/10.1137/s0097539793250767 schema:sameAs https://app.dimensions.ai/details/publication/pub.1062879833
    105 rdf:type schema:CreativeWork
    106 https://www.grid.ac/institutes/grid.46078.3d schema:alternateName University of Waterloo
    107 schema:name David R. Cheriton School of Computer Science, University of Waterloo
    108 rdf:type schema:Organization
     




    Preview window. Press ESC to close (or click here)


    ...