Efficient Algorithms for Regular Expression Constrained Sequence Alignment View Full Text


Ontology type: schema:Chapter     


Chapter Info

DATE

2006

AUTHORS

Yun-Sheng Chung , Chin Lung Lu , Chuan Yi Tang

ABSTRACT

Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, in CPM 2005 Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which can take time and space up to O(|Σ|2 |V|4 n 2) and O(|Σ|2 |V|4 n), respectively, where Σ is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes O(|V|3 n 2) time and O(|V|2 n) space in the worst case. If |V|=O(logn) we propose another algorithm with time complexity O(|V|2log|V| n 2). The time complexity of our algorithms is independent of Σ, which is desirable in protein applications where the formulation of this problem originates; a factor of |Σ|2 = 400 in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice. More... »

PAGES

389-400

References to SciGraph publications

  • 2005. Regular Expression Constrained Sequence Alignment in COMBINATORIAL PATTERN MATCHING
  • Book

    TITLE

    Combinatorial Pattern Matching

    ISBN

    978-3-540-35455-0
    978-3-540-35461-1

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/11780441_35

    DOI

    http://dx.doi.org/10.1007/11780441_35

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1038390882


    Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
    Incoming Citations Browse incoming citations for this publication using opencitations.net

    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0802", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Computation Theory and Mathematics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "National Tsing Hua University", 
              "id": "https://www.grid.ac/institutes/grid.38348.34", 
              "name": [
                "Department of Computer Science, National Tsing Hua University, Hsinchu, 300, Taiwan, ROC"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Chung", 
            "givenName": "Yun-Sheng", 
            "id": "sg:person.014067475277.58", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014067475277.58"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "National Chiao Tung University", 
              "id": "https://www.grid.ac/institutes/grid.260539.b", 
              "name": [
                "Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, 300, Taiwan, ROC"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Lu", 
            "givenName": "Chin Lung", 
            "id": "sg:person.016502771037.22", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016502771037.22"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "National Tsing Hua University", 
              "id": "https://www.grid.ac/institutes/grid.38348.34", 
              "name": [
                "Department of Computer Science, National Tsing Hua University, Hsinchu, 300, Taiwan, ROC"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Tang", 
            "givenName": "Chuan Yi", 
            "id": "sg:person.01312526135.27", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01312526135.27"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/11496656_28", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1000307633", 
              "https://doi.org/10.1007/11496656_28"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/11496656_28", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1000307633", 
              "https://doi.org/10.1007/11496656_28"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/bioinformatics/bth220", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1007615319"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/360825.360861", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1016123406"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/nar/20.1.3", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1032312483"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/bioinformatics/bth468", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1035719800"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1142/s0219720003000095", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1063004501"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1142/s0219720005000977", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1063004589"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2006", 
        "datePublishedReg": "2006-01-01", 
        "description": "Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, in CPM 2005 Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which can take time and space up to O(|\u03a3|2 |V|4 n 2) and O(|\u03a3|2 |V|4 n), respectively, where \u03a3 is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes O(|V|3 n 2) time and O(|V|2 n) space in the worst case. If |V|=O(logn) we propose another algorithm with time complexity O(|V|2log|V| n 2). The time complexity of our algorithms is independent of \u03a3, which is desirable in protein applications where the formulation of this problem originates; a factor of |\u03a3|2 = 400 in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice.", 
        "editor": [
          {
            "familyName": "Lewenstein", 
            "givenName": "Moshe", 
            "type": "Person"
          }, 
          {
            "familyName": "Valiente", 
            "givenName": "Gabriel", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/11780441_35", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": false, 
        "isPartOf": {
          "isbn": [
            "978-3-540-35455-0", 
            "978-3-540-35461-1"
          ], 
          "name": "Combinatorial Pattern Matching", 
          "type": "Book"
        }, 
        "name": "Efficient Algorithms for Regular Expression Constrained Sequence Alignment", 
        "pagination": "389-400", 
        "productId": [
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/11780441_35"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "b6e0d2b1ab4e5269a146ccbd8f233abffbe3966a26d90b7d875b0909678c8d50"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1038390882"
            ]
          }
        ], 
        "publisher": {
          "location": "Berlin, Heidelberg", 
          "name": "Springer Berlin Heidelberg", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/11780441_35", 
          "https://app.dimensions.ai/details/publication/pub.1038390882"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-15T19:11", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8684_00000267.jsonl", 
        "type": "Chapter", 
        "url": "http://link.springer.com/10.1007/11780441_35"
      }
    ]
     

    Download the RDF metadata as:  json-ld nt turtle xml License info

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/11780441_35'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/11780441_35'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/11780441_35'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/11780441_35'


     

    This table displays all metadata directly associated to this object as RDF triples.

    109 TRIPLES      23 PREDICATES      34 URIs      20 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/11780441_35 schema:about anzsrc-for:08
    2 anzsrc-for:0802
    3 schema:author N9e0d09263d244498816525dc0b34da39
    4 schema:citation sg:pub.10.1007/11496656_28
    5 https://doi.org/10.1093/bioinformatics/bth220
    6 https://doi.org/10.1093/bioinformatics/bth468
    7 https://doi.org/10.1093/nar/20.1.3
    8 https://doi.org/10.1142/s0219720003000095
    9 https://doi.org/10.1142/s0219720005000977
    10 https://doi.org/10.1145/360825.360861
    11 schema:datePublished 2006
    12 schema:datePublishedReg 2006-01-01
    13 schema:description Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, in CPM 2005 Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which can take time and space up to O(|Σ|2 |V|4 n 2) and O(|Σ|2 |V|4 n), respectively, where Σ is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes O(|V|3 n 2) time and O(|V|2 n) space in the worst case. If |V|=O(logn) we propose another algorithm with time complexity O(|V|2log|V| n 2). The time complexity of our algorithms is independent of Σ, which is desirable in protein applications where the formulation of this problem originates; a factor of |Σ|2 = 400 in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice.
    14 schema:editor Nce436b721f6a470cbef76aca9556b1fb
    15 schema:genre chapter
    16 schema:inLanguage en
    17 schema:isAccessibleForFree false
    18 schema:isPartOf Nd8921cfb3fde460d92e3dbd3d60bdc60
    19 schema:name Efficient Algorithms for Regular Expression Constrained Sequence Alignment
    20 schema:pagination 389-400
    21 schema:productId N5540954dbc534ec1ae0616cd7e20b8ee
    22 N8d98fd493aa049b1a75bf68bae7a1605
    23 Nead5dc246caf4f70a5d06a46f4619f84
    24 schema:publisher N834e62b8ab9045449fe41c0f41a0a269
    25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038390882
    26 https://doi.org/10.1007/11780441_35
    27 schema:sdDatePublished 2019-04-15T19:11
    28 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    29 schema:sdPublisher N728aab82f9c64cdc8b3822798c0e00f0
    30 schema:url http://link.springer.com/10.1007/11780441_35
    31 sgo:license sg:explorer/license/
    32 sgo:sdDataset chapters
    33 rdf:type schema:Chapter
    34 N454f427743f24f4992417c29502cab38 rdf:first sg:person.016502771037.22
    35 rdf:rest N9e463ab06e62407e9276ca2e2c1c6e79
    36 N4b2aa5535d284e7e8bd6f732852ed4cc rdf:first Nd2f7b5f612eb40ef861e691446f323ba
    37 rdf:rest rdf:nil
    38 N5540954dbc534ec1ae0616cd7e20b8ee schema:name doi
    39 schema:value 10.1007/11780441_35
    40 rdf:type schema:PropertyValue
    41 N728aab82f9c64cdc8b3822798c0e00f0 schema:name Springer Nature - SN SciGraph project
    42 rdf:type schema:Organization
    43 N75f2bda6b1a443988f90dc52fae46654 schema:familyName Lewenstein
    44 schema:givenName Moshe
    45 rdf:type schema:Person
    46 N834e62b8ab9045449fe41c0f41a0a269 schema:location Berlin, Heidelberg
    47 schema:name Springer Berlin Heidelberg
    48 rdf:type schema:Organisation
    49 N8d98fd493aa049b1a75bf68bae7a1605 schema:name readcube_id
    50 schema:value b6e0d2b1ab4e5269a146ccbd8f233abffbe3966a26d90b7d875b0909678c8d50
    51 rdf:type schema:PropertyValue
    52 N9e0d09263d244498816525dc0b34da39 rdf:first sg:person.014067475277.58
    53 rdf:rest N454f427743f24f4992417c29502cab38
    54 N9e463ab06e62407e9276ca2e2c1c6e79 rdf:first sg:person.01312526135.27
    55 rdf:rest rdf:nil
    56 Nce436b721f6a470cbef76aca9556b1fb rdf:first N75f2bda6b1a443988f90dc52fae46654
    57 rdf:rest N4b2aa5535d284e7e8bd6f732852ed4cc
    58 Nd2f7b5f612eb40ef861e691446f323ba schema:familyName Valiente
    59 schema:givenName Gabriel
    60 rdf:type schema:Person
    61 Nd8921cfb3fde460d92e3dbd3d60bdc60 schema:isbn 978-3-540-35455-0
    62 978-3-540-35461-1
    63 schema:name Combinatorial Pattern Matching
    64 rdf:type schema:Book
    65 Nead5dc246caf4f70a5d06a46f4619f84 schema:name dimensions_id
    66 schema:value pub.1038390882
    67 rdf:type schema:PropertyValue
    68 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    69 schema:name Information and Computing Sciences
    70 rdf:type schema:DefinedTerm
    71 anzsrc-for:0802 schema:inDefinedTermSet anzsrc-for:
    72 schema:name Computation Theory and Mathematics
    73 rdf:type schema:DefinedTerm
    74 sg:person.01312526135.27 schema:affiliation https://www.grid.ac/institutes/grid.38348.34
    75 schema:familyName Tang
    76 schema:givenName Chuan Yi
    77 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01312526135.27
    78 rdf:type schema:Person
    79 sg:person.014067475277.58 schema:affiliation https://www.grid.ac/institutes/grid.38348.34
    80 schema:familyName Chung
    81 schema:givenName Yun-Sheng
    82 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014067475277.58
    83 rdf:type schema:Person
    84 sg:person.016502771037.22 schema:affiliation https://www.grid.ac/institutes/grid.260539.b
    85 schema:familyName Lu
    86 schema:givenName Chin Lung
    87 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016502771037.22
    88 rdf:type schema:Person
    89 sg:pub.10.1007/11496656_28 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000307633
    90 https://doi.org/10.1007/11496656_28
    91 rdf:type schema:CreativeWork
    92 https://doi.org/10.1093/bioinformatics/bth220 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007615319
    93 rdf:type schema:CreativeWork
    94 https://doi.org/10.1093/bioinformatics/bth468 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035719800
    95 rdf:type schema:CreativeWork
    96 https://doi.org/10.1093/nar/20.1.3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032312483
    97 rdf:type schema:CreativeWork
    98 https://doi.org/10.1142/s0219720003000095 schema:sameAs https://app.dimensions.ai/details/publication/pub.1063004501
    99 rdf:type schema:CreativeWork
    100 https://doi.org/10.1142/s0219720005000977 schema:sameAs https://app.dimensions.ai/details/publication/pub.1063004589
    101 rdf:type schema:CreativeWork
    102 https://doi.org/10.1145/360825.360861 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016123406
    103 rdf:type schema:CreativeWork
    104 https://www.grid.ac/institutes/grid.260539.b schema:alternateName National Chiao Tung University
    105 schema:name Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, 300, Taiwan, ROC
    106 rdf:type schema:Organization
    107 https://www.grid.ac/institutes/grid.38348.34 schema:alternateName National Tsing Hua University
    108 schema:name Department of Computer Science, National Tsing Hua University, Hsinchu, 300, Taiwan, ROC
    109 rdf:type schema:Organization
     




    Preview window. Press ESC to close (or click here)


    ...