Efficient Algorithms for Regular Expression Constrained Sequence Alignment View Full Text


Ontology type: schema:Chapter     


Chapter Info

DATE

2006

AUTHORS

Yun-Sheng Chung , Chin Lung Lu , Chuan Yi Tang

ABSTRACT

Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, in CPM 2005 Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which can take time and space up to O(|Σ|2 |V|4 n 2) and O(|Σ|2 |V|4 n), respectively, where Σ is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes O(|V|3 n 2) time and O(|V|2 n) space in the worst case. If |V|=O(logn) we propose another algorithm with time complexity O(|V|2log|V| n 2). The time complexity of our algorithms is independent of Σ, which is desirable in protein applications where the formulation of this problem originates; a factor of |Σ|2 = 400 in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice. More... »

PAGES

389-400

References to SciGraph publications

  • 2005. Regular Expression Constrained Sequence Alignment in COMBINATORIAL PATTERN MATCHING
  • Book

    TITLE

    Combinatorial Pattern Matching

    ISBN

    978-3-540-35455-0
    978-3-540-35461-1

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/11780441_35

    DOI

    http://dx.doi.org/10.1007/11780441_35

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1038390882


    Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
    Incoming Citations Browse incoming citations for this publication using opencitations.net

    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0802", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Computation Theory and Mathematics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "National Tsing Hua University", 
              "id": "https://www.grid.ac/institutes/grid.38348.34", 
              "name": [
                "Department of Computer Science, National Tsing Hua University, Hsinchu, 300, Taiwan, ROC"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Chung", 
            "givenName": "Yun-Sheng", 
            "id": "sg:person.014067475277.58", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014067475277.58"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "National Chiao Tung University", 
              "id": "https://www.grid.ac/institutes/grid.260539.b", 
              "name": [
                "Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, 300, Taiwan, ROC"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Lu", 
            "givenName": "Chin Lung", 
            "id": "sg:person.016502771037.22", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016502771037.22"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "National Tsing Hua University", 
              "id": "https://www.grid.ac/institutes/grid.38348.34", 
              "name": [
                "Department of Computer Science, National Tsing Hua University, Hsinchu, 300, Taiwan, ROC"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Tang", 
            "givenName": "Chuan Yi", 
            "id": "sg:person.01312526135.27", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01312526135.27"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/11496656_28", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1000307633", 
              "https://doi.org/10.1007/11496656_28"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/11496656_28", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1000307633", 
              "https://doi.org/10.1007/11496656_28"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/bioinformatics/bth220", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1007615319"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/360825.360861", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1016123406"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/nar/20.1.3", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1032312483"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/bioinformatics/bth468", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1035719800"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1142/s0219720003000095", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1063004501"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1142/s0219720005000977", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1063004589"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2006", 
        "datePublishedReg": "2006-01-01", 
        "description": "Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, in CPM 2005 Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which can take time and space up to O(|\u03a3|2 |V|4 n 2) and O(|\u03a3|2 |V|4 n), respectively, where \u03a3 is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes O(|V|3 n 2) time and O(|V|2 n) space in the worst case. If |V|=O(logn) we propose another algorithm with time complexity O(|V|2log|V| n 2). The time complexity of our algorithms is independent of \u03a3, which is desirable in protein applications where the formulation of this problem originates; a factor of |\u03a3|2 = 400 in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice.", 
        "editor": [
          {
            "familyName": "Lewenstein", 
            "givenName": "Moshe", 
            "type": "Person"
          }, 
          {
            "familyName": "Valiente", 
            "givenName": "Gabriel", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/11780441_35", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": false, 
        "isPartOf": {
          "isbn": [
            "978-3-540-35455-0", 
            "978-3-540-35461-1"
          ], 
          "name": "Combinatorial Pattern Matching", 
          "type": "Book"
        }, 
        "name": "Efficient Algorithms for Regular Expression Constrained Sequence Alignment", 
        "pagination": "389-400", 
        "productId": [
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/11780441_35"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "b6e0d2b1ab4e5269a146ccbd8f233abffbe3966a26d90b7d875b0909678c8d50"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1038390882"
            ]
          }
        ], 
        "publisher": {
          "location": "Berlin, Heidelberg", 
          "name": "Springer Berlin Heidelberg", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/11780441_35", 
          "https://app.dimensions.ai/details/publication/pub.1038390882"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-15T19:11", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8684_00000267.jsonl", 
        "type": "Chapter", 
        "url": "http://link.springer.com/10.1007/11780441_35"
      }
    ]
     

    Download the RDF metadata as:  json-ld nt turtle xml License info

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/11780441_35'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/11780441_35'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/11780441_35'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/11780441_35'


     

    This table displays all metadata directly associated to this object as RDF triples.

    109 TRIPLES      23 PREDICATES      34 URIs      20 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/11780441_35 schema:about anzsrc-for:08
    2 anzsrc-for:0802
    3 schema:author Ne5c924e9590c4fc9990992ca65908f0d
    4 schema:citation sg:pub.10.1007/11496656_28
    5 https://doi.org/10.1093/bioinformatics/bth220
    6 https://doi.org/10.1093/bioinformatics/bth468
    7 https://doi.org/10.1093/nar/20.1.3
    8 https://doi.org/10.1142/s0219720003000095
    9 https://doi.org/10.1142/s0219720005000977
    10 https://doi.org/10.1145/360825.360861
    11 schema:datePublished 2006
    12 schema:datePublishedReg 2006-01-01
    13 schema:description Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, in CPM 2005 Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which can take time and space up to O(|Σ|2 |V|4 n 2) and O(|Σ|2 |V|4 n), respectively, where Σ is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes O(|V|3 n 2) time and O(|V|2 n) space in the worst case. If |V|=O(logn) we propose another algorithm with time complexity O(|V|2log|V| n 2). The time complexity of our algorithms is independent of Σ, which is desirable in protein applications where the formulation of this problem originates; a factor of |Σ|2 = 400 in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice.
    14 schema:editor Na7df7ada3d3a4726b0b0a65058d77f7f
    15 schema:genre chapter
    16 schema:inLanguage en
    17 schema:isAccessibleForFree false
    18 schema:isPartOf Ne4ff05692e6b471c945dca4a2cd556ff
    19 schema:name Efficient Algorithms for Regular Expression Constrained Sequence Alignment
    20 schema:pagination 389-400
    21 schema:productId N198fcec1802d49ddb1cde64f5a036669
    22 N21ac5d1547bf43eab3585e2022169090
    23 Nd55613605be2416198f7b7fee169ab90
    24 schema:publisher N061926eb4cd04fe6aa660d1636d6f9d0
    25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038390882
    26 https://doi.org/10.1007/11780441_35
    27 schema:sdDatePublished 2019-04-15T19:11
    28 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    29 schema:sdPublisher Nfd9ef500cf4a49a28743e8031de74b0d
    30 schema:url http://link.springer.com/10.1007/11780441_35
    31 sgo:license sg:explorer/license/
    32 sgo:sdDataset chapters
    33 rdf:type schema:Chapter
    34 N061926eb4cd04fe6aa660d1636d6f9d0 schema:location Berlin, Heidelberg
    35 schema:name Springer Berlin Heidelberg
    36 rdf:type schema:Organisation
    37 N198fcec1802d49ddb1cde64f5a036669 schema:name doi
    38 schema:value 10.1007/11780441_35
    39 rdf:type schema:PropertyValue
    40 N21ac5d1547bf43eab3585e2022169090 schema:name dimensions_id
    41 schema:value pub.1038390882
    42 rdf:type schema:PropertyValue
    43 N7c3b701e86d34497833937198413aae9 rdf:first sg:person.01312526135.27
    44 rdf:rest rdf:nil
    45 N97f37f9b8a534c6c85bb60326c1e8dd3 schema:familyName Valiente
    46 schema:givenName Gabriel
    47 rdf:type schema:Person
    48 N9a5246bd8d2d4222967ffe0c12da2c50 rdf:first sg:person.016502771037.22
    49 rdf:rest N7c3b701e86d34497833937198413aae9
    50 Na7df7ada3d3a4726b0b0a65058d77f7f rdf:first Nc90a1842796f4537a730b295db6c36b7
    51 rdf:rest Nff588b95a0894c46af91be95e9d1b949
    52 Nc90a1842796f4537a730b295db6c36b7 schema:familyName Lewenstein
    53 schema:givenName Moshe
    54 rdf:type schema:Person
    55 Nd55613605be2416198f7b7fee169ab90 schema:name readcube_id
    56 schema:value b6e0d2b1ab4e5269a146ccbd8f233abffbe3966a26d90b7d875b0909678c8d50
    57 rdf:type schema:PropertyValue
    58 Ne4ff05692e6b471c945dca4a2cd556ff schema:isbn 978-3-540-35455-0
    59 978-3-540-35461-1
    60 schema:name Combinatorial Pattern Matching
    61 rdf:type schema:Book
    62 Ne5c924e9590c4fc9990992ca65908f0d rdf:first sg:person.014067475277.58
    63 rdf:rest N9a5246bd8d2d4222967ffe0c12da2c50
    64 Nfd9ef500cf4a49a28743e8031de74b0d schema:name Springer Nature - SN SciGraph project
    65 rdf:type schema:Organization
    66 Nff588b95a0894c46af91be95e9d1b949 rdf:first N97f37f9b8a534c6c85bb60326c1e8dd3
    67 rdf:rest rdf:nil
    68 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    69 schema:name Information and Computing Sciences
    70 rdf:type schema:DefinedTerm
    71 anzsrc-for:0802 schema:inDefinedTermSet anzsrc-for:
    72 schema:name Computation Theory and Mathematics
    73 rdf:type schema:DefinedTerm
    74 sg:person.01312526135.27 schema:affiliation https://www.grid.ac/institutes/grid.38348.34
    75 schema:familyName Tang
    76 schema:givenName Chuan Yi
    77 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01312526135.27
    78 rdf:type schema:Person
    79 sg:person.014067475277.58 schema:affiliation https://www.grid.ac/institutes/grid.38348.34
    80 schema:familyName Chung
    81 schema:givenName Yun-Sheng
    82 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014067475277.58
    83 rdf:type schema:Person
    84 sg:person.016502771037.22 schema:affiliation https://www.grid.ac/institutes/grid.260539.b
    85 schema:familyName Lu
    86 schema:givenName Chin Lung
    87 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016502771037.22
    88 rdf:type schema:Person
    89 sg:pub.10.1007/11496656_28 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000307633
    90 https://doi.org/10.1007/11496656_28
    91 rdf:type schema:CreativeWork
    92 https://doi.org/10.1093/bioinformatics/bth220 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007615319
    93 rdf:type schema:CreativeWork
    94 https://doi.org/10.1093/bioinformatics/bth468 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035719800
    95 rdf:type schema:CreativeWork
    96 https://doi.org/10.1093/nar/20.1.3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032312483
    97 rdf:type schema:CreativeWork
    98 https://doi.org/10.1142/s0219720003000095 schema:sameAs https://app.dimensions.ai/details/publication/pub.1063004501
    99 rdf:type schema:CreativeWork
    100 https://doi.org/10.1142/s0219720005000977 schema:sameAs https://app.dimensions.ai/details/publication/pub.1063004589
    101 rdf:type schema:CreativeWork
    102 https://doi.org/10.1145/360825.360861 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016123406
    103 rdf:type schema:CreativeWork
    104 https://www.grid.ac/institutes/grid.260539.b schema:alternateName National Chiao Tung University
    105 schema:name Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, 300, Taiwan, ROC
    106 rdf:type schema:Organization
    107 https://www.grid.ac/institutes/grid.38348.34 schema:alternateName National Tsing Hua University
    108 schema:name Department of Computer Science, National Tsing Hua University, Hsinchu, 300, Taiwan, ROC
    109 rdf:type schema:Organization
     




    Preview window. Press ESC to close (or click here)


    ...