RefSelect: a reference sequence selection algorithm for planted (l, d) motif search


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2016-07

AUTHORS

Qiang Yu, Hongwei Huo, Ruixing Zhao, Dazheng Feng, Jeffrey Scott Vitter, Jun Huan

ABSTRACT

BACKGROUND: The planted (l, d) motif search (PMS) is an important yet challenging problem in computational biology. Pattern-driven PMS algorithms usually use k out of t input sequences as reference sequences to generate candidate motifs, and they can find all the (l, d) motifs in the input sequences. However, most of them simply take the first k sequences in the input as reference sequences without elaborate selection processes, and thus they may exhibit sharp fluctuations in running time, especially for large alphabets.

RESULTS: In this paper, we build the reference sequence selection problem and propose a method named RefSelect to quickly solve it by evaluating the number of candidate motifs for the reference sequences. RefSelect can bring a practical time improvement of the state-of-the-art pattern-driven PMS algorithms. Experimental results show that RefSelect (1) makes the tested algorithms solve the PMS problem steadily in an efficient way, (2) particularly, makes them achieve a speedup of up to about 100× on the protein data, and (3) is also suitable for large data sets which contain hundreds or more sequences.

CONCLUSIONS: The proposed algorithm RefSelect can be used to solve the problem that many pattern-driven PMS algorithms present execution time instability. RefSelect requires a small amount of storage space and is capable of selecting reference sequences efficiently and effectively. Also, the parallel version of RefSelect is provided for handling large data sets.
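To make the abstract's terminology concrete, the following is a minimal Python sketch of a brute-force, pattern-driven (l, d) motif search that uses a single reference sequence (k = 1). It is not RefSelect and not any of the published PMS algorithms; the function names (`neighbors`, `candidates`, `find_motifs`) and the toy DNA input are assumptions made for illustration. It relies only on the standard definition: an l-mer m is an (l, d) motif if every input sequence contains some l-mer within Hamming distance d of m.

```python
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def lmers(seq: str, l: int) -> set:
    """All distinct substrings of length l in seq."""
    return {seq[i:i + l] for i in range(len(seq) - l + 1)}


def neighbors(lmer: str, d: int, alphabet: str) -> set:
    """All strings within Hamming distance d of lmer (including lmer itself)."""
    result = set()
    for k in range(d + 1):
        for positions in combinations(range(len(lmer)), k):
            for letters in product(alphabet, repeat=k):
                chars = list(lmer)
                for pos, ch in zip(positions, letters):
                    chars[pos] = ch
                result.add("".join(chars))
    return result


def candidates(reference: str, l: int, d: int, alphabet: str) -> set:
    """Candidate motifs generated from one reference sequence."""
    out = set()
    for x in lmers(reference, l):
        out |= neighbors(x, d, alphabet)
    return out


def is_motif(m: str, sequences: list, d: int) -> bool:
    """True if every sequence has an l-mer within distance d of m."""
    return all(
        any(hamming(m, x) <= d for x in lmers(s, len(m)))
        for s in sequences
    )


def find_motifs(sequences, l, d, alphabet, ref_index=0) -> set:
    """Verify every candidate drawn from the chosen reference sequence."""
    cands = candidates(sequences[ref_index], l, d, alphabet)
    return {m for m in cands if is_motif(m, sequences, d)}


if __name__ == "__main__":
    seqs = ["ACGTAC", "TTACGA", "ACCTGG"]  # hypothetical toy input
    for i, s in enumerate(seqs):
        n = len(candidates(s, 4, 1, "ACGT"))
        print(i, n, sorted(find_motifs(seqs, 4, 1, "ACGT", ref_index=i)))
```

Because every motif must lie within distance d of some l-mer in each sequence, any choice of reference yields the same set of motifs; only the number of candidates that have to be verified changes. That count is the quantity the abstract says RefSelect evaluates when choosing reference sequences.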

PAGES

266

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/s12859-016-1130-6

DOI

http://dx.doi.org/10.1186/s12859-016-1130-6

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1026133554

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/27454113



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0802", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Computation Theory and Mathematics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Amino Acid Motifs", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Computational Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Protein Domains", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Proteins", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, Protein", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Xidian University", 
          "id": "https://www.grid.ac/institutes/grid.440736.2", 
          "name": [
            "School of Computer Science and Technology, Xidian University, 710071, Xi\u2019an, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Yu", 
        "givenName": "Qiang", 
        "id": "sg:person.01031255322.32", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01031255322.32"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Xidian University", 
          "id": "https://www.grid.ac/institutes/grid.440736.2", 
          "name": [
            "School of Computer Science and Technology, Xidian University, 710071, Xi\u2019an, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Huo", 
        "givenName": "Hongwei", 
        "id": "sg:person.01077370522.90", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01077370522.90"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Xidian University", 
          "id": "https://www.grid.ac/institutes/grid.440736.2", 
          "name": [
            "School of Computer Science and Technology, Xidian University, 710071, Xi\u2019an, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Zhao", 
        "givenName": "Ruixing", 
        "id": "sg:person.011525772565.53", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011525772565.53"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Xidian University", 
          "id": "https://www.grid.ac/institutes/grid.440736.2", 
          "name": [
            "School of Electronic Engineering, Xidian University, 710071, Xi\u2019an, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Feng", 
        "givenName": "Dazheng", 
        "id": "sg:person.010740564053.16", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010740564053.16"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Mississippi", 
          "id": "https://www.grid.ac/institutes/grid.251313.7", 
          "name": [
            "Department of Computer and Information Science, The University of Mississippi, 38677-1848, Oxford, MS, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Vitter", 
        "givenName": "Jeffrey Scott", 
        "id": "sg:person.0613677314.28", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0613677314.28"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Kansas", 
          "id": "https://www.grid.ac/institutes/grid.266515.3", 
          "name": [
            "Department of Electrical Engineering and Computer Science, the University of Kansas, 66045, Lawrence, KS, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Huan", 
        "givenName": "Jun", 
        "id": "sg:person.016220372623.74", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016220372623.74"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1093/bioinformatics/btu093", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1002175031"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-12-410", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004305251", 
          "https://doi.org/10.1186/1471-2105-12-410"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-8-385", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004490779", 
          "https://doi.org/10.1186/1471-2105-8-385"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/11758525_110", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005767417", 
          "https://doi.org/10.1007/11758525_110"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-7-488", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1007199178", 
          "https://doi.org/10.1186/1471-2105-7-488"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btr189", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008669948"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nprot.2006.98", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010562203", 
          "https://doi.org/10.1038/nprot.2006.98"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pone.0048442", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012261360"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1748-7188-4-14", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012312142", 
          "https://doi.org/10.1186/1748-7188-4-14"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gkl198", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013590568"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-11-s11-s8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1015484700", 
          "https://doi.org/10.1186/1471-2105-11-s11-s8"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nbt0806-959", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1015976767", 
          "https://doi.org/10.1038/nbt0806-959"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/369133.369172", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1018080948"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-8-s7-s21", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1020592215", 
          "https://doi.org/10.1186/1471-2105-8-s7-s21"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bib/bbs016", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1022400077"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/srep07813", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1025174963", 
          "https://doi.org/10.1038/srep07813"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pone.0041425", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1026111129"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nbt1053", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030939237", 
          "https://doi.org/10.1038/nbt1053"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-6-277", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039086700", 
          "https://doi.org/10.1186/1471-2105-6-277"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0304-3975(03)00320-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1042161086"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth0807-613", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051261645", 
          "https://doi.org/10.1038/nmeth0807-613"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-15-34", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051407096", 
          "https://doi.org/10.1186/1471-2105-15-34"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/99.660313", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061251961"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tcbb.2007.70241", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061540584"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tcbb.2014.2306842", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061541242"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tcbb.2014.2361668", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061541344"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1126/science.8211139", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1062653653"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://app.dimensions.ai/details/publication/pub.1074695300", 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://app.dimensions.ai/details/publication/pub.1077006673", 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://app.dimensions.ai/details/publication/pub.1082424111", 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/bibm.2015.7359745", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1095677901"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/isss.2002.1227161", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1098825860"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2016-07", 
    "datePublishedReg": "2016-07-01", 
    "description": "BACKGROUND: The planted (l, d) motif search (PMS) is an important yet challenging problem in computational biology. Pattern-driven PMS algorithms usually use k out of t input sequences as reference sequences to generate candidate motifs, and they can find all the (l, d) motifs in the input sequences. However, most of them simply take the first k sequences in the input as reference sequences without elaborate selection processes, and thus they may exhibit sharp fluctuations in running time, especially for large alphabets.\nRESULTS: In this paper, we build the reference sequence selection problem and propose a method named RefSelect to quickly solve it by evaluating the number of candidate motifs for the reference sequences. RefSelect can bring a practical time improvement of the state-of-the-art pattern-driven PMS algorithms. Experimental results show that RefSelect (1) makes the tested algorithms solve the PMS problem steadily in an efficient way, (2) particularly, makes them achieve a speedup of up to about 100\u00d7 on the protein data, and (3) is also suitable for large data sets which contain hundreds or more sequences.\nCONCLUSIONS: The proposed algorithm RefSelect can be used to solve the problem that many pattern-driven PMS algorithms present execution time instability. RefSelect requires a small amount of storage space and is capable of selecting reference sequences efficiently and effectively. Also, the parallel version of RefSelect is provided for handling large data sets.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/s12859-016-1130-6", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.7017787", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "Suppl 9", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "17"
      }
    ], 
    "name": "RefSelect: a reference sequence selection algorithm for planted (l, d) motif search", 
    "pagination": "266", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "7c7fee4cc86bc17570d2da299787d7c1da9ba645643e5eed31ac2315104dace9"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "27454113"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "100965194"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/s12859-016-1130-6"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1026133554"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/s12859-016-1130-6", 
      "https://app.dimensions.ai/details/publication/pub.1026133554"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T12:21", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000362_0000000362/records_87082_00000000.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://link.springer.com/10.1186%2Fs12859-016-1130-6"
  }
]
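As a rough illustration of how a record shaped like the one above might be consumed, the Python sketch below reads the top-level array, takes its first object, and pulls out the title, the author names, and the distinct citation ids. The `SAMPLE` string is a heavily trimmed stand-in that reuses the record's field names (`name`, `author`, `givenName`, `familyName`, `citation`, `id`); the helper name `summarize` is hypothetical. It treats the data as plain JSON and does no JSON-LD context processing.

```python
import json

# Trimmed stand-in for the record: same field names, far fewer entries.
SAMPLE = """
[
  {
    "name": "RefSelect: a reference sequence selection algorithm for planted (l, d) motif search",
    "author": [
      {"familyName": "Yu", "givenName": "Qiang", "type": "Person"},
      {"familyName": "Huo", "givenName": "Hongwei", "type": "Person"}
    ],
    "citation": [
      {"id": "sg:pub.10.1186/1471-2105-12-410", "type": "CreativeWork"},
      {"id": "sg:pub.10.1007/11758525_110", "type": "CreativeWork"},
      {"id": "sg:pub.10.1007/11758525_110", "type": "CreativeWork"}
    ]
  }
]
"""


def summarize(record_json: str) -> dict:
    """Extract title, author names, and distinct citation ids from a record."""
    records = json.loads(record_json)
    record = records[0]  # the record is a one-element array

    authors = [
        f"{a['givenName']} {a['familyName']}"
        for a in record.get("author", [])
    ]

    # Keep the first occurrence of each citation id, preserving order.
    seen = set()
    citations = []
    for c in record.get("citation", []):
        if c["id"] not in seen:
            seen.add(c["id"])
            citations.append(c["id"])

    return {"name": record.get("name"), "authors": authors, "citations": citations}


if __name__ == "__main__":
    summary = summarize(SAMPLE)
    print(summary["name"])
    print(summary["authors"])
    print(summary["citations"])
```

Filtering on `id` keeps the citation list stable even if the same work is listed more than once in the source data.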
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s12859-016-1130-6'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s12859-016-1130-6'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s12859-016-1130-6'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s12859-016-1130-6'
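The four curl commands above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The format-to-MIME-type table is copied from those commands; the helper names `build_request` and `fetch` are hypothetical, and nothing here assumes the endpoint is still reachable, so the demo at the bottom only constructs the request and does not send it.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s12859-016-1130-6"

# Accept headers taken from the curl examples above.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Create a GET request asking for the record in the given RDF format."""
    if fmt not in ACCEPT:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    req = build_request("turtle")
    print(req.full_url)
    print(req.get_header("Accept"))
```

Calling `fetch("json-ld")` would correspond to the first curl command; it is kept separate from `build_request` so the header logic can be checked without network access.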


 

This table displays all metadata directly associated to this object as RDF triples.

247 TRIPLES      21 PREDICATES      68 URIs      28 LITERALS      16 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/s12859-016-1130-6 schema:about N37a0b543e166426bb7fd005030ef90cd
2 N407eca5178294e58b773c35c04e4551b
3 N49fd038f20f448b687ed6d2c3bb300b9
4 N530e11039bb44350912205a8bcbae726
5 N7f833a5d07bc4ebe9c0322a8e5be1e6f
6 Nc9f68e55bf144cc8b467673caacf5cdb
7 Ne608597ba8194e2b84f3c164ae77bfba
8 anzsrc-for:08
9 anzsrc-for:0802
10 schema:author N2987c783dd60456e9bddf8df7ba987f8
11 schema:citation sg:pub.10.1007/11758525_110
12 sg:pub.10.1038/nbt0806-959
13 sg:pub.10.1038/nbt1053
14 sg:pub.10.1038/nmeth0807-613
15 sg:pub.10.1038/nprot.2006.98
16 sg:pub.10.1038/srep07813
17 sg:pub.10.1186/1471-2105-11-s11-s8
18 sg:pub.10.1186/1471-2105-12-410
19 sg:pub.10.1186/1471-2105-15-34
20 sg:pub.10.1186/1471-2105-6-277
21 sg:pub.10.1186/1471-2105-7-488
22 sg:pub.10.1186/1471-2105-8-385
23 sg:pub.10.1186/1471-2105-8-s7-s21
24 sg:pub.10.1186/1748-7188-4-14
25 https://app.dimensions.ai/details/publication/pub.1074695300
26 https://app.dimensions.ai/details/publication/pub.1077006673
27 https://app.dimensions.ai/details/publication/pub.1082424111
28 https://doi.org/10.1016/s0304-3975(03)00320-7
29 https://doi.org/10.1093/bib/bbs016
30 https://doi.org/10.1093/bioinformatics/btr189
31 https://doi.org/10.1093/bioinformatics/btu093
32 https://doi.org/10.1093/nar/gkl198
33 https://doi.org/10.1109/99.660313
34 https://doi.org/10.1109/bibm.2015.7359745
35 https://doi.org/10.1109/isss.2002.1227161
36 https://doi.org/10.1109/tcbb.2007.70241
37 https://doi.org/10.1109/tcbb.2014.2306842
38 https://doi.org/10.1109/tcbb.2014.2361668
39 https://doi.org/10.1126/science.8211139
40 https://doi.org/10.1145/369133.369172
41 https://doi.org/10.1371/journal.pone.0041425
42 https://doi.org/10.1371/journal.pone.0048442
43 schema:datePublished 2016-07
44 schema:datePublishedReg 2016-07-01
45 schema:description BACKGROUND: The planted (l, d) motif search (PMS) is an important yet challenging problem in computational biology. Pattern-driven PMS algorithms usually use k out of t input sequences as reference sequences to generate candidate motifs, and they can find all the (l, d) motifs in the input sequences. However, most of them simply take the first k sequences in the input as reference sequences without elaborate selection processes, and thus they may exhibit sharp fluctuations in running time, especially for large alphabets. RESULTS: In this paper, we build the reference sequence selection problem and propose a method named RefSelect to quickly solve it by evaluating the number of candidate motifs for the reference sequences. RefSelect can bring a practical time improvement of the state-of-the-art pattern-driven PMS algorithms. Experimental results show that RefSelect (1) makes the tested algorithms solve the PMS problem steadily in an efficient way, (2) particularly, makes them achieve a speedup of up to about 100× on the protein data, and (3) is also suitable for large data sets which contain hundreds or more sequences. CONCLUSIONS: The proposed algorithm RefSelect can be used to solve the problem that many pattern-driven PMS algorithms present execution time instability. RefSelect requires a small amount of storage space and is capable of selecting reference sequences efficiently and effectively. Also, the parallel version of RefSelect is provided for handling large data sets.
46 schema:genre research_article
47 schema:inLanguage en
48 schema:isAccessibleForFree true
49 schema:isPartOf N454bb4a2b14548fe92fa54295d309db8
50 Na4a858d0cd6e4c1a8f075c4e2ed3609c
51 sg:journal.1023786
52 schema:name RefSelect: a reference sequence selection algorithm for planted (l, d) motif search
53 schema:pagination 266
54 schema:productId Nafa07d533b4d410ca0af935950905f41
55 Nce14dcb3f130400fb236da627d90951a
56 Ndf311826fdf64f58b8346ad65593d772
57 Ne36bbf6318a547cb8bea6058f7b7571f
58 Nf6e41f6c6cdd4c18bc116acd362f16bf
59 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026133554
60 https://doi.org/10.1186/s12859-016-1130-6
61 schema:sdDatePublished 2019-04-11T12:21
62 schema:sdLicense https://scigraph.springernature.com/explorer/license/
63 schema:sdPublisher N97ada120dc5c451cbca9405bc33cc319
64 schema:url https://link.springer.com/10.1186%2Fs12859-016-1130-6
65 sgo:license sg:explorer/license/
66 sgo:sdDataset articles
67 rdf:type schema:ScholarlyArticle
68 N2987c783dd60456e9bddf8df7ba987f8 rdf:first sg:person.01031255322.32
69 rdf:rest Nd1a399ef0e8d4cae9b191b5f2355de51
70 N37a0b543e166426bb7fd005030ef90cd schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
71 schema:name Sequence Analysis, Protein
72 rdf:type schema:DefinedTerm
73 N407eca5178294e58b773c35c04e4551b schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
74 schema:name Computational Biology
75 rdf:type schema:DefinedTerm
76 N454bb4a2b14548fe92fa54295d309db8 schema:volumeNumber 17
77 rdf:type schema:PublicationVolume
78 N49fd038f20f448b687ed6d2c3bb300b9 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
79 schema:name Amino Acid Motifs
80 rdf:type schema:DefinedTerm
81 N4cf6433fa2224c4dbd1cbffa523a8734 rdf:first sg:person.010740564053.16
82 rdf:rest N56c360551cee49a0a4a66680025a90a7
83 N4da85364864143b099700f6f4cf4f915 rdf:first sg:person.011525772565.53
84 rdf:rest N4cf6433fa2224c4dbd1cbffa523a8734
85 N530e11039bb44350912205a8bcbae726 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
86 schema:name Software
87 rdf:type schema:DefinedTerm
88 N56c360551cee49a0a4a66680025a90a7 rdf:first sg:person.0613677314.28
89 rdf:rest N78f5ecac2f414c928665cd18acdcde15
90 N78f5ecac2f414c928665cd18acdcde15 rdf:first sg:person.016220372623.74
91 rdf:rest rdf:nil
92 N7f833a5d07bc4ebe9c0322a8e5be1e6f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
93 schema:name Algorithms
94 rdf:type schema:DefinedTerm
95 N97ada120dc5c451cbca9405bc33cc319 schema:name Springer Nature - SN SciGraph project
96 rdf:type schema:Organization
97 Na4a858d0cd6e4c1a8f075c4e2ed3609c schema:issueNumber Suppl 9
98 rdf:type schema:PublicationIssue
99 Nafa07d533b4d410ca0af935950905f41 schema:name pubmed_id
100 schema:value 27454113
101 rdf:type schema:PropertyValue
102 Nc9f68e55bf144cc8b467673caacf5cdb schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
103 schema:name Proteins
104 rdf:type schema:DefinedTerm
105 Nce14dcb3f130400fb236da627d90951a schema:name dimensions_id
106 schema:value pub.1026133554
107 rdf:type schema:PropertyValue
108 Nd1a399ef0e8d4cae9b191b5f2355de51 rdf:first sg:person.01077370522.90
109 rdf:rest N4da85364864143b099700f6f4cf4f915
110 Ndf311826fdf64f58b8346ad65593d772 schema:name doi
111 schema:value 10.1186/s12859-016-1130-6
112 rdf:type schema:PropertyValue
113 Ne36bbf6318a547cb8bea6058f7b7571f schema:name nlm_unique_id
114 schema:value 100965194
115 rdf:type schema:PropertyValue
116 Ne608597ba8194e2b84f3c164ae77bfba schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
117 schema:name Protein Domains
118 rdf:type schema:DefinedTerm
119 Nf6e41f6c6cdd4c18bc116acd362f16bf schema:name readcube_id
120 schema:value 7c7fee4cc86bc17570d2da299787d7c1da9ba645643e5eed31ac2315104dace9
121 rdf:type schema:PropertyValue
122 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
123 schema:name Information and Computing Sciences
124 rdf:type schema:DefinedTerm
125 anzsrc-for:0802 schema:inDefinedTermSet anzsrc-for:
126 schema:name Computation Theory and Mathematics
127 rdf:type schema:DefinedTerm
128 sg:grant.7017787 http://pending.schema.org/fundedItem sg:pub.10.1186/s12859-016-1130-6
129 rdf:type schema:MonetaryGrant
130 sg:journal.1023786 schema:issn 1471-2105
131 schema:name BMC Bioinformatics
132 rdf:type schema:Periodical
133 sg:person.01031255322.32 schema:affiliation https://www.grid.ac/institutes/grid.440736.2
134 schema:familyName Yu
135 schema:givenName Qiang
136 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01031255322.32
137 rdf:type schema:Person
138 sg:person.010740564053.16 schema:affiliation https://www.grid.ac/institutes/grid.440736.2
139 schema:familyName Feng
140 schema:givenName Dazheng
141 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010740564053.16
142 rdf:type schema:Person
143 sg:person.01077370522.90 schema:affiliation https://www.grid.ac/institutes/grid.440736.2
144 schema:familyName Huo
145 schema:givenName Hongwei
146 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01077370522.90
147 rdf:type schema:Person
148 sg:person.011525772565.53 schema:affiliation https://www.grid.ac/institutes/grid.440736.2
149 schema:familyName Zhao
150 schema:givenName Ruixing
151 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011525772565.53
152 rdf:type schema:Person
153 sg:person.016220372623.74 schema:affiliation https://www.grid.ac/institutes/grid.266515.3
154 schema:familyName Huan
155 schema:givenName Jun
156 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016220372623.74
157 rdf:type schema:Person
158 sg:person.0613677314.28 schema:affiliation https://www.grid.ac/institutes/grid.251313.7
159 schema:familyName Vitter
160 schema:givenName Jeffrey Scott
161 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0613677314.28
162 rdf:type schema:Person
163 sg:pub.10.1007/11758525_110 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005767417
164 https://doi.org/10.1007/11758525_110
165 rdf:type schema:CreativeWork
166 sg:pub.10.1038/nbt0806-959 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015976767
167 https://doi.org/10.1038/nbt0806-959
168 rdf:type schema:CreativeWork
169 sg:pub.10.1038/nbt1053 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030939237
170 https://doi.org/10.1038/nbt1053
171 rdf:type schema:CreativeWork
172 sg:pub.10.1038/nmeth0807-613 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051261645
173 https://doi.org/10.1038/nmeth0807-613
174 rdf:type schema:CreativeWork
175 sg:pub.10.1038/nprot.2006.98 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010562203
176 https://doi.org/10.1038/nprot.2006.98
177 rdf:type schema:CreativeWork
178 sg:pub.10.1038/srep07813 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025174963
179 https://doi.org/10.1038/srep07813
180 rdf:type schema:CreativeWork
181 sg:pub.10.1186/1471-2105-11-s11-s8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015484700
182 https://doi.org/10.1186/1471-2105-11-s11-s8
183 rdf:type schema:CreativeWork
184 sg:pub.10.1186/1471-2105-12-410 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004305251
185 https://doi.org/10.1186/1471-2105-12-410
186 rdf:type schema:CreativeWork
187 sg:pub.10.1186/1471-2105-15-34 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051407096
188 https://doi.org/10.1186/1471-2105-15-34
189 rdf:type schema:CreativeWork
190 sg:pub.10.1186/1471-2105-6-277 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039086700
191 https://doi.org/10.1186/1471-2105-6-277
192 rdf:type schema:CreativeWork
193 sg:pub.10.1186/1471-2105-7-488 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007199178
194 https://doi.org/10.1186/1471-2105-7-488
195 rdf:type schema:CreativeWork
196 sg:pub.10.1186/1471-2105-8-385 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004490779
197 https://doi.org/10.1186/1471-2105-8-385
198 rdf:type schema:CreativeWork
199 sg:pub.10.1186/1471-2105-8-s7-s21 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020592215
200 https://doi.org/10.1186/1471-2105-8-s7-s21
201 rdf:type schema:CreativeWork
202 sg:pub.10.1186/1748-7188-4-14 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012312142
203 https://doi.org/10.1186/1748-7188-4-14
204 rdf:type schema:CreativeWork
205 https://app.dimensions.ai/details/publication/pub.1074695300 schema:CreativeWork
206 https://app.dimensions.ai/details/publication/pub.1077006673 schema:CreativeWork
207 https://app.dimensions.ai/details/publication/pub.1082424111 schema:CreativeWork
208 https://doi.org/10.1016/s0304-3975(03)00320-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042161086
209 rdf:type schema:CreativeWork
210 https://doi.org/10.1093/bib/bbs016 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022400077
211 rdf:type schema:CreativeWork
212 https://doi.org/10.1093/bioinformatics/btr189 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008669948
213 rdf:type schema:CreativeWork
214 https://doi.org/10.1093/bioinformatics/btu093 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002175031
215 rdf:type schema:CreativeWork
216 https://doi.org/10.1093/nar/gkl198 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013590568
217 rdf:type schema:CreativeWork
218 https://doi.org/10.1109/99.660313 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061251961
219 rdf:type schema:CreativeWork
220 https://doi.org/10.1109/bibm.2015.7359745 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095677901
221 rdf:type schema:CreativeWork
222 https://doi.org/10.1109/isss.2002.1227161 schema:sameAs https://app.dimensions.ai/details/publication/pub.1098825860
223 rdf:type schema:CreativeWork
224 https://doi.org/10.1109/tcbb.2007.70241 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061540584
225 rdf:type schema:CreativeWork
226 https://doi.org/10.1109/tcbb.2014.2306842 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061541242
227 rdf:type schema:CreativeWork
228 https://doi.org/10.1109/tcbb.2014.2361668 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061541344
229 rdf:type schema:CreativeWork
230 https://doi.org/10.1126/science.8211139 schema:sameAs https://app.dimensions.ai/details/publication/pub.1062653653
231 rdf:type schema:CreativeWork
232 https://doi.org/10.1145/369133.369172 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018080948
233 rdf:type schema:CreativeWork
234 https://doi.org/10.1371/journal.pone.0041425 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026111129
235 rdf:type schema:CreativeWork
236 https://doi.org/10.1371/journal.pone.0048442 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012261360
237 rdf:type schema:CreativeWork
238 https://www.grid.ac/institutes/grid.251313.7 schema:alternateName University of Mississippi
239 schema:name Department of Computer and Information Science, The University of Mississippi, 38677-1848, Oxford, MS, USA
240 rdf:type schema:Organization
241 https://www.grid.ac/institutes/grid.266515.3 schema:alternateName University of Kansas
242 schema:name Department of Electrical Engineering and Computer Science, the University of Kansas, 66045, Lawrence, KS, USA
243 rdf:type schema:Organization
244 https://www.grid.ac/institutes/grid.440736.2 schema:alternateName Xidian University
245 schema:name School of Computer Science and Technology, Xidian University, 710071, Xi’an, China
246 School of Electronic Engineering, Xidian University, 710071, Xi’an, China
247 rdf:type schema:Organization
 



