MUSCLE: a multiple sequence alignment method with reduced time and space complexity


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2004-08-19

AUTHORS

Robert C Edgar

ABSTRACT

Background: In a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE to achieve the highest scores reported to date on four alignment accuracy benchmarks. Here we present a more complete discussion of the algorithm, describing several previously unpublished techniques that improve biological accuracy and/or computational complexity. We introduce a new option, MUSCLE-fast, designed for high-throughput applications. We also describe a new protocol for evaluating objective functions that align two profiles.

Results: We compare the speed and accuracy of MUSCLE with CLUSTALW, Progressive POA and the MAFFT script FFTNS1, the fastest previously published program known to the author. Accuracy is measured using four benchmarks: BAliBASE, PREFAB, SABmark and SMART. We test three variants that offer the highest accuracy (MUSCLE with default settings), the highest speed (MUSCLE-fast), and a carefully chosen compromise between the two (MUSCLE-prog). We find MUSCLE-fast to be the fastest algorithm on all test sets, achieving average alignment accuracy similar to CLUSTALW in times that are typically two to three orders of magnitude less. MUSCLE-fast is able to align 1,000 sequences of average length 282 in 21 seconds on a current desktop computer.

Conclusions: MUSCLE offers a range of options that provide improved speed and/or alignment accuracy compared with currently available programs. MUSCLE is freely available at http://www.drive5.com/muscle.

PAGES

113


Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1471-2105-5-113

DOI

http://dx.doi.org/10.1186/1471-2105-5-113

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1040413794

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/15318951



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Mathematical Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Cluster Analysis", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Computational Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Phylogeny", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Alignment", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software Design", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Time Factors", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Department of Plant and Microbial Biology, University of California, 461 Koshland Hall, 94720-3102, Berkeley, CA, USA", 
          "id": "http://www.grid.ac/institutes/grid.47840.3f", 
          "name": [
            "Department of Plant and Microbial Biology, University of California, 461 Koshland Hall, 94720-3102, Berkeley, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Edgar", 
        "givenName": "Robert C", 
        "id": "sg:person.01025250401.02", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01025250401.02"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1007/bf02257378", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1014365591", 
          "https://doi.org/10.1007/bf02257378"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf02603120", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1022962956", 
          "https://doi.org/10.1007/bf02603120"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/3-540-44696-6_2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045928456", 
          "https://doi.org/10.1007/3-540-44696-6_2"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2004-08-19", 
    "datePublishedReg": "2004-08-19", 
    "description": "BackgroundIn a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE to achieve the highest scores reported to date on four alignment accuracy benchmarks. Here we present a more complete discussion of the algorithm, describing several previously unpublished techniques that improve biological accuracy and / or computational complexity. We introduce a new option, MUSCLE-fast, designed for high-throughput applications. We also describe a new protocol for evaluating objective functions that align two profiles.ResultsWe compare the speed and accuracy of MUSCLE with CLUSTALW, Progressive POA and the MAFFT script FFTNS1, the fastest previously published program known to the author. Accuracy is measured using four benchmarks: BAliBASE, PREFAB, SABmark and SMART. We test three variants that offer highest accuracy (MUSCLE with default settings), highest speed (MUSCLE-fast), and a carefully chosen compromise between the two (MUSCLE-prog). We find MUSCLE-fast to be the fastest algorithm on all test sets, achieving average alignment accuracy similar to CLUSTALW in times that are typically two to three orders of magnitude less. MUSCLE-fast is able to align 1,000 sequences of average length 282 in 21 seconds on a current desktop computer.ConclusionsMUSCLE offers a range of options that provide improved speed and / or alignment accuracy compared with currently available programs. MUSCLE is freely available at http://www.drive5.com/muscle.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/1471-2105-5-113", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.5906856", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "5"
      }
    ], 
    "keywords": [
      "current desktop computers", 
      "accuracy of muscle", 
      "sequence alignment methods", 
      "multiple sequence alignment methods", 
      "alignment accuracy", 
      "computational complexity", 
      "desktop computer", 
      "space complexity", 
      "fast algorithm", 
      "accuracy benchmark", 
      "biological accuracy", 
      "alignment method", 
      "average alignment accuracy", 
      "test set", 
      "algorithm", 
      "high-throughput applications", 
      "high accuracy", 
      "objective function", 
      "new protocol", 
      "available programs", 
      "multiple alignment", 
      "accuracy", 
      "benchmarks", 
      "ClustalW", 
      "complexity", 
      "reduced time", 
      "high speed", 
      "SABmark", 
      "BAliBASE", 
      "computer", 
      "prefabs", 
      "Smart", 
      "protein sequences", 
      "speed", 
      "new program", 
      "range of options", 
      "set", 
      "protocol", 
      "applications", 
      "previous paper", 
      "program", 
      "complete discussion", 
      "technique", 
      "alignment", 
      "time", 
      "orders of magnitude", 
      "seconds", 
      "sequence", 
      "unpublished technique", 
      "order", 
      "method", 
      "brief summary", 
      "compromise", 
      "new options", 
      "variants", 
      "authors", 
      "POA", 
      "options", 
      "discussion", 
      "function", 
      "summary", 
      "magnitude", 
      "higher scores", 
      "date", 
      "range", 
      "scores", 
      "profile", 
      "ResultsWe", 
      "BackgroundIn", 
      "muscle", 
      "paper"
    ], 
    "name": "MUSCLE: a multiple sequence alignment method with reduced time and space complexity", 
    "pagination": "113", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1040413794"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1471-2105-5-113"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "15318951"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1471-2105-5-113", 
      "https://app.dimensions.ai/details/publication/pub.1040413794"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-05-20T07:22", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220519/entities/gbq_results/article/article_384.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/1471-2105-5-113"
  }
]
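JSON-LD is plain JSON, so a standard JSON parser can read the identifiers in the record above. The following is a minimal sketch under stated assumptions. It inlines an excerpt of the record (fields copied verbatim, most omitted) instead of fetching it. `summarize_record` is a hypothetical helper. It reads the compacted key names literally and does not perform JSON-LD @context expansion.

```python
import json

# Excerpt of the SciGraph JSON-LD record above (fields copied verbatim; most omitted).
RECORD = """
[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
    "name": "MUSCLE: a multiple sequence alignment method with reduced time and space complexity",
    "datePublished": "2004-08-19",
    "pagination": "113",
    "isPartOf": [
      {"id": "sg:journal.1023786", "issn": ["1471-2105"], "name": "BMC Bioinformatics",
       "publisher": "Springer Nature", "type": "Periodical"},
      {"issueNumber": "1", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "5"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1040413794"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/1471-2105-5-113"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["15318951"]}
    ]
  }
]
"""


def summarize_record(text: str) -> dict:
    """Hypothetical helper: flatten one SciGraph article record into a dict.

    Reads the payload as plain JSON; the @context is not expanded.
    """
    article = json.loads(text)[0]  # the payload is a one-element array
    # productId is a list of {name, value: [...]} nodes; keep the first value of each.
    ids = {p["name"]: p["value"][0] for p in article.get("productId", [])}
    # isPartOf mixes journal, issue and volume nodes, distinguished by "type".
    parts = {p["type"]: p for p in article.get("isPartOf", [])}
    return {
        "title": article["name"],
        "date": article["datePublished"],
        "journal": parts["Periodical"]["name"],
        "volume": parts["PublicationVolume"]["volumeNumber"],
        "issue": parts["PublicationIssue"]["issueNumber"],
        "pagination": article["pagination"],
        **ids,
    }


if __name__ == "__main__":
    for key, value in summarize_record(RECORD).items():
        print(f"{key}: {value}")
```

Reading the keys literally keeps the sketch dependency-free. A full JSON-LD processor would be needed if the record were served under a different compaction.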
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-5-113'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-5-113'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-5-113'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-5-113'
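The four curl commands above differ only in the Accept header. In other words, they use HTTP content negotiation to request one resource in different serializations. A standard-library Python equivalent might look like the following. The format labels in `MEDIA_TYPES` and the helper names are hypothetical. The sketch also assumes the SciGraph endpoint still honors these media types, which may have changed since this page was generated.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/1471-2105-5-113"

# Media types taken from the curl commands above, keyed by hypothetical short labels.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Equivalent of `curl -H 'Accept: <type>' '<url>'` for one format."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt: str, timeout: float = 30.0) -> str:
    """Send the request and return the body as text (requires network access)."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


if __name__ == "__main__":
    # Show the request that would be sent for each format, without sending it.
    for fmt in MEDIA_TYPES:
        req = build_request(fmt)
        print(fmt, req.get_header("Accept"), req.full_url)
```

Calling `fetch("turtle")` would send the same request as the Turtle curl command. Building the `Request` separately means the headers can be inspected without network access.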


 

This table displays all metadata directly associated with this object as RDF triples.

178 TRIPLES      22 PREDICATES      108 URIs      96 LITERALS      14 BLANK NODES

Subject Predicate Object (a row that omits the subject, or the subject and predicate, repeats them from the row above)
1 sg:pub.10.1186/1471-2105-5-113 schema:about N1507301cff874d12abab72a055305625
2 Na099540e8c7849be92579558371ceab7
3 Ncad18731197b40eaa09fc819ad75f0f3
4 Nd1465d40683c4a64b7fd87654c575d50
5 Nd6123c989c2747e59edf5990a4a63f3f
6 Ndccf337df5924eee968f2455f261694e
7 Nfafe16b273b04d73b7b0c4370324791f
8 anzsrc-for:01
9 anzsrc-for:06
10 anzsrc-for:08
11 schema:author N2f2515bd87574bbab1862e84575bd7b6
12 schema:citation sg:pub.10.1007/3-540-44696-6_2
13 sg:pub.10.1007/bf02257378
14 sg:pub.10.1007/bf02603120
15 schema:datePublished 2004-08-19
16 schema:datePublishedReg 2004-08-19
17 schema:description BackgroundIn a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE to achieve the highest scores reported to date on four alignment accuracy benchmarks. Here we present a more complete discussion of the algorithm, describing several previously unpublished techniques that improve biological accuracy and / or computational complexity. We introduce a new option, MUSCLE-fast, designed for high-throughput applications. We also describe a new protocol for evaluating objective functions that align two profiles.ResultsWe compare the speed and accuracy of MUSCLE with CLUSTALW, Progressive POA and the MAFFT script FFTNS1, the fastest previously published program known to the author. Accuracy is measured using four benchmarks: BAliBASE, PREFAB, SABmark and SMART. We test three variants that offer highest accuracy (MUSCLE with default settings), highest speed (MUSCLE-fast), and a carefully chosen compromise between the two (MUSCLE-prog). We find MUSCLE-fast to be the fastest algorithm on all test sets, achieving average alignment accuracy similar to CLUSTALW in times that are typically two to three orders of magnitude less. MUSCLE-fast is able to align 1,000 sequences of average length 282 in 21 seconds on a current desktop computer.ConclusionsMUSCLE offers a range of options that provide improved speed and / or alignment accuracy compared with currently available programs. MUSCLE is freely available at http://www.drive5.com/muscle.
18 schema:genre article
19 schema:inLanguage en
20 schema:isAccessibleForFree true
21 schema:isPartOf N6f5f6ed860ce436a8c2dd526f2d0edcd
22 N803988ebc8124e52b1c5863558428a89
23 sg:journal.1023786
24 schema:keywords BAliBASE
25 BackgroundIn
26 ClustalW
27 POA
28 ResultsWe
29 SABmark
30 Smart
31 accuracy
32 accuracy benchmark
33 accuracy of muscle
34 algorithm
35 alignment
36 alignment accuracy
37 alignment method
38 applications
39 authors
40 available programs
41 average alignment accuracy
42 benchmarks
43 biological accuracy
44 brief summary
45 complete discussion
46 complexity
47 compromise
48 computational complexity
49 computer
50 current desktop computers
51 date
52 desktop computer
53 discussion
54 fast algorithm
55 function
56 high accuracy
57 high speed
58 high-throughput applications
59 higher scores
60 magnitude
61 method
62 multiple alignment
63 multiple sequence alignment methods
64 muscle
65 new options
66 new program
67 new protocol
68 objective function
69 options
70 order
71 orders of magnitude
72 paper
73 prefabs
74 previous paper
75 profile
76 program
77 protein sequences
78 protocol
79 range
80 range of options
81 reduced time
82 scores
83 seconds
84 sequence
85 sequence alignment methods
86 set
87 space complexity
88 speed
89 summary
90 technique
91 test set
92 time
93 unpublished technique
94 variants
95 schema:name MUSCLE: a multiple sequence alignment method with reduced time and space complexity
96 schema:pagination 113
97 schema:productId N024cda28f3c54aceb713dfcac69f6894
98 N2d8df52fa5a84b3b8fc0aa71390d3a64
99 Ndb1cc955faf54e84b7076af03c56b070
100 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040413794
101 https://doi.org/10.1186/1471-2105-5-113
102 schema:sdDatePublished 2022-05-20T07:22
103 schema:sdLicense https://scigraph.springernature.com/explorer/license/
104 schema:sdPublisher Nd5c56fbe1dc24dd49ddf740938068b7e
105 schema:url https://doi.org/10.1186/1471-2105-5-113
106 sgo:license sg:explorer/license/
107 sgo:sdDataset articles
108 rdf:type schema:ScholarlyArticle
109 N024cda28f3c54aceb713dfcac69f6894 schema:name pubmed_id
110 schema:value 15318951
111 rdf:type schema:PropertyValue
112 N1507301cff874d12abab72a055305625 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
113 schema:name Software
114 rdf:type schema:DefinedTerm
115 N2d8df52fa5a84b3b8fc0aa71390d3a64 schema:name dimensions_id
116 schema:value pub.1040413794
117 rdf:type schema:PropertyValue
118 N2f2515bd87574bbab1862e84575bd7b6 rdf:first sg:person.01025250401.02
119 rdf:rest rdf:nil
120 N6f5f6ed860ce436a8c2dd526f2d0edcd schema:issueNumber 1
121 rdf:type schema:PublicationIssue
122 N803988ebc8124e52b1c5863558428a89 schema:volumeNumber 5
123 rdf:type schema:PublicationVolume
124 Na099540e8c7849be92579558371ceab7 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
125 schema:name Cluster Analysis
126 rdf:type schema:DefinedTerm
127 Ncad18731197b40eaa09fc819ad75f0f3 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
128 schema:name Phylogeny
129 rdf:type schema:DefinedTerm
130 Nd1465d40683c4a64b7fd87654c575d50 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
131 schema:name Software Design
132 rdf:type schema:DefinedTerm
133 Nd5c56fbe1dc24dd49ddf740938068b7e schema:name Springer Nature - SN SciGraph project
134 rdf:type schema:Organization
135 Nd6123c989c2747e59edf5990a4a63f3f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
136 schema:name Computational Biology
137 rdf:type schema:DefinedTerm
138 Ndb1cc955faf54e84b7076af03c56b070 schema:name doi
139 schema:value 10.1186/1471-2105-5-113
140 rdf:type schema:PropertyValue
141 Ndccf337df5924eee968f2455f261694e schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
142 schema:name Time Factors
143 rdf:type schema:DefinedTerm
144 Nfafe16b273b04d73b7b0c4370324791f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
145 schema:name Sequence Alignment
146 rdf:type schema:DefinedTerm
147 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
148 schema:name Mathematical Sciences
149 rdf:type schema:DefinedTerm
150 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
151 schema:name Biological Sciences
152 rdf:type schema:DefinedTerm
153 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
154 schema:name Information and Computing Sciences
155 rdf:type schema:DefinedTerm
156 sg:grant.5906856 http://pending.schema.org/fundedItem sg:pub.10.1186/1471-2105-5-113
157 rdf:type schema:MonetaryGrant
158 sg:journal.1023786 schema:issn 1471-2105
159 schema:name BMC Bioinformatics
160 schema:publisher Springer Nature
161 rdf:type schema:Periodical
162 sg:person.01025250401.02 schema:affiliation grid-institutes:grid.47840.3f
163 schema:familyName Edgar
164 schema:givenName Robert C
165 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01025250401.02
166 rdf:type schema:Person
167 sg:pub.10.1007/3-540-44696-6_2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045928456
168 https://doi.org/10.1007/3-540-44696-6_2
169 rdf:type schema:CreativeWork
170 sg:pub.10.1007/bf02257378 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014365591
171 https://doi.org/10.1007/bf02257378
172 rdf:type schema:CreativeWork
173 sg:pub.10.1007/bf02603120 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022962956
174 https://doi.org/10.1007/bf02603120
175 rdf:type schema:CreativeWork
176 grid-institutes:grid.47840.3f schema:alternateName Department of Plant and Microbial Biology, University of California, 461 Koshland Hall, 94720-3102, Berkeley, CA, USA
177 schema:name Department of Plant and Microbial Biology, University of California, 461 Koshland Hall, 94720-3102, Berkeley, CA, USA
178 rdf:type schema:Organization
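In the table, the JSON-LD author array appears as an RDF collection. schema:author points to a blank node whose rdf:first is the person and whose rdf:rest ends in rdf:nil (rows 11, 118 and 119). The identifiers under schema:productId are blank nodes holding schema:name / schema:value pairs. The following is a minimal sketch of reading both structures from a handful of triples copied from the table. `objects`, `rdf_list` and `full_name` are hypothetical helpers, not part of any RDF library. For simplicity the sketch stores every term as a plain string and does not distinguish IRIs from literals.

```python
from collections.abc import Iterator

Triple = tuple[str, str, str]

# A few rows copied from the table above; blank-node labels kept verbatim.
TRIPLES: list[Triple] = [
    ("sg:pub.10.1186/1471-2105-5-113", "schema:author", "N2f2515bd87574bbab1862e84575bd7b6"),
    ("N2f2515bd87574bbab1862e84575bd7b6", "rdf:first", "sg:person.01025250401.02"),
    ("N2f2515bd87574bbab1862e84575bd7b6", "rdf:rest", "rdf:nil"),
    ("sg:person.01025250401.02", "schema:familyName", "Edgar"),
    ("sg:person.01025250401.02", "schema:givenName", "Robert C"),
    ("sg:pub.10.1186/1471-2105-5-113", "schema:productId", "N024cda28f3c54aceb713dfcac69f6894"),
    ("N024cda28f3c54aceb713dfcac69f6894", "schema:name", "pubmed_id"),
    ("N024cda28f3c54aceb713dfcac69f6894", "schema:value", "15318951"),
]
ARTICLE = "sg:pub.10.1186/1471-2105-5-113"


def objects(triples: list[Triple], subject: str, predicate: str) -> list[str]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def rdf_list(triples: list[Triple], head: str) -> Iterator[str]:
    """Yield the members of an RDF collection by following rdf:first / rdf:rest."""
    node = head
    while node != "rdf:nil":
        yield objects(triples, node, "rdf:first")[0]
        node = objects(triples, node, "rdf:rest")[0]


def full_name(triples: list[Triple], person: str) -> str:
    given = objects(triples, person, "schema:givenName")[0]
    family = objects(triples, person, "schema:familyName")[0]
    return f"{given} {family}"


# Exactly one schema:author triple is expected: it points at the list head.
(author_list,) = objects(TRIPLES, ARTICLE, "schema:author")
authors = [full_name(TRIPLES, person) for person in rdf_list(TRIPLES, author_list)]

# productId blank nodes are name/value pairs; collect them into a dict.
ids = {
    objects(TRIPLES, node, "schema:name")[0]: objects(TRIPLES, node, "schema:value")[0]
    for node in objects(TRIPLES, ARTICLE, "schema:productId")
}

if __name__ == "__main__":
    print(authors, ids)
```

Following rdf:rest, rather than collecting every rdf:first triple, preserves author order. That order is the reason an RDF collection is used for the author array.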
 



