BLAST+: architecture and applications


Ontology type: schema:ScholarlyArticle
Open Access: True


Article Info

DATE

2009-12-15

AUTHORS

Christiam Camacho, George Coulouris, Vahram Avagyan, Ning Ma, Jason Papadopoulos, Kevin Bealer, Thomas L Madden

ABSTRACT

Background: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications.

Results: We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site.

Conclusion: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.

PAGES

421

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1471-2105-10-421

DOI

http://dx.doi.org/10.1186/1471-2105-10-421

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1050579230

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/20003500


JSON-LD is the canonical representation for SciGraph data.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Computational Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Databases, Genetic", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Alignment", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.419234.9", 
          "name": [
            "National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Camacho", 
        "givenName": "Christiam", 
        "id": "sg:person.0770117714.83", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0770117714.83"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.419234.9", 
          "name": [
            "National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Coulouris", 
        "givenName": "George", 
        "id": "sg:person.0776642273.17", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0776642273.17"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.419234.9", 
          "name": [
            "National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Avagyan", 
        "givenName": "Vahram", 
        "id": "sg:person.0722540102.57", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0722540102.57"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.419234.9", 
          "name": [
            "National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Ma", 
        "givenName": "Ning", 
        "id": "sg:person.01220574714.68", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01220574714.68"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.419234.9", 
          "name": [
            "National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Papadopoulos", 
        "givenName": "Jason", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.419234.9", 
          "name": [
            "National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Bealer", 
        "givenName": "Kevin", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.419234.9", 
          "name": [
            "National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Madden", 
        "givenName": "Thomas L", 
        "id": "sg:person.01155045411.88", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01155045411.88"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/nature01262", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039854529", 
          "https://doi.org/10.1038/nature01262"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2009-12-15", 
    "datePublishedReg": "2009-12-15", 
    "description": "BackgroundSequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications.ResultsWe describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site.ConclusionThe new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/1471-2105-10-421", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "10"
      }
    ], 
    "keywords": [
      "command-line application", 
      "long queries", 
      "modular software library", 
      "Basic Local Alignment Search Tool", 
      "arbitrary data sources", 
      "important bioinformatics tasks", 
      "long query sequences", 
      "database sequences", 
      "substantial speed improvements", 
      "short queries", 
      "software library", 
      "user interface", 
      "memory usage", 
      "bioinformatics tasks", 
      "use of heuristics", 
      "query sequence", 
      "queries", 
      "Web sites", 
      "Local Alignment Search Tool", 
      "search tools", 
      "speed improvement", 
      "data sources", 
      "new features", 
      "BLAST software", 
      "CPU time", 
      "relevant parts", 
      "exact method", 
      "software", 
      "files", 
      "BLAST database", 
      "applications", 
      "database", 
      "users", 
      "BLAST tool", 
      "heuristics", 
      "architecture", 
      "chunks", 
      "tool", 
      "task", 
      "features", 
      "usage", 
      "processing", 
      "information", 
      "library", 
      "set", 
      "search", 
      "interface", 
      "sequence data", 
      "shortcomings", 
      "sequence", 
      "improvement", 
      "speed", 
      "time", 
      "similarity", 
      "data", 
      "method", 
      "program", 
      "use", 
      "part", 
      "source", 
      "cases", 
      "options", 
      "contigs", 
      "ResultsWe", 
      "sites", 
      "chromosomes"
    ], 
    "name": "BLAST+: architecture and applications", 
    "pagination": "421", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1050579230"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1471-2105-10-421"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "20003500"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1471-2105-10-421", 
      "https://app.dimensions.ai/details/publication/pub.1050579230"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-05-20T07:25", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220519/entities/gbq_results/article/article_494.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/1471-2105-10-421"
  }
]
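
As a minimal sketch of how a record like the one above might be consumed, the Python snippet below parses a trimmed copy of the JSON-LD and pulls out the title, publication date, identifiers, and author names. The helper name `summarize_record` and the shortened `SAMPLE` string are illustrative assumptions, not part of SciGraph; only fields that appear in the record above are used.

```python
import json

# Trimmed copy of the record above (illustrative; most fields are omitted).
SAMPLE = """
[
  {
    "name": "BLAST+: architecture and applications",
    "datePublished": "2009-12-15",
    "author": [
      {"familyName": "Camacho", "givenName": "Christiam", "type": "Person"},
      {"familyName": "Madden", "givenName": "Thomas L", "type": "Person"}
    ],
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1186/1471-2105-10-421"]},
      {"name": "pubmed_id", "type": "PropertyValue",
       "value": ["20003500"]}
    ]
  }
]
"""


def summarize_record(text):
    """Return a small dict of human-friendly fields from a JSON-LD record."""
    # The record is a one-element JSON array, so take the first entry.
    record = json.loads(text)[0]
    # productId is a list of {name, value[]} pairs; flatten it to a dict.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    authors = [
        "{} {}".format(a["givenName"], a["familyName"])
        for a in record.get("author", [])
    ]
    return {
        "title": record["name"],
        "date": record.get("datePublished"),
        "doi": ids.get("doi"),
        "pubmed_id": ids.get("pubmed_id"),
        "authors": authors,
    }


summary = summarize_record(SAMPLE)
print(summary)
```

Because JSON-LD is plain JSON, the standard `json` module is enough here; a dedicated JSON-LD processor would only be needed to expand the `@context`.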
 

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-10-421'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-10-421'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-10-421'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-10-421'
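
The four curl commands differ only in their `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The names `ACCEPT_TYPES`, `build_request`, and `fetch`, and the short format labels used as dictionary keys, are assumptions made for this illustration; the URL and media types are the ones shown in the commands above. Whether the endpoint still answers is not something this sketch can guarantee.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Short labels (hypothetical) mapped to the media types used in the curl commands.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt="json-ld"):
    """Build (but do not send) a request asking for one RDF serialization."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError("unknown format: {!r}".format(fmt))
    return urllib.request.Request(
        BASE_URL + record_id,
        headers={"Accept": ACCEPT_TYPES[fmt]},
    )


def fetch(record_id, fmt="json-ld", timeout=30):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example: prepare the Turtle request for this article without sending it.
req = build_request("pub.10.1186/1471-2105-10-421", "turtle")
print(req.full_url, req.get_header("Accept"))
```

Calling `fetch("pub.10.1186/1471-2105-10-421", "nt")` would perform the equivalent of the N-Triples curl command.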


 

This table displays all metadata directly associated with this object as RDF triples.

187 TRIPLES      22 PREDICATES      97 URIs      88 LITERALS      11 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/1471-2105-10-421 schema:about N570f30a955974c83b275141d11aa07bb
2 N86c57a702c074447ba52d3bf9545506d
3 Nb1262c4e3ce14fd19a659f179918b1fe
4 Nfa20b337ab884e68a42918cea5b9ba4d
5 anzsrc-for:08
6 anzsrc-for:0806
7 schema:author Nb616fdf394484f31a5241401577d2f68
8 schema:citation sg:pub.10.1038/nature01262
9 schema:datePublished 2009-12-15
10 schema:datePublishedReg 2009-12-15
11 schema:description BackgroundSequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications.ResultsWe describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site.ConclusionThe new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.
12 schema:genre article
13 schema:inLanguage en
14 schema:isAccessibleForFree true
15 schema:isPartOf N075331b6c672438188dd83272c7b9c69
16 Nb78182be9d4f4edfa9e1076e0d33f09a
17 sg:journal.1023786
18 schema:keywords BLAST database
19 BLAST software
20 BLAST tool
21 Basic Local Alignment Search Tool
22 CPU time
23 Local Alignment Search Tool
24 ResultsWe
25 Web sites
26 applications
27 arbitrary data sources
28 architecture
29 bioinformatics tasks
30 cases
31 chromosomes
32 chunks
33 command-line application
34 contigs
35 data
36 data sources
37 database
38 database sequences
39 exact method
40 features
41 files
42 heuristics
43 important bioinformatics tasks
44 improvement
45 information
46 interface
47 library
48 long queries
49 long query sequences
50 memory usage
51 method
52 modular software library
53 new features
54 options
55 part
56 processing
57 program
58 queries
59 query sequence
60 relevant parts
61 search
62 search tools
63 sequence
64 sequence data
65 set
66 short queries
67 shortcomings
68 similarity
69 sites
70 software
71 software library
72 source
73 speed
74 speed improvement
75 substantial speed improvements
76 task
77 time
78 tool
79 usage
80 use
81 use of heuristics
82 user interface
83 users
84 schema:name BLAST+: architecture and applications
85 schema:pagination 421
86 schema:productId N3cafa25e2e624c3a996d63d5fc86102c
87 N78ce5a3ee19849e2be148e6ad49df744
88 Nbdaaf0b0fa94450997b4268976174164
89 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050579230
90 https://doi.org/10.1186/1471-2105-10-421
91 schema:sdDatePublished 2022-05-20T07:25
92 schema:sdLicense https://scigraph.springernature.com/explorer/license/
93 schema:sdPublisher N0db7eafdc6d34b80837d9f79b67ee0fa
94 schema:url https://doi.org/10.1186/1471-2105-10-421
95 sgo:license sg:explorer/license/
96 sgo:sdDataset articles
97 rdf:type schema:ScholarlyArticle
98 N075331b6c672438188dd83272c7b9c69 schema:volumeNumber 10
99 rdf:type schema:PublicationVolume
100 N0db7eafdc6d34b80837d9f79b67ee0fa schema:name Springer Nature - SN SciGraph project
101 rdf:type schema:Organization
102 N3cafa25e2e624c3a996d63d5fc86102c schema:name dimensions_id
103 schema:value pub.1050579230
104 rdf:type schema:PropertyValue
105 N4276ac2f91954771bd743c5a31595f3d rdf:first sg:person.01155045411.88
106 rdf:rest rdf:nil
107 N496985ed610f4a7383fb7eb7d692413f rdf:first sg:person.01220574714.68
108 rdf:rest N4b6281a1835f48ada66a434b0b799e77
109 N4b6281a1835f48ada66a434b0b799e77 rdf:first Nf3d7745eab734db289fb2295e8ac77f7
110 rdf:rest N4df0917be2484ca3935a1caf16f35486
111 N4df0917be2484ca3935a1caf16f35486 rdf:first Nfc4ba860059d4de2b0edd4005f583bf5
112 rdf:rest N4276ac2f91954771bd743c5a31595f3d
113 N4e76f657e1ba4ad882e10bd1ec8743cd rdf:first sg:person.0776642273.17
114 rdf:rest Nceeaa41bd05a4109be49f4ab5886886a
115 N570f30a955974c83b275141d11aa07bb schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
116 schema:name Databases, Genetic
117 rdf:type schema:DefinedTerm
118 N78ce5a3ee19849e2be148e6ad49df744 schema:name pubmed_id
119 schema:value 20003500
120 rdf:type schema:PropertyValue
121 N86c57a702c074447ba52d3bf9545506d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
122 schema:name Software
123 rdf:type schema:DefinedTerm
124 Nb1262c4e3ce14fd19a659f179918b1fe schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
125 schema:name Computational Biology
126 rdf:type schema:DefinedTerm
127 Nb616fdf394484f31a5241401577d2f68 rdf:first sg:person.0770117714.83
128 rdf:rest N4e76f657e1ba4ad882e10bd1ec8743cd
129 Nb78182be9d4f4edfa9e1076e0d33f09a schema:issueNumber 1
130 rdf:type schema:PublicationIssue
131 Nbdaaf0b0fa94450997b4268976174164 schema:name doi
132 schema:value 10.1186/1471-2105-10-421
133 rdf:type schema:PropertyValue
134 Nceeaa41bd05a4109be49f4ab5886886a rdf:first sg:person.0722540102.57
135 rdf:rest N496985ed610f4a7383fb7eb7d692413f
136 Nf3d7745eab734db289fb2295e8ac77f7 schema:affiliation grid-institutes:grid.419234.9
137 schema:familyName Papadopoulos
138 schema:givenName Jason
139 rdf:type schema:Person
140 Nfa20b337ab884e68a42918cea5b9ba4d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
141 schema:name Sequence Alignment
142 rdf:type schema:DefinedTerm
143 Nfc4ba860059d4de2b0edd4005f583bf5 schema:affiliation grid-institutes:grid.419234.9
144 schema:familyName Bealer
145 schema:givenName Kevin
146 rdf:type schema:Person
147 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
148 schema:name Information and Computing Sciences
149 rdf:type schema:DefinedTerm
150 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
151 schema:name Information Systems
152 rdf:type schema:DefinedTerm
153 sg:journal.1023786 schema:issn 1471-2105
154 schema:name BMC Bioinformatics
155 schema:publisher Springer Nature
156 rdf:type schema:Periodical
157 sg:person.01155045411.88 schema:affiliation grid-institutes:grid.419234.9
158 schema:familyName Madden
159 schema:givenName Thomas L
160 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01155045411.88
161 rdf:type schema:Person
162 sg:person.01220574714.68 schema:affiliation grid-institutes:grid.419234.9
163 schema:familyName Ma
164 schema:givenName Ning
165 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01220574714.68
166 rdf:type schema:Person
167 sg:person.0722540102.57 schema:affiliation grid-institutes:grid.419234.9
168 schema:familyName Avagyan
169 schema:givenName Vahram
170 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0722540102.57
171 rdf:type schema:Person
172 sg:person.0770117714.83 schema:affiliation grid-institutes:grid.419234.9
173 schema:familyName Camacho
174 schema:givenName Christiam
175 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0770117714.83
176 rdf:type schema:Person
177 sg:person.0776642273.17 schema:affiliation grid-institutes:grid.419234.9
178 schema:familyName Coulouris
179 schema:givenName George
180 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0776642273.17
181 rdf:type schema:Person
182 sg:pub.10.1038/nature01262 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039854529
183 https://doi.org/10.1038/nature01262
184 rdf:type schema:CreativeWork
185 grid-institutes:grid.419234.9 schema:alternateName National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA
186 schema:name National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, 20894, Bethesda, MD, USA
187 rdf:type schema:Organization
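
In the table above, the author order is not stored as a flat list: `schema:author` points at a blank node, and each node carries an `rdf:first` (the author) and an `rdf:rest` (the next node), ending in `rdf:nil`. As a rough sketch, the Python below walks that chain to recover the order. The node and person identifiers are copied from the `rdf:first` / `rdf:rest` and `schema:familyName` rows of the table; the dictionary layout and the function name `walk_rdf_list` are assumptions made for illustration.

```python
RDF_NIL = "rdf:nil"

# List node -> (rdf:first, rdf:rest), copied from the triple table above.
AUTHOR_LIST = {
    "Nb616fdf394484f31a5241401577d2f68": ("sg:person.0770117714.83", "N4e76f657e1ba4ad882e10bd1ec8743cd"),
    "N4e76f657e1ba4ad882e10bd1ec8743cd": ("sg:person.0776642273.17", "Nceeaa41bd05a4109be49f4ab5886886a"),
    "Nceeaa41bd05a4109be49f4ab5886886a": ("sg:person.0722540102.57", "N496985ed610f4a7383fb7eb7d692413f"),
    "N496985ed610f4a7383fb7eb7d692413f": ("sg:person.01220574714.68", "N4b6281a1835f48ada66a434b0b799e77"),
    "N4b6281a1835f48ada66a434b0b799e77": ("Nf3d7745eab734db289fb2295e8ac77f7", "N4df0917be2484ca3935a1caf16f35486"),
    "N4df0917be2484ca3935a1caf16f35486": ("Nfc4ba860059d4de2b0edd4005f583bf5", "N4276ac2f91954771bd743c5a31595f3d"),
    "N4276ac2f91954771bd743c5a31595f3d": ("sg:person.01155045411.88", RDF_NIL),
}

# Person node -> schema:familyName, also taken from the table.
FAMILY_NAME = {
    "sg:person.0770117714.83": "Camacho",
    "sg:person.0776642273.17": "Coulouris",
    "sg:person.0722540102.57": "Avagyan",
    "sg:person.01220574714.68": "Ma",
    "Nf3d7745eab734db289fb2295e8ac77f7": "Papadopoulos",
    "Nfc4ba860059d4de2b0edd4005f583bf5": "Bealer",
    "sg:person.01155045411.88": "Madden",
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest links from head until rdf:nil, collecting rdf:first values."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError("cycle detected at {}".format(node))
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


# The schema:author triple of the article points at this head node.
ordered = walk_rdf_list("Nb616fdf394484f31a5241401577d2f68", AUTHOR_LIST)
author_names = [FAMILY_NAME[p] for p in ordered]
print(author_names)
```

The recovered order matches the AUTHORS line near the top of the record, which is a useful sanity check when reading the flattened table.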
 



