Kraken: ultrafast metagenomic sequence classification using exact alignments


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2014-01-01

AUTHORS

Derrick E Wood, Steven L Salzberg

ABSTRACT

Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at http://ccb.jhu.edu/software/kraken/.
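The speed-up figures in the abstract can be turned into rough per-tool throughputs. The sketch below is back-of-envelope arithmetic only: it assumes both ratios are relative to the stated 4.1 million reads per minute, which the abstract gives as a lower bound ("over"), so the derived numbers are approximations and not values reported in the record. All variable and function names are illustrative.

```python
# Figures quoted in the abstract (fastest mode, 100 base pair reads).
KRAKEN_READS_PER_MIN = 4.1e6   # "over 4.1 million", so treat as a lower bound
READ_LENGTH_BP = 100
SPEEDUP_VS_MEGABLAST = 909
SPEEDUP_VS_METAPHLAN = 11


def implied_rate(reference_rate: float, speedup: float) -> float:
    """Throughput of the slower tool implied by a 'times faster' ratio."""
    return reference_rate / speedup


# Derived values: assumptions layered on the abstract, not reported results.
megablast_rate = implied_rate(KRAKEN_READS_PER_MIN, SPEEDUP_VS_MEGABLAST)
metaphlan_rate = implied_rate(KRAKEN_READS_PER_MIN, SPEEDUP_VS_METAPHLAN)
kraken_bases_per_min = KRAKEN_READS_PER_MIN * READ_LENGTH_BP

print(f"Megablast (implied): ~{megablast_rate:,.0f} reads/min")
print(f"MetaPhlAn (implied): ~{metaphlan_rate:,.0f} reads/min")
print(f"Kraken: ~{kraken_bases_per_min:,.0f} bases/min")
```

Under these assumptions the implied rates come out at roughly 4,500 reads per minute for Megablast and roughly 373,000 reads per minute for MetaPhlAn.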

PAGES

r46

Journal

TITLE

Genome Biology

ISSUE

3

VOLUME

15

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/gb-2014-15-3-r46

DOI

http://dx.doi.org/10.1186/gb-2014-15-3-r46

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1030203790

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/24580807



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Archaea", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Bacteria", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Classification", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Humans", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Metagenome", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Metagenomics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sensitivity and Specificity", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Alignment", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, DNA", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.21107.35", 
          "name": [
            "Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA", 
            "Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Wood", 
        "givenName": "Derrick E", 
        "id": "sg:person.01223030670.09", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223030670.09"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.21107.35", 
          "name": [
            "Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA", 
            "Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Salzberg", 
        "givenName": "Steven L", 
        "id": "sg:person.01223441713.02", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223441713.02"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1186/1471-2105-12-385", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037780208", 
          "https://doi.org/10.1186/1471-2105-12-385"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth0511-367", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1022956952", 
          "https://doi.org/10.1038/nmeth0511-367"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/gb-2013-14-1-r2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013525643", 
          "https://doi.org/10.1186/gb-2013-14-1-r2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth1043", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1047202519", 
          "https://doi.org/10.1038/nmeth1043"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nature02340", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023089166", 
          "https://doi.org/10.1038/nature02340"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-10-421", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1050579230", 
          "https://doi.org/10.1186/1471-2105-10-421"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nature11234", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1007740093", 
          "https://doi.org/10.1038/nature11234"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth.2066", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010611135", 
          "https://doi.org/10.1038/nmeth.2066"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth.1358", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008886215", 
          "https://doi.org/10.1038/nmeth.1358"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2164-12-s2-s4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1021732909", 
          "https://doi.org/10.1186/1471-2164-12-s2-s4"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2014-01-01", 
    "datePublishedReg": "2014-01-01", 
    "description": "Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at http://ccb.jhu.edu/software/kraken/.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/gb-2014-15-3-r46", 
    "isAccessibleForFree": true, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.2519905", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.2529453", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1023439", 
        "issn": [
          "1474-760X", 
          "1465-6906"
        ], 
        "name": "Genome Biology", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "3", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "15"
      }
    ], 
    "keywords": [
      "exact alignment", 
      "metagenomic sequence classification", 
      "sequence classification", 
      "classification accuracy", 
      "metagenomic DNA sequences", 
      "taxonomic labels", 
      "k-mers", 
      "accurate program", 
      "Kraken", 
      "BLAST program", 
      "estimation program", 
      "small subset", 
      "metagenomic data", 
      "task", 
      "classification", 
      "previous programs", 
      "MegaBLAST", 
      "labels", 
      "accuracy", 
      "alignment", 
      "MetaPhlAn", 
      "researchers", 
      "program", 
      "time", 
      "reads", 
      "data", 
      "subset", 
      "fast mode", 
      "sequence", 
      "pairs", 
      "mode", 
      "DNA sequences", 
      "rate", 
      "minutes", 
      "base pairs"
    ], 
    "name": "Kraken: ultrafast metagenomic sequence classification using exact alignments", 
    "pagination": "r46", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1030203790"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/gb-2014-15-3-r46"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "24580807"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/gb-2014-15-3-r46", 
      "https://app.dimensions.ai/details/publication/pub.1030203790"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-12-01T06:31", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221201/entities/gbq_results/article/article_624.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/gb-2014-15-3-r46"
  }
]
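Because JSON-LD is plain JSON, the record above can be read with a standard JSON parser. The following is a minimal sketch, assuming the top-level array holds a single article object shaped like the one shown; the embedded sample is a trimmed copy of that record, and `summarize` is a hypothetical helper, not part of any SciGraph tooling.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Wood", "givenName": "Derrick E", "type": "Person"},
      {"familyName": "Salzberg", "givenName": "Steven L", "type": "Person"}
    ],
    "datePublished": "2014-01-01",
    "name": "Kraken: ultrafast metagenomic sequence classification using exact alignments",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1030203790"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/gb-2014-15-3-r46"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["24580807"]}
    ],
    "type": "ScholarlyArticle"
  }
]
"""


def summarize(record: dict) -> dict:
    """Collect title, author names, date, and identifiers from one record."""
    authors = [
        f"{person['givenName']} {person['familyName']}"
        for person in record.get("author", [])
    ]
    # Each productId entry stores its value as a one-element list.
    identifiers = {
        entry["name"]: entry["value"][0] for entry in record.get("productId", [])
    }
    return {
        "title": record["name"],
        "authors": authors,
        "date": record.get("datePublished"),
        "ids": identifiers,
    }


summary = summarize(json.loads(RECORD_JSON)[0])
print(summary["authors"])
print(summary["ids"]["doi"])
```

This treats the document as ordinary JSON and ignores the `@context`; a full JSON-LD processor would be needed to expand terms into absolute IRIs.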
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/gb-2014-15-3-r46'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/gb-2014-15-3-r46'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/gb-2014-15-3-r46'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/gb-2014-15-3-r46'
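The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. This is an illustrative equivalent, not an official client: the short format keys and the `build_request` and `fetch` helpers are hypothetical names, and whether the endpoint still responds is not something this record confirms.

```python
from urllib.request import Request, urlopen

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/gb-2014-15-3-r46"

# Media types taken from the curl commands above; the keys are illustrative.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> Request:
    """Build a GET request asking for the given RDF serialization."""
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Performs a real network call; run only when the service is reachable.
    print(fetch("turtle")[:500])
```

Keeping request construction separate from the network call makes the header logic easy to check without contacting the server.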


 

This table displays all metadata directly associated with this object as RDF triples.

190 TRIPLES      21 PREDICATES      80 URIs      62 LITERALS      17 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/gb-2014-15-3-r46 schema:about N1799176e08614b77b71136386edaf405
2 N39cb5789e6e64938a97a8335ddadec37
3 N60260c14cedf4fcbaaa300355ba59a41
4 N774f8acfbd3d4f8786857b10ccbce43f
5 N803d63e596304c1ebc87b42d134a9f74
6 N9713a6cec3c54c6c91af82780b9a2d82
7 Nb2ae83bb62e94380a7cb5cda3c66d93a
8 Ne4519ce4d8aa4918aa279a82d0331178
9 Ne8ed69593a0e407fa8de6f1cba99d857
10 Nea34044621884c318a534a41b7b20b9e
11 anzsrc-for:06
12 anzsrc-for:0604
13 schema:author N636d9ec361d44d5cb4195de82f5ee315
14 schema:citation sg:pub.10.1038/nature02340
15 sg:pub.10.1038/nature11234
16 sg:pub.10.1038/nmeth.1358
17 sg:pub.10.1038/nmeth.2066
18 sg:pub.10.1038/nmeth0511-367
19 sg:pub.10.1038/nmeth1043
20 sg:pub.10.1186/1471-2105-10-421
21 sg:pub.10.1186/1471-2105-12-385
22 sg:pub.10.1186/1471-2164-12-s2-s4
23 sg:pub.10.1186/gb-2013-14-1-r2
24 schema:datePublished 2014-01-01
25 schema:datePublishedReg 2014-01-01
26 schema:description Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at http://ccb.jhu.edu/software/kraken/.
27 schema:genre article
28 schema:isAccessibleForFree true
29 schema:isPartOf N53472ed287fd4f3fa26fd278276fd97b
30 N7cc313120b36459aab91826b64eaa567
31 sg:journal.1023439
32 schema:keywords BLAST program
33 DNA sequences
34 Kraken
35 MegaBLAST
36 MetaPhlAn
37 accuracy
38 accurate program
39 alignment
40 base pairs
41 classification
42 classification accuracy
43 data
44 estimation program
45 exact alignment
46 fast mode
47 k-mers
48 labels
49 metagenomic DNA sequences
50 metagenomic data
51 metagenomic sequence classification
52 minutes
53 mode
54 pairs
55 previous programs
56 program
57 rate
58 reads
59 researchers
60 sequence
61 sequence classification
62 small subset
63 subset
64 task
65 taxonomic labels
66 time
67 schema:name Kraken: ultrafast metagenomic sequence classification using exact alignments
68 schema:pagination r46
69 schema:productId N80ed7e9f6971457c8bffeba1681f350a
70 Na0955ebe14454d60a08c6b6c143078a4
71 Ned68d5085ca24d699b4f956beb27b6c4
72 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030203790
73 https://doi.org/10.1186/gb-2014-15-3-r46
74 schema:sdDatePublished 2022-12-01T06:31
75 schema:sdLicense https://scigraph.springernature.com/explorer/license/
76 schema:sdPublisher Nc1b4c5b79b1f4548855e406230474a18
77 schema:url https://doi.org/10.1186/gb-2014-15-3-r46
78 sgo:license sg:explorer/license/
79 sgo:sdDataset articles
80 rdf:type schema:ScholarlyArticle
81 N1799176e08614b77b71136386edaf405 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
82 schema:name Classification
83 rdf:type schema:DefinedTerm
84 N1e6da01ddcaa409e8a75663c4ede5161 rdf:first sg:person.01223441713.02
85 rdf:rest rdf:nil
86 N39cb5789e6e64938a97a8335ddadec37 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
87 schema:name Software
88 rdf:type schema:DefinedTerm
89 N53472ed287fd4f3fa26fd278276fd97b schema:issueNumber 3
90 rdf:type schema:PublicationIssue
91 N60260c14cedf4fcbaaa300355ba59a41 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
92 schema:name Sequence Alignment
93 rdf:type schema:DefinedTerm
94 N636d9ec361d44d5cb4195de82f5ee315 rdf:first sg:person.01223030670.09
95 rdf:rest N1e6da01ddcaa409e8a75663c4ede5161
96 N774f8acfbd3d4f8786857b10ccbce43f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
97 schema:name Bacteria
98 rdf:type schema:DefinedTerm
99 N7cc313120b36459aab91826b64eaa567 schema:volumeNumber 15
100 rdf:type schema:PublicationVolume
101 N803d63e596304c1ebc87b42d134a9f74 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
102 schema:name Metagenome
103 rdf:type schema:DefinedTerm
104 N80ed7e9f6971457c8bffeba1681f350a schema:name dimensions_id
105 schema:value pub.1030203790
106 rdf:type schema:PropertyValue
107 N9713a6cec3c54c6c91af82780b9a2d82 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
108 schema:name Sensitivity and Specificity
109 rdf:type schema:DefinedTerm
110 Na0955ebe14454d60a08c6b6c143078a4 schema:name pubmed_id
111 schema:value 24580807
112 rdf:type schema:PropertyValue
113 Nb2ae83bb62e94380a7cb5cda3c66d93a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
114 schema:name Humans
115 rdf:type schema:DefinedTerm
116 Nc1b4c5b79b1f4548855e406230474a18 schema:name Springer Nature - SN SciGraph project
117 rdf:type schema:Organization
118 Ne4519ce4d8aa4918aa279a82d0331178 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
119 schema:name Archaea
120 rdf:type schema:DefinedTerm
121 Ne8ed69593a0e407fa8de6f1cba99d857 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
122 schema:name Metagenomics
123 rdf:type schema:DefinedTerm
124 Nea34044621884c318a534a41b7b20b9e schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
125 schema:name Sequence Analysis, DNA
126 rdf:type schema:DefinedTerm
127 Ned68d5085ca24d699b4f956beb27b6c4 schema:name doi
128 schema:value 10.1186/gb-2014-15-3-r46
129 rdf:type schema:PropertyValue
130 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
131 schema:name Biological Sciences
132 rdf:type schema:DefinedTerm
133 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
134 schema:name Genetics
135 rdf:type schema:DefinedTerm
136 sg:grant.2519905 http://pending.schema.org/fundedItem sg:pub.10.1186/gb-2014-15-3-r46
137 rdf:type schema:MonetaryGrant
138 sg:grant.2529453 http://pending.schema.org/fundedItem sg:pub.10.1186/gb-2014-15-3-r46
139 rdf:type schema:MonetaryGrant
140 sg:journal.1023439 schema:issn 1465-6906
141 1474-760X
142 schema:name Genome Biology
143 schema:publisher Springer Nature
144 rdf:type schema:Periodical
145 sg:person.01223030670.09 schema:affiliation grid-institutes:grid.21107.35
146 schema:familyName Wood
147 schema:givenName Derrick E
148 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223030670.09
149 rdf:type schema:Person
150 sg:person.01223441713.02 schema:affiliation grid-institutes:grid.21107.35
151 schema:familyName Salzberg
152 schema:givenName Steven L
153 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223441713.02
154 rdf:type schema:Person
155 sg:pub.10.1038/nature02340 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023089166
156 https://doi.org/10.1038/nature02340
157 rdf:type schema:CreativeWork
158 sg:pub.10.1038/nature11234 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007740093
159 https://doi.org/10.1038/nature11234
160 rdf:type schema:CreativeWork
161 sg:pub.10.1038/nmeth.1358 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008886215
162 https://doi.org/10.1038/nmeth.1358
163 rdf:type schema:CreativeWork
164 sg:pub.10.1038/nmeth.2066 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010611135
165 https://doi.org/10.1038/nmeth.2066
166 rdf:type schema:CreativeWork
167 sg:pub.10.1038/nmeth0511-367 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022956952
168 https://doi.org/10.1038/nmeth0511-367
169 rdf:type schema:CreativeWork
170 sg:pub.10.1038/nmeth1043 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047202519
171 https://doi.org/10.1038/nmeth1043
172 rdf:type schema:CreativeWork
173 sg:pub.10.1186/1471-2105-10-421 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050579230
174 https://doi.org/10.1186/1471-2105-10-421
175 rdf:type schema:CreativeWork
176 sg:pub.10.1186/1471-2105-12-385 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037780208
177 https://doi.org/10.1186/1471-2105-12-385
178 rdf:type schema:CreativeWork
179 sg:pub.10.1186/1471-2164-12-s2-s4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021732909
180 https://doi.org/10.1186/1471-2164-12-s2-s4
181 rdf:type schema:CreativeWork
182 sg:pub.10.1186/gb-2013-14-1-r2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013525643
183 https://doi.org/10.1186/gb-2013-14-1-r2
184 rdf:type schema:CreativeWork
185 grid-institutes:grid.21107.35 schema:alternateName Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
186 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
187 schema:name Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
188 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
189 Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
190 rdf:type schema:Organization
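In the table, `schema:author` points at a blank node, and author order is encoded as an RDF list through `rdf:first` and `rdf:rest` (rows 13, 84-85, and 94-95). The sketch below walks that linked list to recover the ordered authors. It is a minimal illustration using plain tuples copied from those rows; `rdf_list_items` is a hypothetical helper, and a real application would more likely use an RDF library.

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"

# (subject, predicate, object) tuples copied from rows 13, 84-85 and 94-95.
TRIPLES = [
    ("sg:pub.10.1186/gb-2014-15-3-r46", "schema:author",
     "N636d9ec361d44d5cb4195de82f5ee315"),
    ("N1e6da01ddcaa409e8a75663c4ede5161", RDF_FIRST, "sg:person.01223441713.02"),
    ("N1e6da01ddcaa409e8a75663c4ede5161", RDF_REST, RDF_NIL),
    ("N636d9ec361d44d5cb4195de82f5ee315", RDF_FIRST, "sg:person.01223030670.09"),
    ("N636d9ec361d44d5cb4195de82f5ee315", RDF_REST,
     "N1e6da01ddcaa409e8a75663c4ede5161"),
]


def rdf_list_items(triples, head):
    """Follow rdf:first / rdf:rest from `head` until rdf:nil is reached."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at list node {node}")
        seen.add(node)
        items.append(first[node])  # element stored at this list cell
        node = rest[node]          # move on to the next cell
    return items


head = next(o for s, p, o in TRIPLES if p == "schema:author")
authors = rdf_list_items(TRIPLES, head)
print(authors)
```

The resulting order, `sg:person.01223030670.09` then `sg:person.01223441713.02`, matches the author order Wood, Salzberg given in the record.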
 



