Kraken: ultrafast metagenomic sequence classification using exact alignments


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2014-03-03

AUTHORS

Derrick E Wood, Steven L Salzberg

ABSTRACT

Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at http://ccb.jhu.edu/software/kraken/.
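
The speed-ups quoted in the abstract imply rough throughputs for the two baseline programs. The sketch below only performs that division. It assumes both multipliers are relative to the fastest-mode rate of 4.1 million reads per minute, which the abstract gives as a lower bound ("over"), so the results are approximate. The constant and function names are illustrative.

```python
# Figures taken from the abstract; treated as approximate.
KRAKEN_READS_PER_MIN = 4.1e6   # "over 4.1 million reads per minute" (lower bound)
SPEEDUP_VS_MEGABLAST = 909
SPEEDUP_VS_METAPHLAN = 11

def implied_rate(kraken_rate: float, speedup: float) -> float:
    """Reads per minute of a tool that Kraken outpaces by a factor of `speedup`."""
    return kraken_rate / speedup

megablast_rate = implied_rate(KRAKEN_READS_PER_MIN, SPEEDUP_VS_MEGABLAST)
metaphlan_rate = implied_rate(KRAKEN_READS_PER_MIN, SPEEDUP_VS_METAPHLAN)

print(f"Megablast: ~{megablast_rate:,.0f} reads/min")
print(f"MetaPhlAn: ~{metaphlan_rate:,.0f} reads/min")
```

Under these assumptions the implied rates are on the order of 4,500 reads per minute for Megablast and 370,000 reads per minute for MetaPhlAn.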

PAGES

r46

Journal

TITLE

Genome Biology

ISSUE

3

VOLUME

15

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/gb-2014-15-3-r46

DOI

http://dx.doi.org/10.1186/gb-2014-15-3-r46

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1030203790

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/24580807



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Archaea", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Bacteria", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Classification", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Humans", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Metagenome", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Metagenomics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sensitivity and Specificity", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Alignment", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, DNA", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.21107.35", 
          "name": [
            "Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA", 
            "Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Wood", 
        "givenName": "Derrick E", 
        "id": "sg:person.01223030670.09", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223030670.09"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.21107.35", 
          "name": [
            "Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA", 
            "Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Salzberg", 
        "givenName": "Steven L", 
        "id": "sg:person.01223441713.02", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223441713.02"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/nmeth1043", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1047202519", 
          "https://doi.org/10.1038/nmeth1043"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2164-12-s2-s4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1021732909", 
          "https://doi.org/10.1186/1471-2164-12-s2-s4"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nature02340", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023089166", 
          "https://doi.org/10.1038/nature02340"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/gb-2013-14-1-r2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013525643", 
          "https://doi.org/10.1186/gb-2013-14-1-r2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-10-421", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1050579230", 
          "https://doi.org/10.1186/1471-2105-10-421"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth0511-367", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1022956952", 
          "https://doi.org/10.1038/nmeth0511-367"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nature11234", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1007740093", 
          "https://doi.org/10.1038/nature11234"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth.2066", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010611135", 
          "https://doi.org/10.1038/nmeth.2066"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth.1358", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008886215", 
          "https://doi.org/10.1038/nmeth.1358"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-12-385", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037780208", 
          "https://doi.org/10.1186/1471-2105-12-385"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2014-03-03", 
    "datePublishedReg": "2014-03-03", 
    "description": "Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at http://ccb.jhu.edu/software/kraken/.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/gb-2014-15-3-r46", 
    "isAccessibleForFree": true, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.2519905", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.2529453", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1023439", 
        "issn": [
          "1474-760X", 
          "1465-6906"
        ], 
        "name": "Genome Biology", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "3", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "15"
      }
    ], 
    "keywords": [
      "exact alignment", 
      "metagenomic sequence classification", 
      "sequence classification", 
      "classification accuracy", 
      "metagenomic DNA sequences", 
      "taxonomic labels", 
      "k-mers", 
      "accurate program", 
      "Kraken", 
      "BLAST program", 
      "estimation program", 
      "small subset", 
      "metagenomic data", 
      "task", 
      "classification", 
      "previous programs", 
      "MegaBLAST", 
      "labels", 
      "accuracy", 
      "alignment", 
      "MetaPhlAn", 
      "researchers", 
      "program", 
      "time", 
      "reads", 
      "data", 
      "subset", 
      "fast mode", 
      "sequence", 
      "pairs", 
      "mode", 
      "DNA sequences", 
      "rate", 
      "minutes", 
      "base pairs"
    ], 
    "name": "Kraken: ultrafast metagenomic sequence classification using exact alignments", 
    "pagination": "r46", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1030203790"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/gb-2014-15-3-r46"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "24580807"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/gb-2014-15-3-r46", 
      "https://app.dimensions.ai/details/publication/pub.1030203790"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-09-02T15:58", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220902/entities/gbq_results/article/article_632.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/gb-2014-15-3-r46"
  }
]
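
As a minimal sketch of how a record of this shape could be read, the Python below pulls the title, publication date, author names, and identifiers out of a trimmed copy of the JSON-LD above. The `summarize` helper and the key names of the dictionary it returns are illustrative, not part of SciGraph. The code treats the record as plain JSON and does no JSON-LD context processing.

```python
import json

# Trimmed copy of the record above; only the fields read below are kept.
RECORD_JSON = """
[
  {
    "name": "Kraken: ultrafast metagenomic sequence classification using exact alignments",
    "datePublished": "2014-03-03",
    "author": [
      {"familyName": "Wood", "givenName": "Derrick E", "type": "Person"},
      {"familyName": "Salzberg", "givenName": "Steven L", "type": "Person"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1030203790"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/gb-2014-15-3-r46"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["24580807"]}
    ]
  }
]
"""

def summarize(record: dict) -> dict:
    """Collect title, date, author names, and identifiers from one record."""
    authors = [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]
    # productId is a list of name/value pairs; each value is itself a list.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", []) if p.get("value")}
    return {
        "title": record.get("name"),
        "date": record.get("datePublished"),
        "authors": authors,
        "ids": ids,
    }

records = json.loads(RECORD_JSON)   # the top level is a one-element list
summary = summarize(records[0])
print(summary["authors"], summary["ids"]["doi"])
```

Because the top level is a list, the same function could be mapped over several records if a response ever contained more than one.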
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/gb-2014-15-3-r46'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/gb-2014-15-3-r46'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/gb-2014-15-3-r46'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/gb-2014-15-3-r46'
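
The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The format labels (`json-ld`, `n-triples`, `turtle`, `rdf-xml`) and the function names are illustrative choices; the MIME types and URL are the ones shown above. Whether the endpoint still answers these requests has not been verified here, so the example builds a request without sending it.

```python
import urllib.request

# Labels are illustrative; MIME types are those used in the curl commands above.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/gb-2014-15-3-r46"

def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Return an unsent request asking for `url` in the given serialization."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})

def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 30.0) -> str:
    """Send the request and return the body as text (requires network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

req = build_request("turtle")
print(req.get_header("Accept"))   # text/turtle
```

Calling `fetch("json-ld")` would perform the equivalent of the first curl command.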


 

This table displays all metadata directly associated with this object as RDF triples.

190 triples, 21 predicates, 80 URIs, 62 literals, 17 blank nodes

Subject Predicate Object
1 sg:pub.10.1186/gb-2014-15-3-r46 schema:about N33d42c46a6884eeebd8becddb673acbb
2 N5d1de6840079448ba881490a41d13912
3 N66e2c84ad1e54ce9b08315fc5789e816
4 N7f63e86e07de49c4837fd27dc5c3d858
5 N8d32c658e25942f1b966807007fc4d20
6 N93f119a643c24f02b4645d3c923fc656
7 Nbabf6bcfc18949a981d9d76539dd624b
8 Nd8d57ec9bb804971bafab08cf4f87848
9 Ne10ef0cd2f0248e495110bfe57761975
10 Nfb4e043e9d2a49c99aec076a9da16170
11 anzsrc-for:06
12 anzsrc-for:0604
13 schema:author N89faf6061bec4a1ebfa2c2fd8179de6b
14 schema:citation sg:pub.10.1038/nature02340
15 sg:pub.10.1038/nature11234
16 sg:pub.10.1038/nmeth.1358
17 sg:pub.10.1038/nmeth.2066
18 sg:pub.10.1038/nmeth0511-367
19 sg:pub.10.1038/nmeth1043
20 sg:pub.10.1186/1471-2105-10-421
21 sg:pub.10.1186/1471-2105-12-385
22 sg:pub.10.1186/1471-2164-12-s2-s4
23 sg:pub.10.1186/gb-2013-14-1-r2
24 schema:datePublished 2014-03-03
25 schema:datePublishedReg 2014-03-03
26 schema:description Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at http://ccb.jhu.edu/software/kraken/.
27 schema:genre article
28 schema:isAccessibleForFree true
29 schema:isPartOf Nc82041d212e14cae930f1a619d4c01d2
30 Ndd10f77896524f2196469ee905ca7271
31 sg:journal.1023439
32 schema:keywords BLAST program
33 DNA sequences
34 Kraken
35 MegaBLAST
36 MetaPhlAn
37 accuracy
38 accurate program
39 alignment
40 base pairs
41 classification
42 classification accuracy
43 data
44 estimation program
45 exact alignment
46 fast mode
47 k-mers
48 labels
49 metagenomic DNA sequences
50 metagenomic data
51 metagenomic sequence classification
52 minutes
53 mode
54 pairs
55 previous programs
56 program
57 rate
58 reads
59 researchers
60 sequence
61 sequence classification
62 small subset
63 subset
64 task
65 taxonomic labels
66 time
67 schema:name Kraken: ultrafast metagenomic sequence classification using exact alignments
68 schema:pagination r46
69 schema:productId N4bc6f3133f874d6e94377f2d7806e577
70 N9afd60bd1b614ea59c8fc60d4ef98ffc
71 Nac75cf24976648a092a9b18a1de07f92
72 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030203790
73 https://doi.org/10.1186/gb-2014-15-3-r46
74 schema:sdDatePublished 2022-09-02T15:58
75 schema:sdLicense https://scigraph.springernature.com/explorer/license/
76 schema:sdPublisher Nba3b96db186f4130985d725ff3b4e560
77 schema:url https://doi.org/10.1186/gb-2014-15-3-r46
78 sgo:license sg:explorer/license/
79 sgo:sdDataset articles
80 rdf:type schema:ScholarlyArticle
81 N0ed31d5c263a42bd93fbc3fb5af11eff rdf:first sg:person.01223441713.02
82 rdf:rest rdf:nil
83 N33d42c46a6884eeebd8becddb673acbb schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
84 schema:name Metagenome
85 rdf:type schema:DefinedTerm
86 N4bc6f3133f874d6e94377f2d7806e577 schema:name dimensions_id
87 schema:value pub.1030203790
88 rdf:type schema:PropertyValue
89 N5d1de6840079448ba881490a41d13912 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
90 schema:name Sequence Analysis, DNA
91 rdf:type schema:DefinedTerm
92 N66e2c84ad1e54ce9b08315fc5789e816 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
93 schema:name Archaea
94 rdf:type schema:DefinedTerm
95 N7f63e86e07de49c4837fd27dc5c3d858 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
96 schema:name Bacteria
97 rdf:type schema:DefinedTerm
98 N89faf6061bec4a1ebfa2c2fd8179de6b rdf:first sg:person.01223030670.09
99 rdf:rest N0ed31d5c263a42bd93fbc3fb5af11eff
100 N8d32c658e25942f1b966807007fc4d20 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
101 schema:name Sensitivity and Specificity
102 rdf:type schema:DefinedTerm
103 N93f119a643c24f02b4645d3c923fc656 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
104 schema:name Humans
105 rdf:type schema:DefinedTerm
106 N9afd60bd1b614ea59c8fc60d4ef98ffc schema:name doi
107 schema:value 10.1186/gb-2014-15-3-r46
108 rdf:type schema:PropertyValue
109 Nac75cf24976648a092a9b18a1de07f92 schema:name pubmed_id
110 schema:value 24580807
111 rdf:type schema:PropertyValue
112 Nba3b96db186f4130985d725ff3b4e560 schema:name Springer Nature - SN SciGraph project
113 rdf:type schema:Organization
114 Nbabf6bcfc18949a981d9d76539dd624b schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
115 schema:name Software
116 rdf:type schema:DefinedTerm
117 Nc82041d212e14cae930f1a619d4c01d2 schema:issueNumber 3
118 rdf:type schema:PublicationIssue
119 Nd8d57ec9bb804971bafab08cf4f87848 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
120 schema:name Sequence Alignment
121 rdf:type schema:DefinedTerm
122 Ndd10f77896524f2196469ee905ca7271 schema:volumeNumber 15
123 rdf:type schema:PublicationVolume
124 Ne10ef0cd2f0248e495110bfe57761975 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
125 schema:name Classification
126 rdf:type schema:DefinedTerm
127 Nfb4e043e9d2a49c99aec076a9da16170 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
128 schema:name Metagenomics
129 rdf:type schema:DefinedTerm
130 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
131 schema:name Biological Sciences
132 rdf:type schema:DefinedTerm
133 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
134 schema:name Genetics
135 rdf:type schema:DefinedTerm
136 sg:grant.2519905 http://pending.schema.org/fundedItem sg:pub.10.1186/gb-2014-15-3-r46
137 rdf:type schema:MonetaryGrant
138 sg:grant.2529453 http://pending.schema.org/fundedItem sg:pub.10.1186/gb-2014-15-3-r46
139 rdf:type schema:MonetaryGrant
140 sg:journal.1023439 schema:issn 1465-6906
141 1474-760X
142 schema:name Genome Biology
143 schema:publisher Springer Nature
144 rdf:type schema:Periodical
145 sg:person.01223030670.09 schema:affiliation grid-institutes:grid.21107.35
146 schema:familyName Wood
147 schema:givenName Derrick E
148 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223030670.09
149 rdf:type schema:Person
150 sg:person.01223441713.02 schema:affiliation grid-institutes:grid.21107.35
151 schema:familyName Salzberg
152 schema:givenName Steven L
153 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223441713.02
154 rdf:type schema:Person
155 sg:pub.10.1038/nature02340 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023089166
156 https://doi.org/10.1038/nature02340
157 rdf:type schema:CreativeWork
158 sg:pub.10.1038/nature11234 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007740093
159 https://doi.org/10.1038/nature11234
160 rdf:type schema:CreativeWork
161 sg:pub.10.1038/nmeth.1358 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008886215
162 https://doi.org/10.1038/nmeth.1358
163 rdf:type schema:CreativeWork
164 sg:pub.10.1038/nmeth.2066 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010611135
165 https://doi.org/10.1038/nmeth.2066
166 rdf:type schema:CreativeWork
167 sg:pub.10.1038/nmeth0511-367 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022956952
168 https://doi.org/10.1038/nmeth0511-367
169 rdf:type schema:CreativeWork
170 sg:pub.10.1038/nmeth1043 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047202519
171 https://doi.org/10.1038/nmeth1043
172 rdf:type schema:CreativeWork
173 sg:pub.10.1186/1471-2105-10-421 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050579230
174 https://doi.org/10.1186/1471-2105-10-421
175 rdf:type schema:CreativeWork
176 sg:pub.10.1186/1471-2105-12-385 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037780208
177 https://doi.org/10.1186/1471-2105-12-385
178 rdf:type schema:CreativeWork
179 sg:pub.10.1186/1471-2164-12-s2-s4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021732909
180 https://doi.org/10.1186/1471-2164-12-s2-s4
181 rdf:type schema:CreativeWork
182 sg:pub.10.1186/gb-2013-14-1-r2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013525643
183 https://doi.org/10.1186/gb-2013-14-1-r2
184 rdf:type schema:CreativeWork
185 grid-institutes:grid.21107.35 schema:alternateName Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
186 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
187 schema:name Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
188 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
189 Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
190 rdf:type schema:Organization
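
In the table, the author order is stored as an RDF collection: `schema:author` points at a blank node, and each blank node carries one item in `rdf:first` and a link to the next node in `rdf:rest`, ending at `rdf:nil`. The sketch below walks that chain using the triples from rows 13, 81-82, and 98-99. The `walk_rdf_list` function and the tuple representation are illustrative; a real application would more likely use an RDF library.

```python
RDF_NIL = "rdf:nil"

# (subject, predicate, object) triples copied from rows 13, 81-82, and 98-99.
TRIPLES = [
    ("sg:pub.10.1186/gb-2014-15-3-r46", "schema:author", "N89faf6061bec4a1ebfa2c2fd8179de6b"),
    ("N89faf6061bec4a1ebfa2c2fd8179de6b", "rdf:first", "sg:person.01223030670.09"),
    ("N89faf6061bec4a1ebfa2c2fd8179de6b", "rdf:rest", "N0ed31d5c263a42bd93fbc3fb5af11eff"),
    ("N0ed31d5c263a42bd93fbc3fb5af11eff", "rdf:first", "sg:person.01223441713.02"),
    ("N0ed31d5c263a42bd93fbc3fb5af11eff", "rdf:rest", RDF_NIL),
]

def walk_rdf_list(triples, head):
    """Return the items of an RDF collection starting at `head`, in order."""
    # Index by (subject, predicate); a well-formed list cell has one object for each.
    index = {(s, p): o for s, p, o in triples}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:
            raise ValueError("cycle in rdf:rest chain")
        seen.add(node)
        items.append(index[(node, "rdf:first")])
        node = index[(node, "rdf:rest")]
    return items

subject = "sg:pub.10.1186/gb-2014-15-3-r46"
head = next(o for s, p, o in TRIPLES if s == subject and p == "schema:author")
authors = walk_rdf_list(TRIPLES, head)
print(authors)
```

The result lists the person identifier for Wood first and Salzberg second, matching the author order in the article info.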
 



