KrakenUniq: confident and fast metagenomics classification using unique k-mer counts


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2018-11-16

AUTHORS

F. P. Breitwieser, D. N. Baker, S. L. Salzberg

ABSTRACT

False-positive identifications are a significant problem in metagenomics classification. We present KrakenUniq, a novel metagenomics classifier that combines the fast k-mer-based classification of Kraken with an efficient algorithm for assessing the coverage of unique k-mers found in each species in a dataset. On various test datasets, KrakenUniq gives better recall and precision than other methods and effectively classifies and distinguishes pathogens with low abundance from false positives in infectious disease samples. By using the probabilistic cardinality estimator HyperLogLog, KrakenUniq runs as fast as Kraken and requires little additional memory. KrakenUniq is freely available at https://github.com/fbreitwieser/krakenuniq.
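The abstract names HyperLogLog as the estimator used to count unique k-mers. The sketch below illustrates the general idea with the classic HyperLogLog estimator plus the usual small-range (linear counting) correction. It is not KrakenUniq's implementation: the class name `HyperLogLog`, the helper `kmers`, the SHA-1 hash, and the default precision `p=12` are all assumptions chosen for illustration.

```python
import hashlib
import math


def kmers(seq, k):
    """Yield every overlapping k-mer of a sequence string."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class HyperLogLog:
    """Classic HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Take a 64-bit hash of the item (first 8 bytes of SHA-1, an assumption).
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width                    # the first p bits select a register
        rest = h & ((1 << width) - 1)       # the remaining bits
        rank = width - rest.bit_length() + 1  # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)    # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw
```

To estimate the number of distinct 31-mers in a sequence `seq`, one would call `hll.add(km)` for each `km` in `kmers(seq, 31)` and then read `hll.estimate()`. The memory cost is fixed at `2**p` small registers no matter how many k-mers are added, which is the property that keeps the overhead low.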

PAGES

198

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/s13059-018-1568-0

DOI

http://dx.doi.org/10.1186/s13059-018-1568-0

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1109919786

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/30445993



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Infections", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Metagenomics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.21107.35", 
          "name": [
            "Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Breitwieser", 
        "givenName": "F. P.", 
        "id": "sg:person.01105562620.94", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01105562620.94"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.21107.35", 
          "name": [
            "Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA", 
            "Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Baker", 
        "givenName": "D. N.", 
        "id": "sg:person.011552542733.31", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011552542733.31"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Departments of Biomedical Engineering and Biostatistics, Johns Hopkins University, Baltimore, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.21107.35", 
          "name": [
            "Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA", 
            "Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA", 
            "Departments of Biomedical Engineering and Biostatistics, Johns Hopkins University, Baltimore, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Salzberg", 
        "givenName": "S. L.", 
        "id": "sg:person.01223441713.02", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223441713.02"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1186/gb-2014-15-3-r46", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030203790", 
          "https://doi.org/10.1186/gb-2014-15-3-r46"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s13059-015-0821-z", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017653608", 
          "https://doi.org/10.1186/s13059-015-0821-z"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1944-3277-10-18", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004510227", 
          "https://doi.org/10.1186/1944-3277-10-18"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-319-31957-5_8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1040434381", 
          "https://doi.org/10.1007/978-3-319-31957-5_8"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth.3589", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028162909", 
          "https://doi.org/10.1038/nmeth.3589"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s12864-015-1419-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1021400263", 
          "https://doi.org/10.1186/s12864-015-1419-2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s13059-017-1299-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1091875834", 
          "https://doi.org/10.1186/s13059-017-1299-7"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nbt.3935", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1091578237", 
          "https://doi.org/10.1038/nbt.3935"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s12915-014-0087-z", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1027737035", 
          "https://doi.org/10.1186/s12915-014-0087-z"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth.3176", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023901695", 
          "https://doi.org/10.1038/nmeth.3176"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2018-11-16", 
    "datePublishedReg": "2018-11-16", 
    "description": "False-positive identifications are a significant problem in metagenomics classification. We present KrakenUniq, a novel metagenomics classifier that combines the fast k-mer-based classification of Kraken with an efficient algorithm for assessing the coverage of unique k-mers found in each species in a dataset. On various test datasets, KrakenUniq gives better recall and precision than other methods and effectively classifies and distinguishes pathogens with low abundance from false positives in infectious disease samples. By using the probabilistic cardinality estimator HyperLogLog, KrakenUniq runs as fast as Kraken and requires little additional memory. KrakenUniq is freely available at https://github.com/fbreitwieser/krakenuniq.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/s13059-018-1568-0", 
    "isAccessibleForFree": true, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.8383234", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.2519905", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.2529453", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.5053187", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1023439", 
        "issn": [
          "1474-760X", 
          "1465-6906"
        ], 
        "name": "Genome Biology", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "19"
      }
    ], 
    "keywords": [
      "disease samples", 
      "significant problem", 
      "KrakenUniq", 
      "mer counts", 
      "count", 
      "false positives", 
      "pathogens", 
      "classification", 
      "false positive identifications", 
      "positives", 
      "recall", 
      "better recall", 
      "identification", 
      "samples", 
      "memory", 
      "coverage", 
      "low abundance", 
      "test dataset", 
      "method", 
      "species", 
      "abundance", 
      "problem", 
      "dataset", 
      "precision", 
      "Kraken", 
      "classifies", 
      "classifier", 
      "little additional memory", 
      "unique k-mers", 
      "algorithm", 
      "metagenomic classifiers", 
      "k-mers", 
      "metagenomic classification", 
      "HyperLogLog", 
      "additional memory", 
      "efficient algorithm"
    ], 
    "name": "KrakenUniq: confident and fast metagenomics classification using unique k-mer counts", 
    "pagination": "198", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1109919786"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/s13059-018-1568-0"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "30445993"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/s13059-018-1568-0", 
      "https://app.dimensions.ai/details/publication/pub.1109919786"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-10-01T06:44", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221001/entities/gbq_results/article/article_771.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/s13059-018-1568-0"
  }
]
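As a rough illustration of how a record like the one above can be consumed, the following sketch pulls the title, identifiers, and author names out of the JSON-LD. `SAMPLE` is a trimmed copy of the record, and the function name `summarize` is hypothetical; only the field names (`name`, `author`, `productId`, `value`) come from the record itself.

```python
import json

# Trimmed copy of the record above; only the fields read below are kept.
SAMPLE = """
[
  {
    "name": "KrakenUniq: confident and fast metagenomics classification using unique k-mer counts",
    "author": [
      {"familyName": "Breitwieser", "givenName": "F. P.", "type": "Person"},
      {"familyName": "Baker", "givenName": "D. N.", "type": "Person"},
      {"familyName": "Salzberg", "givenName": "S. L.", "type": "Person"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1109919786"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s13059-018-1568-0"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["30445993"]}
    ]
  }
]
"""


def summarize(records):
    """Return title, DOI, PubMed ID and author names from a SciGraph JSON-LD list."""
    rec = records[0]  # the document is a one-element list
    # Each productId entry stores its value as a one-element list.
    ids = {p["name"]: p["value"][0] for p in rec.get("productId", [])}
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in rec.get("author", [])]
    return {
        "title": rec["name"],
        "doi": ids.get("doi"),
        "pubmed_id": ids.get("pubmed_id"),
        "authors": authors,
    }


summary = summarize(json.loads(SAMPLE))
```

Because JSON-LD is plain JSON, the standard `json` module is enough for this kind of field lookup; a full RDF library is only needed when the `@context` has to be resolved.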
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular linked data format that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s13059-018-1568-0'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s13059-018-1568-0'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s13059-018-1568-0'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s13059-018-1568-0'
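The four curl commands above differ only in their Accept header, so the same content negotiation can be sketched in Python with the standard library. The names `build_request`, `fetch`, and `MEDIA_TYPES` are hypothetical, and the sketch assumes the endpoint still answers the way the curl examples describe.

```python
from urllib.request import Request, urlopen

BASE_URL = "https://scigraph.springernature.com/pub."

# Media types taken from the curl examples above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(doi, fmt="json-ld"):
    """Build a GET request whose Accept header selects the RDF serialization."""
    return Request(BASE_URL + doi, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(doi, fmt="json-ld", timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urlopen(build_request(doi, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch("10.1186/s13059-018-1568-0", "turtle")` would issue the same request as the Turtle curl command. Separating `build_request` from `fetch` makes the header logic checkable without touching the network.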


 

This table displays all metadata directly associated with this object as RDF triples.

179 TRIPLES      21 PREDICATES      75 URIs      57 LITERALS      11 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/s13059-018-1568-0 schema:about N68c05e6ba10d4ded982547aaf1ec415b
2 Na252f4bd3c9f4dc6bdd82fb4e570cc9a
3 Nd857f5e3f8964954a79e27623f6bdd7d
4 Nf82a049cccdc4edcaac9bb69c8fb0cb3
5 anzsrc-for:06
6 anzsrc-for:0604
7 schema:author Nd0dcee5954cd4350a4445d5cbc05e05a
8 schema:citation sg:pub.10.1007/978-3-319-31957-5_8
9 sg:pub.10.1038/nbt.3935
10 sg:pub.10.1038/nmeth.3176
11 sg:pub.10.1038/nmeth.3589
12 sg:pub.10.1186/1944-3277-10-18
13 sg:pub.10.1186/gb-2014-15-3-r46
14 sg:pub.10.1186/s12864-015-1419-2
15 sg:pub.10.1186/s12915-014-0087-z
16 sg:pub.10.1186/s13059-015-0821-z
17 sg:pub.10.1186/s13059-017-1299-7
18 schema:datePublished 2018-11-16
19 schema:datePublishedReg 2018-11-16
20 schema:description False-positive identifications are a significant problem in metagenomics classification. We present KrakenUniq, a novel metagenomics classifier that combines the fast k-mer-based classification of Kraken with an efficient algorithm for assessing the coverage of unique k-mers found in each species in a dataset. On various test datasets, KrakenUniq gives better recall and precision than other methods and effectively classifies and distinguishes pathogens with low abundance from false positives in infectious disease samples. By using the probabilistic cardinality estimator HyperLogLog, KrakenUniq runs as fast as Kraken and requires little additional memory. KrakenUniq is freely available at https://github.com/fbreitwieser/krakenuniq.
21 schema:genre article
22 schema:isAccessibleForFree true
23 schema:isPartOf N49acd1e3b9e54eeb8457def80221b53b
24 N877a2161339948a092e7372b2001bc67
25 sg:journal.1023439
26 schema:keywords HyperLogLog
27 Kraken
28 KrakenUniq
29 abundance
30 additional memory
31 algorithm
32 better recall
33 classification
34 classifier
35 classifies
36 count
37 coverage
38 dataset
39 disease samples
40 efficient algorithm
41 false positive identifications
42 false positives
43 identification
44 k-mers
45 little additional memory
46 low abundance
47 memory
48 mer counts
49 metagenomic classification
50 metagenomic classifiers
51 method
52 pathogens
53 positives
54 precision
55 problem
56 recall
57 samples
58 significant problem
59 species
60 test dataset
61 unique k-mers
62 schema:name KrakenUniq: confident and fast metagenomics classification using unique k-mer counts
63 schema:pagination 198
64 schema:productId N1b2181d9440140649dd261fef96127f3
65 N9898f8bc002b4e1fa45c309a17ad276b
66 Nd9ebf7e76d6943bb8a914c800c63d4ed
67 schema:sameAs https://app.dimensions.ai/details/publication/pub.1109919786
68 https://doi.org/10.1186/s13059-018-1568-0
69 schema:sdDatePublished 2022-10-01T06:44
70 schema:sdLicense https://scigraph.springernature.com/explorer/license/
71 schema:sdPublisher N0ec50d442e504ae49af2dce3fcb3c186
72 schema:url https://doi.org/10.1186/s13059-018-1568-0
73 sgo:license sg:explorer/license/
74 sgo:sdDataset articles
75 rdf:type schema:ScholarlyArticle
76 N0ec50d442e504ae49af2dce3fcb3c186 schema:name Springer Nature - SN SciGraph project
77 rdf:type schema:Organization
78 N1b2181d9440140649dd261fef96127f3 schema:name pubmed_id
79 schema:value 30445993
80 rdf:type schema:PropertyValue
81 N49acd1e3b9e54eeb8457def80221b53b schema:issueNumber 1
82 rdf:type schema:PublicationIssue
83 N56955c4f086341a398e3a3555a25aaa8 rdf:first sg:person.011552542733.31
84 rdf:rest N70db4534f49a4e2d8d5bdd5b39a8f636
85 N68c05e6ba10d4ded982547aaf1ec415b schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
86 schema:name Infections
87 rdf:type schema:DefinedTerm
88 N70db4534f49a4e2d8d5bdd5b39a8f636 rdf:first sg:person.01223441713.02
89 rdf:rest rdf:nil
90 N877a2161339948a092e7372b2001bc67 schema:volumeNumber 19
91 rdf:type schema:PublicationVolume
92 N9898f8bc002b4e1fa45c309a17ad276b schema:name doi
93 schema:value 10.1186/s13059-018-1568-0
94 rdf:type schema:PropertyValue
95 Na252f4bd3c9f4dc6bdd82fb4e570cc9a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
96 schema:name Software
97 rdf:type schema:DefinedTerm
98 Nd0dcee5954cd4350a4445d5cbc05e05a rdf:first sg:person.01105562620.94
99 rdf:rest N56955c4f086341a398e3a3555a25aaa8
100 Nd857f5e3f8964954a79e27623f6bdd7d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
101 schema:name Metagenomics
102 rdf:type schema:DefinedTerm
103 Nd9ebf7e76d6943bb8a914c800c63d4ed schema:name dimensions_id
104 schema:value pub.1109919786
105 rdf:type schema:PropertyValue
106 Nf82a049cccdc4edcaac9bb69c8fb0cb3 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
107 schema:name Algorithms
108 rdf:type schema:DefinedTerm
109 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
110 schema:name Biological Sciences
111 rdf:type schema:DefinedTerm
112 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
113 schema:name Genetics
114 rdf:type schema:DefinedTerm
115 sg:grant.2519905 http://pending.schema.org/fundedItem sg:pub.10.1186/s13059-018-1568-0
116 rdf:type schema:MonetaryGrant
117 sg:grant.2529453 http://pending.schema.org/fundedItem sg:pub.10.1186/s13059-018-1568-0
118 rdf:type schema:MonetaryGrant
119 sg:grant.5053187 http://pending.schema.org/fundedItem sg:pub.10.1186/s13059-018-1568-0
120 rdf:type schema:MonetaryGrant
121 sg:grant.8383234 http://pending.schema.org/fundedItem sg:pub.10.1186/s13059-018-1568-0
122 rdf:type schema:MonetaryGrant
123 sg:journal.1023439 schema:issn 1465-6906
124 1474-760X
125 schema:name Genome Biology
126 schema:publisher Springer Nature
127 rdf:type schema:Periodical
128 sg:person.01105562620.94 schema:affiliation grid-institutes:grid.21107.35
129 schema:familyName Breitwieser
130 schema:givenName F. P.
131 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01105562620.94
132 rdf:type schema:Person
133 sg:person.011552542733.31 schema:affiliation grid-institutes:grid.21107.35
134 schema:familyName Baker
135 schema:givenName D. N.
136 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011552542733.31
137 rdf:type schema:Person
138 sg:person.01223441713.02 schema:affiliation grid-institutes:grid.21107.35
139 schema:familyName Salzberg
140 schema:givenName S. L.
141 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223441713.02
142 rdf:type schema:Person
143 sg:pub.10.1007/978-3-319-31957-5_8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040434381
144 https://doi.org/10.1007/978-3-319-31957-5_8
145 rdf:type schema:CreativeWork
146 sg:pub.10.1038/nbt.3935 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091578237
147 https://doi.org/10.1038/nbt.3935
148 rdf:type schema:CreativeWork
149 sg:pub.10.1038/nmeth.3176 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023901695
150 https://doi.org/10.1038/nmeth.3176
151 rdf:type schema:CreativeWork
152 sg:pub.10.1038/nmeth.3589 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028162909
153 https://doi.org/10.1038/nmeth.3589
154 rdf:type schema:CreativeWork
155 sg:pub.10.1186/1944-3277-10-18 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004510227
156 https://doi.org/10.1186/1944-3277-10-18
157 rdf:type schema:CreativeWork
158 sg:pub.10.1186/gb-2014-15-3-r46 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030203790
159 https://doi.org/10.1186/gb-2014-15-3-r46
160 rdf:type schema:CreativeWork
161 sg:pub.10.1186/s12864-015-1419-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021400263
162 https://doi.org/10.1186/s12864-015-1419-2
163 rdf:type schema:CreativeWork
164 sg:pub.10.1186/s12915-014-0087-z schema:sameAs https://app.dimensions.ai/details/publication/pub.1027737035
165 https://doi.org/10.1186/s12915-014-0087-z
166 rdf:type schema:CreativeWork
167 sg:pub.10.1186/s13059-015-0821-z schema:sameAs https://app.dimensions.ai/details/publication/pub.1017653608
168 https://doi.org/10.1186/s13059-015-0821-z
169 rdf:type schema:CreativeWork
170 sg:pub.10.1186/s13059-017-1299-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091875834
171 https://doi.org/10.1186/s13059-017-1299-7
172 rdf:type schema:CreativeWork
173 grid-institutes:grid.21107.35 schema:alternateName Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
174 Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
175 Departments of Biomedical Engineering and Biostatistics, Johns Hopkins University, Baltimore, MD, USA
176 schema:name Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
177 Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
178 Departments of Biomedical Engineering and Biostatistics, Johns Hopkins University, Baltimore, MD, USA
179 rdf:type schema:Organization
 



