KrakenUniq: confident and fast metagenomics classification using unique k-mer counts


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2018-11-16

AUTHORS

F. P. Breitwieser, D. N. Baker, S. L. Salzberg

ABSTRACT

False-positive identifications are a significant problem in metagenomics classification. We present KrakenUniq, a novel metagenomics classifier that combines the fast k-mer-based classification of Kraken with an efficient algorithm for assessing the coverage of unique k-mers found in each species in a dataset. On various test datasets, KrakenUniq gives better recall and precision than other methods and effectively classifies and distinguishes pathogens with low abundance from false positives in infectious disease samples. By using the probabilistic cardinality estimator HyperLogLog, KrakenUniq runs as fast as Kraken and requires little additional memory. KrakenUniq is freely available at https://github.com/fbreitwieser/krakenuniq.
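
For intuition, the idea of estimating the number of unique k-mers with HyperLogLog can be sketched as below. This is a toy illustration and not KrakenUniq's implementation: the SHA-1-based hash, the precision `p=12`, the example sequence, and the names `kmers` and `HyperLogLog` are all assumptions made for this example.

```python
import hashlib
import math


def kmers(seq, k):
    """Yield every overlapping substring of length k from seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class HyperLogLog:
    """Toy HyperLogLog cardinality estimator with 2**p registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash taken from SHA-1 (an arbitrary choice for this sketch).
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                 # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit within the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


hll = HyperLogLog()
for kmer in kmers("ACGTACGTGACCA", 4):
    hll.add(kmer)
print(round(hll.estimate()))
```

The memory used is fixed by the number of registers rather than by the number of distinct k-mers seen, which is the property that makes a per-taxon estimator of this kind cheap to keep alongside the classification itself.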

PAGES

198

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/s13059-018-1568-0

DOI

http://dx.doi.org/10.1186/s13059-018-1568-0

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1109919786

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/30445993



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Infections", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Metagenomics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.21107.35", 
          "name": [
            "Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Breitwieser", 
        "givenName": "F. P.", 
        "id": "sg:person.01105562620.94", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01105562620.94"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.21107.35", 
          "name": [
            "Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA", 
            "Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Baker", 
        "givenName": "D. N.", 
        "id": "sg:person.011552542733.31", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011552542733.31"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Departments of Biomedical Engineering and Biostatistics, Johns Hopkins University, Baltimore, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.21107.35", 
          "name": [
            "Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA", 
            "Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA", 
            "Departments of Biomedical Engineering and Biostatistics, Johns Hopkins University, Baltimore, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Salzberg", 
        "givenName": "S. L.", 
        "id": "sg:person.01223441713.02", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223441713.02"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1186/s12915-014-0087-z", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1027737035", 
          "https://doi.org/10.1186/s12915-014-0087-z"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-319-31957-5_8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1040434381", 
          "https://doi.org/10.1007/978-3-319-31957-5_8"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/gb-2014-15-3-r46", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030203790", 
          "https://doi.org/10.1186/gb-2014-15-3-r46"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nbt.3935", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1091578237", 
          "https://doi.org/10.1038/nbt.3935"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth.3589", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028162909", 
          "https://doi.org/10.1038/nmeth.3589"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s12864-015-1419-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1021400263", 
          "https://doi.org/10.1186/s12864-015-1419-2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth.3176", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023901695", 
          "https://doi.org/10.1038/nmeth.3176"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s13059-017-1299-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1091875834", 
          "https://doi.org/10.1186/s13059-017-1299-7"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1944-3277-10-18", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004510227", 
          "https://doi.org/10.1186/1944-3277-10-18"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s13059-015-0821-z", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017653608", 
          "https://doi.org/10.1186/s13059-015-0821-z"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2018-11-16", 
    "datePublishedReg": "2018-11-16", 
    "description": "False-positive identifications are a significant problem in metagenomics classification. We present KrakenUniq, a novel metagenomics classifier that combines the fast k-mer-based classification of Kraken with an efficient algorithm for assessing the coverage of unique k-mers found in each species in a dataset. On various test datasets, KrakenUniq gives better recall and precision than other methods and effectively classifies and distinguishes pathogens with low abundance from false positives in infectious disease samples. By using the probabilistic cardinality estimator HyperLogLog, KrakenUniq runs as fast as Kraken and requires little additional memory. KrakenUniq is freely available at https://github.com/fbreitwieser/krakenuniq.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/s13059-018-1568-0", 
    "isAccessibleForFree": true, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.2519905", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.5053187", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.8383234", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.2529453", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1023439", 
        "issn": [
          "1474-760X", 
          "1465-6906"
        ], 
        "name": "Genome Biology", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "19"
      }
    ], 
    "keywords": [
      "disease samples", 
      "significant problem", 
      "KrakenUniq", 
      "mer counts", 
      "count", 
      "false positives", 
      "pathogens", 
      "classification", 
      "false positive identifications", 
      "positives", 
      "recall", 
      "better recall", 
      "identification", 
      "samples", 
      "memory", 
      "coverage", 
      "low abundance", 
      "test dataset", 
      "method", 
      "species", 
      "abundance", 
      "problem", 
      "dataset", 
      "precision", 
      "Kraken", 
      "classifies", 
      "classifier", 
      "little additional memory", 
      "unique k-mers", 
      "algorithm", 
      "metagenomic classifiers", 
      "k-mers", 
      "metagenomic classification", 
      "HyperLogLog", 
      "additional memory", 
      "efficient algorithm"
    ], 
    "name": "KrakenUniq: confident and fast metagenomics classification using unique k-mer counts", 
    "pagination": "198", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1109919786"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/s13059-018-1568-0"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "30445993"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/s13059-018-1568-0", 
      "https://app.dimensions.ai/details/publication/pub.1109919786"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-11-24T21:03", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221124/entities/gbq_results/article/article_761.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/s13059-018-1568-0"
  }
]
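
As a sketch of how a record of this shape might be consumed, the following Python reads the JSON-LD and pulls out the title, publication date, author names, and identifiers. The embedded `SAMPLE` is a trimmed copy of the record above, kept only so the example is self-contained, and the helper name `summarize` is hypothetical.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
SAMPLE = """
[
  {
    "author": [
      {"familyName": "Breitwieser", "givenName": "F. P.", "type": "Person"},
      {"familyName": "Baker", "givenName": "D. N.", "type": "Person"},
      {"familyName": "Salzberg", "givenName": "S. L.", "type": "Person"}
    ],
    "datePublished": "2018-11-16",
    "name": "KrakenUniq: confident and fast metagenomics classification using unique k-mer counts",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1109919786"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s13059-018-1568-0"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["30445993"]}
    ],
    "type": "ScholarlyArticle"
  }
]
"""


def summarize(jsonld_text):
    """Return title, date, authors, and identifiers from a SciGraph-style record."""
    record = json.loads(jsonld_text)[0]  # the payload is a one-element list
    # productId is a list of {name, value: [...]} entries; flatten it into a dict.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    return {
        "title": record["name"],
        "date": record["datePublished"],
        "authors": authors,
        "ids": ids,
    }


summary = summarize(SAMPLE)
print(summary["authors"])
print(summary["ids"]["doi"])
```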
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s13059-018-1568-0'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s13059-018-1568-0'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s13059-018-1568-0'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s13059-018-1568-0'
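
The same content negotiation can be sketched in Python with the standard library. The URL and `Accept` values are the ones from the curl commands above; the helper names `build_request` and `fetch` and the short format keys are hypothetical. The sketch does not assume the endpoint is still reachable, so the network call is kept in a separate function that is not invoked here.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s13059-018-1568-0"

# Media types taken from the curl commands above; the short keys are this sketch's own.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request asking the server for the given RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_request("turtle")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling `fetch("json-ld")` would then correspond to the first curl command, with the other keys selecting the other serializations.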


 

This table displays all metadata directly associated with this object as RDF triples.

179 TRIPLES      21 PREDICATES      75 URIs      57 LITERALS      11 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/s13059-018-1568-0 schema:about N64888e92967043bd8314c82fd48d2d6d
2 N6fa07ca272e243898799cc8bf089543f
3 Nb3511f6c85b54cd08d27a9d3017cbb09
4 Nfd5b79101def4dd79d5c1d489ab67327
5 anzsrc-for:06
6 anzsrc-for:0604
7 schema:author Nf2016b1aa66843e2bfcf6fe64a3bfd06
8 schema:citation sg:pub.10.1007/978-3-319-31957-5_8
9 sg:pub.10.1038/nbt.3935
10 sg:pub.10.1038/nmeth.3176
11 sg:pub.10.1038/nmeth.3589
12 sg:pub.10.1186/1944-3277-10-18
13 sg:pub.10.1186/gb-2014-15-3-r46
14 sg:pub.10.1186/s12864-015-1419-2
15 sg:pub.10.1186/s12915-014-0087-z
16 sg:pub.10.1186/s13059-015-0821-z
17 sg:pub.10.1186/s13059-017-1299-7
18 schema:datePublished 2018-11-16
19 schema:datePublishedReg 2018-11-16
20 schema:description False-positive identifications are a significant problem in metagenomics classification. We present KrakenUniq, a novel metagenomics classifier that combines the fast k-mer-based classification of Kraken with an efficient algorithm for assessing the coverage of unique k-mers found in each species in a dataset. On various test datasets, KrakenUniq gives better recall and precision than other methods and effectively classifies and distinguishes pathogens with low abundance from false positives in infectious disease samples. By using the probabilistic cardinality estimator HyperLogLog, KrakenUniq runs as fast as Kraken and requires little additional memory. KrakenUniq is freely available at https://github.com/fbreitwieser/krakenuniq.
21 schema:genre article
22 schema:isAccessibleForFree true
23 schema:isPartOf N5bb7c22ca184486d8bccb070f3e82666
24 N74cf51ba0c9146ad82f50256e7726326
25 sg:journal.1023439
26 schema:keywords HyperLogLog
27 Kraken
28 KrakenUniq
29 abundance
30 additional memory
31 algorithm
32 better recall
33 classification
34 classifier
35 classifies
36 count
37 coverage
38 dataset
39 disease samples
40 efficient algorithm
41 false positive identifications
42 false positives
43 identification
44 k-mers
45 little additional memory
46 low abundance
47 memory
48 mer counts
49 metagenomic classification
50 metagenomic classifiers
51 method
52 pathogens
53 positives
54 precision
55 problem
56 recall
57 samples
58 significant problem
59 species
60 test dataset
61 unique k-mers
62 schema:name KrakenUniq: confident and fast metagenomics classification using unique k-mer counts
63 schema:pagination 198
64 schema:productId N71864871633041608f12613565f8301b
65 Naa4ae33d427d4835805527b733174dcd
66 Ndafd390f213649df82370157e3906736
67 schema:sameAs https://app.dimensions.ai/details/publication/pub.1109919786
68 https://doi.org/10.1186/s13059-018-1568-0
69 schema:sdDatePublished 2022-11-24T21:03
70 schema:sdLicense https://scigraph.springernature.com/explorer/license/
71 schema:sdPublisher Nf945b0205d9c4251ad0584ccaf6d37fb
72 schema:url https://doi.org/10.1186/s13059-018-1568-0
73 sgo:license sg:explorer/license/
74 sgo:sdDataset articles
75 rdf:type schema:ScholarlyArticle
76 N5af9f5320fa64e4299286267b9da6427 rdf:first sg:person.011552542733.31
77 rdf:rest Ncd0cedb86fef4e5dbdcd86627d456731
78 N5bb7c22ca184486d8bccb070f3e82666 schema:issueNumber 1
79 rdf:type schema:PublicationIssue
80 N64888e92967043bd8314c82fd48d2d6d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
81 schema:name Infections
82 rdf:type schema:DefinedTerm
83 N6fa07ca272e243898799cc8bf089543f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
84 schema:name Metagenomics
85 rdf:type schema:DefinedTerm
86 N71864871633041608f12613565f8301b schema:name dimensions_id
87 schema:value pub.1109919786
88 rdf:type schema:PropertyValue
89 N74cf51ba0c9146ad82f50256e7726326 schema:volumeNumber 19
90 rdf:type schema:PublicationVolume
91 Naa4ae33d427d4835805527b733174dcd schema:name pubmed_id
92 schema:value 30445993
93 rdf:type schema:PropertyValue
94 Nb3511f6c85b54cd08d27a9d3017cbb09 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
95 schema:name Algorithms
96 rdf:type schema:DefinedTerm
97 Ncd0cedb86fef4e5dbdcd86627d456731 rdf:first sg:person.01223441713.02
98 rdf:rest rdf:nil
99 Ndafd390f213649df82370157e3906736 schema:name doi
100 schema:value 10.1186/s13059-018-1568-0
101 rdf:type schema:PropertyValue
102 Nf2016b1aa66843e2bfcf6fe64a3bfd06 rdf:first sg:person.01105562620.94
103 rdf:rest N5af9f5320fa64e4299286267b9da6427
104 Nf945b0205d9c4251ad0584ccaf6d37fb schema:name Springer Nature - SN SciGraph project
105 rdf:type schema:Organization
106 Nfd5b79101def4dd79d5c1d489ab67327 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
107 schema:name Software
108 rdf:type schema:DefinedTerm
109 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
110 schema:name Biological Sciences
111 rdf:type schema:DefinedTerm
112 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
113 schema:name Genetics
114 rdf:type schema:DefinedTerm
115 sg:grant.2519905 http://pending.schema.org/fundedItem sg:pub.10.1186/s13059-018-1568-0
116 rdf:type schema:MonetaryGrant
117 sg:grant.2529453 http://pending.schema.org/fundedItem sg:pub.10.1186/s13059-018-1568-0
118 rdf:type schema:MonetaryGrant
119 sg:grant.5053187 http://pending.schema.org/fundedItem sg:pub.10.1186/s13059-018-1568-0
120 rdf:type schema:MonetaryGrant
121 sg:grant.8383234 http://pending.schema.org/fundedItem sg:pub.10.1186/s13059-018-1568-0
122 rdf:type schema:MonetaryGrant
123 sg:journal.1023439 schema:issn 1465-6906
124 1474-760X
125 schema:name Genome Biology
126 schema:publisher Springer Nature
127 rdf:type schema:Periodical
128 sg:person.01105562620.94 schema:affiliation grid-institutes:grid.21107.35
129 schema:familyName Breitwieser
130 schema:givenName F. P.
131 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01105562620.94
132 rdf:type schema:Person
133 sg:person.011552542733.31 schema:affiliation grid-institutes:grid.21107.35
134 schema:familyName Baker
135 schema:givenName D. N.
136 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011552542733.31
137 rdf:type schema:Person
138 sg:person.01223441713.02 schema:affiliation grid-institutes:grid.21107.35
139 schema:familyName Salzberg
140 schema:givenName S. L.
141 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223441713.02
142 rdf:type schema:Person
143 sg:pub.10.1007/978-3-319-31957-5_8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040434381
144 https://doi.org/10.1007/978-3-319-31957-5_8
145 rdf:type schema:CreativeWork
146 sg:pub.10.1038/nbt.3935 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091578237
147 https://doi.org/10.1038/nbt.3935
148 rdf:type schema:CreativeWork
149 sg:pub.10.1038/nmeth.3176 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023901695
150 https://doi.org/10.1038/nmeth.3176
151 rdf:type schema:CreativeWork
152 sg:pub.10.1038/nmeth.3589 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028162909
153 https://doi.org/10.1038/nmeth.3589
154 rdf:type schema:CreativeWork
155 sg:pub.10.1186/1944-3277-10-18 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004510227
156 https://doi.org/10.1186/1944-3277-10-18
157 rdf:type schema:CreativeWork
158 sg:pub.10.1186/gb-2014-15-3-r46 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030203790
159 https://doi.org/10.1186/gb-2014-15-3-r46
160 rdf:type schema:CreativeWork
161 sg:pub.10.1186/s12864-015-1419-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021400263
162 https://doi.org/10.1186/s12864-015-1419-2
163 rdf:type schema:CreativeWork
164 sg:pub.10.1186/s12915-014-0087-z schema:sameAs https://app.dimensions.ai/details/publication/pub.1027737035
165 https://doi.org/10.1186/s12915-014-0087-z
166 rdf:type schema:CreativeWork
167 sg:pub.10.1186/s13059-015-0821-z schema:sameAs https://app.dimensions.ai/details/publication/pub.1017653608
168 https://doi.org/10.1186/s13059-015-0821-z
169 rdf:type schema:CreativeWork
170 sg:pub.10.1186/s13059-017-1299-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091875834
171 https://doi.org/10.1186/s13059-017-1299-7
172 rdf:type schema:CreativeWork
173 grid-institutes:grid.21107.35 schema:alternateName Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
174 Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
175 Departments of Biomedical Engineering and Biostatistics, Johns Hopkins University, Baltimore, MD, USA
176 schema:name Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
177 Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
178 Departments of Biomedical Engineering and Biostatistics, Johns Hopkins University, Baltimore, MD, USA
179 rdf:type schema:Organization
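
The author order in the table is encoded as an RDF collection: `schema:author` points to a blank node, and each blank node carries an `rdf:first` value (the item) and an `rdf:rest` value (the next node, or `rdf:nil` at the end). A minimal sketch of walking that chain follows, using the blank-node identifiers from rows 76 to 77, 97 to 98, and 102 to 103 of the table. The dictionary layout and the function name `walk_rdf_list` are assumptions made for illustration.

```python
# (rdf:first, rdf:rest) for each blank node, copied from the table above.
NODES = {
    "Nf2016b1aa66843e2bfcf6fe64a3bfd06": (
        "sg:person.01105562620.94",
        "N5af9f5320fa64e4299286267b9da6427",
    ),
    "N5af9f5320fa64e4299286267b9da6427": (
        "sg:person.011552542733.31",
        "Ncd0cedb86fef4e5dbdcd86627d456731",
    ),
    "Ncd0cedb86fef4e5dbdcd86627d456731": (
        "sg:person.01223441713.02",
        "rdf:nil",
    ),
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest links from head and collect each rdf:first value in order."""
    items = []
    seen = set()
    node = head
    while node != "rdf:nil":
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


# The head node is the object of schema:author in row 7.
authors = walk_rdf_list("Nf2016b1aa66843e2bfcf6fe64a3bfd06", NODES)
print(authors)
```

The resulting order matches the author order given in the article info: Breitwieser, Baker, Salzberg.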
 



