dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2017-07-25

AUTHORS

Matthew R Olm, Christopher T Brown, Brandon Brooks, Jillian F Banfield

ABSTRACT

The number of microbial genomes sequenced each year is expanding rapidly, in part due to genome-resolved metagenomic studies that routinely recover hundreds of draft-quality genomes. Rapid algorithms have been developed to comprehensively compare large genome sets, but they are not accurate with draft-quality genomes. Here we present dRep, a program that reduces the computational time for pairwise genome comparisons by sequentially applying a fast, inaccurate estimation of genome distance, and a slow, accurate measure of average nucleotide identity. dRep achieves a 28 × increase in speed with perfect recall and precision when benchmarked against previously developed algorithms. We demonstrate the use of dRep for genome recovery from time-series datasets. Each metagenome was assembled separately, and dRep was used to identify groups of essentially identical genomes and select the best genome from each replicate set. This resulted in recovery of significantly more and higher-quality genomes compared to the set recovered using co-assembly.

PAGES

2864-2868

Identifiers

URI

http://scigraph.springernature.com/pub.10.1038/ismej.2017.126

DOI

http://dx.doi.org/10.1038/ismej.2017.126

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1090891867

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/28742071



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Bacteria", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome, Bacterial", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Metagenome", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Metagenomics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA", 
          "id": "http://www.grid.ac/institutes/grid.47840.3f", 
          "name": [
            "Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Olm", 
        "givenName": "Matthew R", 
        "id": "sg:person.010531110205.41", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010531110205.41"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA", 
          "id": "http://www.grid.ac/institutes/grid.47840.3f", 
          "name": [
            "Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brown", 
        "givenName": "Christopher T", 
        "id": "sg:person.01236044570.46", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01236044570.46"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA", 
          "id": "http://www.grid.ac/institutes/grid.47840.3f", 
          "name": [
            "Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brooks", 
        "givenName": "Brandon", 
        "id": "sg:person.01234072512.13", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01234072512.13"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Earth and Planetary Science, University of California, Berkeley, CA, USA", 
          "id": "http://www.grid.ac/institutes/grid.47840.3f", 
          "name": [
            "Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA", 
            "Department of Earth and Planetary Science, University of California, Berkeley, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Banfield", 
        "givenName": "Jillian F", 
        "id": "sg:person.01350542775.47", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01350542775.47"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1186/s13059-016-0997-x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1050687712", 
          "https://doi.org/10.1186/s13059-016-0997-x"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nature02340", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023089166", 
          "https://doi.org/10.1038/nature02340"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmicrobiol.2016.24", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052776941", 
          "https://doi.org/10.1038/nmicrobiol.2016.24"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth.4458", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1092065319", 
          "https://doi.org/10.1038/nmeth.4458"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s40168-017-0270-x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1085135267", 
          "https://doi.org/10.1186/s40168-017-0270-x"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ismej.2016.83", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008606938", 
          "https://doi.org/10.1038/ismej.2016.83"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ismej.2015.241", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1032868433", 
          "https://doi.org/10.1038/ismej.2015.241"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2017-07-25", 
    "datePublishedReg": "2017-07-25", 
    "description": "The number of microbial genomes sequenced each year is expanding rapidly, in part due to genome-resolved metagenomic studies that routinely recover hundreds of draft-quality genomes. Rapid algorithms have been developed to comprehensively compare large genome sets, but they are not accurate with draft-quality genomes. Here we present dRep, a program that reduces the computational time for pairwise genome comparisons by sequentially applying a fast, inaccurate estimation of genome distance, and a slow, accurate measure of average nucleotide identity. dRep achieves a 28 \u00d7 increase in speed with perfect recall and precision when benchmarked against previously developed algorithms. We demonstrate the use of dRep for genome recovery from time-series datasets. Each metagenome was assembled separately, and dRep was used to identify groups of essentially identical genomes and select the best genome from each replicate set. This resulted in recovery of significantly more and higher-quality genomes compared to the set recovered using co-assembly.", 
    "genre": "article", 
    "id": "sg:pub.10.1038/ismej.2017.126", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.2459162", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.3000627", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1038436", 
        "issn": [
          "1751-7362", 
          "1751-7370"
        ], 
        "name": "The ISME Journal: Multidisciplinary Journal of Microbial Ecology", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "12", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "11"
      }
    ], 
    "keywords": [
      "draft-quality genomes", 
      "time-series datasets", 
      "computational time", 
      "pairwise genome comparisons", 
      "perfect recall", 
      "rapid algorithm", 
      "DREP", 
      "algorithm", 
      "best genome", 
      "inaccurate estimation", 
      "set", 
      "datasets", 
      "tool", 
      "estimation", 
      "high-quality genomes", 
      "recall", 
      "precision", 
      "speed", 
      "hundreds", 
      "genome comparison", 
      "genome recovery", 
      "metagenomic studies", 
      "microbial genomes", 
      "accurate measure", 
      "program", 
      "number", 
      "distance", 
      "time", 
      "use", 
      "comparison", 
      "part", 
      "genome sets", 
      "measures", 
      "metagenomes", 
      "genome distance", 
      "identity", 
      "recovery", 
      "years", 
      "study", 
      "genome", 
      "group", 
      "increase", 
      "genomic comparisons", 
      "replicates", 
      "identical genomes", 
      "average nucleotide identity", 
      "nucleotide identity", 
      "large genome sets", 
      "use of dRep", 
      "accurate genomic comparisons", 
      "improved genome recovery"
    ], 
    "name": "dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication", 
    "pagination": "2864-2868", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1090891867"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1038/ismej.2017.126"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "28742071"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1038/ismej.2017.126", 
      "https://app.dimensions.ai/details/publication/pub.1090891867"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2021-12-01T19:37", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20211201/entities/gbq_results/article/article_725.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1038/ismej.2017.126"
  }
]
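
As a minimal sketch of how the record above might be consumed, the following Python parses a trimmed copy of the JSON-LD and builds a lookup of its `productId` entries. The helper name `product_ids` is hypothetical, and the embedded sample keeps only the fields needed for the illustration.

```python
import json

# Trimmed copy of the record above; only the fields used here are kept.
SAMPLE = """
[
  {
    "id": "sg:pub.10.1038/ismej.2017.126",
    "pagination": "2864-2868",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1090891867"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1038/ismej.2017.126"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["28742071"]}
    ]
  }
]
"""


def product_ids(record):
    """Map each productId name to its first value (hypothetical helper)."""
    return {
        entry["name"]: entry["value"][0]
        for entry in record.get("productId", [])
        if entry.get("value")
    }


record = json.loads(SAMPLE)[0]  # the document is a one-element JSON array
ids = product_ids(record)
print(ids["doi"])
```

Because each `value` is a list in the record, the sketch takes the first element; a record with several values per identifier would need a different policy.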
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1038/ismej.2017.126'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1038/ismej.2017.126'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1038/ismej.2017.126'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1038/ismej.2017.126'
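
The four curl commands above differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. This is an illustration under assumptions: `build_request` and the short format labels are hypothetical names, and the sketch only constructs the request; whether the service still responds at this host is not verified here.

```python
import urllib.request

# Host taken from the curl commands above.
BASE = "https://scigraph.springernature.com/"

# Hypothetical short labels mapped to the MIME types used in the curl commands.
ACCEPT = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(pub_id, fmt="json-ld"):
    """Return a Request asking for pub_id in the given format (hypothetical helper)."""
    return urllib.request.Request(BASE + pub_id, headers={"Accept": ACCEPT[fmt]})


req = build_request("pub.10.1038/ismej.2017.126", "turtle")
print(req.full_url, req.get_header("Accept"))

# To actually fetch the data (needs network access, and assumes the endpoint is live):
# with urllib.request.urlopen(req) as resp:
#     body = resp.read().decode("utf-8")
```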


 

This table displays all metadata directly associated with this object as RDF triples.

193 triples, 22 predicates, 90 URIs, 75 literals, 13 blank nodes

Subject Predicate Object
1 sg:pub.10.1038/ismej.2017.126 schema:about N134c1c49b76b4ce9b78ac49f8da3da57
2 N55cb1957029b4725af26014c2aea1f18
3 N5a6087ad43504b718b3bd8c4fd9fc90a
4 N5ef619f7b4b946c2be5a8b889b73a5e2
5 N9a8b16dacc744a64acad87ff35745dc4
6 Nd44cdebd2b5b48fd8d3848f44febc7c8
7 anzsrc-for:06
8 anzsrc-for:0604
9 schema:author Nb194cd05275545e7a9ac8152ce0cf9f8
10 schema:citation sg:pub.10.1038/ismej.2015.241
11 sg:pub.10.1038/ismej.2016.83
12 sg:pub.10.1038/nature02340
13 sg:pub.10.1038/nmeth.4458
14 sg:pub.10.1038/nmicrobiol.2016.24
15 sg:pub.10.1186/s13059-016-0997-x
16 sg:pub.10.1186/s40168-017-0270-x
17 schema:datePublished 2017-07-25
18 schema:datePublishedReg 2017-07-25
19 schema:description The number of microbial genomes sequenced each year is expanding rapidly, in part due to genome-resolved metagenomic studies that routinely recover hundreds of draft-quality genomes. Rapid algorithms have been developed to comprehensively compare large genome sets, but they are not accurate with draft-quality genomes. Here we present dRep, a program that reduces the computational time for pairwise genome comparisons by sequentially applying a fast, inaccurate estimation of genome distance, and a slow, accurate measure of average nucleotide identity. dRep achieves a 28 × increase in speed with perfect recall and precision when benchmarked against previously developed algorithms. We demonstrate the use of dRep for genome recovery from time-series datasets. Each metagenome was assembled separately, and dRep was used to identify groups of essentially identical genomes and select the best genome from each replicate set. This resulted in recovery of significantly more and higher-quality genomes compared to the set recovered using co-assembly.
20 schema:genre article
21 schema:inLanguage en
22 schema:isAccessibleForFree true
23 schema:isPartOf N3739c3848ec546c88f11a39bee51822d
24 Nd92d9b1197754cea827b7c8249ceccfa
25 sg:journal.1038436
26 schema:keywords DREP
27 accurate genomic comparisons
28 accurate measure
29 algorithm
30 average nucleotide identity
31 best genome
32 comparison
33 computational time
34 datasets
35 distance
36 draft-quality genomes
37 estimation
38 genome
39 genome comparison
40 genome distance
41 genome recovery
42 genome sets
43 genomic comparisons
44 group
45 high-quality genomes
46 hundreds
47 identical genomes
48 identity
49 improved genome recovery
50 inaccurate estimation
51 increase
52 large genome sets
53 measures
54 metagenomes
55 metagenomic studies
56 microbial genomes
57 nucleotide identity
58 number
59 pairwise genome comparisons
60 part
61 perfect recall
62 precision
63 program
64 rapid algorithm
65 recall
66 recovery
67 replicates
68 set
69 speed
70 study
71 time
72 time-series datasets
73 tool
74 use
75 use of dRep
76 years
77 schema:name dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication
78 schema:pagination 2864-2868
79 schema:productId N4c7103fe679d43f9a13562bc943a3897
80 N6c5e846a72e342768c7589f2f4c5fde9
81 Nb962565eff8c4c128e8ce3b435dcba88
82 schema:sameAs https://app.dimensions.ai/details/publication/pub.1090891867
83 https://doi.org/10.1038/ismej.2017.126
84 schema:sdDatePublished 2021-12-01T19:37
85 schema:sdLicense https://scigraph.springernature.com/explorer/license/
86 schema:sdPublisher N257b7fef16444c5cb0475c42586b2c8a
87 schema:url https://doi.org/10.1038/ismej.2017.126
88 sgo:license sg:explorer/license/
89 sgo:sdDataset articles
90 rdf:type schema:ScholarlyArticle
91 N134c1c49b76b4ce9b78ac49f8da3da57 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
92 schema:name Metagenome
93 rdf:type schema:DefinedTerm
94 N257b7fef16444c5cb0475c42586b2c8a schema:name Springer Nature - SN SciGraph project
95 rdf:type schema:Organization
96 N3202fd0bd75d49ffb518d87ec7a157b7 rdf:first sg:person.01236044570.46
97 rdf:rest Ndf20156f6edb4f99b587e1d53ecd98cf
98 N3739c3848ec546c88f11a39bee51822d schema:issueNumber 12
99 rdf:type schema:PublicationIssue
100 N4c7103fe679d43f9a13562bc943a3897 schema:name pubmed_id
101 schema:value 28742071
102 rdf:type schema:PropertyValue
103 N55cb1957029b4725af26014c2aea1f18 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
104 schema:name Bacteria
105 rdf:type schema:DefinedTerm
106 N5a6087ad43504b718b3bd8c4fd9fc90a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
107 schema:name Genome, Bacterial
108 rdf:type schema:DefinedTerm
109 N5ef619f7b4b946c2be5a8b889b73a5e2 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
110 schema:name Metagenomics
111 rdf:type schema:DefinedTerm
112 N6c5e846a72e342768c7589f2f4c5fde9 schema:name dimensions_id
113 schema:value pub.1090891867
114 rdf:type schema:PropertyValue
115 N8b1ecf2280234c2da94af5bd6e8f8a65 rdf:first sg:person.01350542775.47
116 rdf:rest rdf:nil
117 N9a8b16dacc744a64acad87ff35745dc4 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
118 schema:name Algorithms
119 rdf:type schema:DefinedTerm
120 Nb194cd05275545e7a9ac8152ce0cf9f8 rdf:first sg:person.010531110205.41
121 rdf:rest N3202fd0bd75d49ffb518d87ec7a157b7
122 Nb962565eff8c4c128e8ce3b435dcba88 schema:name doi
123 schema:value 10.1038/ismej.2017.126
124 rdf:type schema:PropertyValue
125 Nd44cdebd2b5b48fd8d3848f44febc7c8 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
126 schema:name Software
127 rdf:type schema:DefinedTerm
128 Nd92d9b1197754cea827b7c8249ceccfa schema:volumeNumber 11
129 rdf:type schema:PublicationVolume
130 Ndf20156f6edb4f99b587e1d53ecd98cf rdf:first sg:person.01234072512.13
131 rdf:rest N8b1ecf2280234c2da94af5bd6e8f8a65
132 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
133 schema:name Biological Sciences
134 rdf:type schema:DefinedTerm
135 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
136 schema:name Genetics
137 rdf:type schema:DefinedTerm
138 sg:grant.2459162 http://pending.schema.org/fundedItem sg:pub.10.1038/ismej.2017.126
139 rdf:type schema:MonetaryGrant
140 sg:grant.3000627 http://pending.schema.org/fundedItem sg:pub.10.1038/ismej.2017.126
141 rdf:type schema:MonetaryGrant
142 sg:journal.1038436 schema:issn 1751-7362
143 1751-7370
144 schema:name The ISME Journal: Multidisciplinary Journal of Microbial Ecology
145 schema:publisher Springer Nature
146 rdf:type schema:Periodical
147 sg:person.010531110205.41 schema:affiliation grid-institutes:grid.47840.3f
148 schema:familyName Olm
149 schema:givenName Matthew R
150 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010531110205.41
151 rdf:type schema:Person
152 sg:person.01234072512.13 schema:affiliation grid-institutes:grid.47840.3f
153 schema:familyName Brooks
154 schema:givenName Brandon
155 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01234072512.13
156 rdf:type schema:Person
157 sg:person.01236044570.46 schema:affiliation grid-institutes:grid.47840.3f
158 schema:familyName Brown
159 schema:givenName Christopher T
160 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01236044570.46
161 rdf:type schema:Person
162 sg:person.01350542775.47 schema:affiliation grid-institutes:grid.47840.3f
163 schema:familyName Banfield
164 schema:givenName Jillian F
165 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01350542775.47
166 rdf:type schema:Person
167 sg:pub.10.1038/ismej.2015.241 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032868433
168 https://doi.org/10.1038/ismej.2015.241
169 rdf:type schema:CreativeWork
170 sg:pub.10.1038/ismej.2016.83 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008606938
171 https://doi.org/10.1038/ismej.2016.83
172 rdf:type schema:CreativeWork
173 sg:pub.10.1038/nature02340 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023089166
174 https://doi.org/10.1038/nature02340
175 rdf:type schema:CreativeWork
176 sg:pub.10.1038/nmeth.4458 schema:sameAs https://app.dimensions.ai/details/publication/pub.1092065319
177 https://doi.org/10.1038/nmeth.4458
178 rdf:type schema:CreativeWork
179 sg:pub.10.1038/nmicrobiol.2016.24 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052776941
180 https://doi.org/10.1038/nmicrobiol.2016.24
181 rdf:type schema:CreativeWork
182 sg:pub.10.1186/s13059-016-0997-x schema:sameAs https://app.dimensions.ai/details/publication/pub.1050687712
183 https://doi.org/10.1186/s13059-016-0997-x
184 rdf:type schema:CreativeWork
185 sg:pub.10.1186/s40168-017-0270-x schema:sameAs https://app.dimensions.ai/details/publication/pub.1085135267
186 https://doi.org/10.1186/s40168-017-0270-x
187 rdf:type schema:CreativeWork
188 grid-institutes:grid.47840.3f schema:alternateName Department of Earth and Planetary Science, University of California, Berkeley, CA, USA
189 Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
190 schema:name Department of Earth and Planetary Science, University of California, Berkeley, CA, USA
191 Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA
192 Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
193 rdf:type schema:Organization
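
In the table above, the `schema:author` value is a blank node that heads an RDF list: each list node carries an `rdf:first` (the item) and an `rdf:rest` (the next node, or `rdf:nil` at the end). The following sketch, with a hypothetical `walk_list` helper and the relevant rows copied by hand into a dictionary, shows how the author order could be recovered from those rows.

```python
# (subject, predicate) -> object, copied from the rdf:first / rdf:rest rows above.
TRIPLES = {
    ("Nb194cd05275545e7a9ac8152ce0cf9f8", "rdf:first"): "sg:person.010531110205.41",
    ("Nb194cd05275545e7a9ac8152ce0cf9f8", "rdf:rest"): "N3202fd0bd75d49ffb518d87ec7a157b7",
    ("N3202fd0bd75d49ffb518d87ec7a157b7", "rdf:first"): "sg:person.01236044570.46",
    ("N3202fd0bd75d49ffb518d87ec7a157b7", "rdf:rest"): "Ndf20156f6edb4f99b587e1d53ecd98cf",
    ("Ndf20156f6edb4f99b587e1d53ecd98cf", "rdf:first"): "sg:person.01234072512.13",
    ("Ndf20156f6edb4f99b587e1d53ecd98cf", "rdf:rest"): "N8b1ecf2280234c2da94af5bd6e8f8a65",
    ("N8b1ecf2280234c2da94af5bd6e8f8a65", "rdf:first"): "sg:person.01350542775.47",
    ("N8b1ecf2280234c2da94af5bd6e8f8a65", "rdf:rest"): "rdf:nil",
}

# Family names from the schema:familyName rows above.
FAMILY_NAME = {
    "sg:person.010531110205.41": "Olm",
    "sg:person.01236044570.46": "Brown",
    "sg:person.01234072512.13": "Brooks",
    "sg:person.01350542775.47": "Banfield",
}


def walk_list(head, triples):
    """Follow rdf:rest links from head, collecting each rdf:first item in order."""
    items = []
    node = head
    while node != "rdf:nil":
        items.append(triples[(node, "rdf:first")])
        node = triples[(node, "rdf:rest")]
    return items


# The head node is the object of the schema:author row.
authors = walk_list("Nb194cd05275545e7a9ac8152ce0cf9f8", TRIPLES)
print([FAMILY_NAME[a] for a in authors])
```

The resulting order matches the AUTHORS line at the top of the record; in practice an RDF library would handle the list traversal, and the hand-built dictionary here is only for illustration.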
 



