Using cascading Bloom filters to improve the memory usage for de Brujin graphs View Full Text


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2014-01

AUTHORS

Kamil Salikhov, Gustavo Sacomoto, Gregory Kucherov

ABSTRACT

BACKGROUND: De Brujin graphs are widely used in bioinformatics for processing next-generation sequencing data. Due to a very large size of NGS datasets, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently. RESULTS: In this work, we show how to reduce the memory required by the data structure of Chikhi and Rizk (WABI'12) that represents de Brujin graphs using Bloom filters. Our method requires 30% to 40% less memory with respect to their method, with insignificant impact on construction time. At the same time, our experiments showed a better query time compared to the method of Chikhi and Rizk. CONCLUSION: The proposed data structure constitutes, to our knowledge, currently the most efficient practical representation of de Bruijn graphs. More... »

PAGES

2

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1748-7188-9-2

DOI

http://dx.doi.org/10.1186/1748-7188-9-2

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1016211432

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/24565280


Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
Incoming Citations Browse incoming citations for this publication using opencitations.net

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Moscow State University", 
          "id": "https://www.grid.ac/institutes/grid.14476.30", 
          "name": [
            "Lomonosov Moscow State University, Moscow, Russia"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Salikhov", 
        "givenName": "Kamil", 
        "id": "sg:person.0716477076.63", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0716477076.63"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Biometry and Evolutionary Biology Laboratory", 
          "id": "https://www.grid.ac/institutes/grid.462854.9", 
          "name": [
            "INRIA Grenoble Rh\u00f4ne-Alpes, Grenoble, France", 
            "Laboratoire Biom\u00e9trie et Biologie Evolutive, Universit\u00e9 Lyon 1, Lyon, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Sacomoto", 
        "givenName": "Gustavo", 
        "id": "sg:person.0754735253.22", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0754735253.22"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Laboratoire d'Informatique Gaspard-Monge", 
          "id": "https://www.grid.ac/institutes/grid.462940.d", 
          "name": [
            "Department of Computer Science, Ben-Gurion University of the Negev, Be\u2019er Sheva, Israel", 
            "Laboratoire d\u2019Informatique Gaspard Monge, Universit\u00e9 Paris-Est & CNRS, Marne-la-Vall\u00e9e, Paris, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Kucherov", 
        "givenName": "Gregory", 
        "id": "sg:person.013163701366.40", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013163701366.40"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1126/science.277.5331.1453", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1003267570"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1002/rsa.20208", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005483402"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.1121464109", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1006108084"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-13-1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009770106", 
          "https://doi.org/10.1186/1471-2105-13-1"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.171285098", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010138766"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nbt.1883", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1015803168", 
          "https://doi.org/10.1038/nbt.1883"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-03351-3_25", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017860647", 
          "https://doi.org/10.1007/978-3-642-03351-3_25"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ng.1028", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017955952", 
          "https://doi.org/10.1038/ng.1028"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-33122-0_18", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1027417410", 
          "https://doi.org/10.1007/978-3-642-33122-0_18"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btq697", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1031150767"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1748-7188-8-22", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1034512490", 
          "https://doi.org/10.1186/1748-7188-8-22"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btt020", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1038110157"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btr216", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1040954657"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-13-5", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045096830", 
          "https://doi.org/10.1186/1471-2105-13-5"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-13-s6-s1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1049942846", 
          "https://doi.org/10.1186/1471-2105-13-s6-s1"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.ygeno.2010.03.001", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052838811"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-13-s6-s5", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1078553734", 
          "https://doi.org/10.1186/1471-2105-13-s6-s5"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2014-01", 
    "datePublishedReg": "2014-01-01", 
    "description": "BACKGROUND: De Brujin graphs are widely used in bioinformatics for processing next-generation sequencing data. Due to a very large size of NGS datasets, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently.\nRESULTS: In this work, we show how to reduce the memory required by the data structure of Chikhi and Rizk (WABI'12) that represents de Brujin graphs using Bloom filters. Our method requires 30% to 40% less memory with respect to their method, with insignificant impact on construction time. At the same time, our experiments showed a better query time compared to the method of Chikhi and Rizk.\nCONCLUSION: The proposed data structure constitutes, to our knowledge, currently the most efficient practical representation of de Bruijn graphs.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/1748-7188-9-2", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.3787435", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1036449", 
        "issn": [
          "1748-7188"
        ], 
        "name": "Algorithms for Molecular Biology", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "9"
      }
    ], 
    "name": "Using cascading Bloom filters to improve the memory usage for de Brujin graphs", 
    "pagination": "2", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "4d3e7fda575eb3a333583d6ecc4015ba8f02b4bb1b9713484a93dc7425a227fd"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "24565280"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "101265088"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1748-7188-9-2"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1016211432"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1748-7188-9-2", 
      "https://app.dimensions.ai/details/publication/pub.1016211432"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-10T17:31", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8672_00000511.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1186%2F1748-7188-9-2"
  }
]
 

Download the RDF metadata as:  json-ld nt turtle xml License info

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1748-7188-9-2'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1748-7188-9-2'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1748-7188-9-2'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1748-7188-9-2'


 

This table displays all metadata directly associated to this object as RDF triples.

152 TRIPLES      21 PREDICATES      46 URIs      21 LITERALS      9 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/1748-7188-9-2 schema:about anzsrc-for:08
2 anzsrc-for:0806
3 schema:author N3c5c33c766e64028a51ba9bc64da4ce6
4 schema:citation sg:pub.10.1007/978-3-642-03351-3_25
5 sg:pub.10.1007/978-3-642-33122-0_18
6 sg:pub.10.1038/nbt.1883
7 sg:pub.10.1038/ng.1028
8 sg:pub.10.1186/1471-2105-13-1
9 sg:pub.10.1186/1471-2105-13-5
10 sg:pub.10.1186/1471-2105-13-s6-s1
11 sg:pub.10.1186/1471-2105-13-s6-s5
12 sg:pub.10.1186/1748-7188-8-22
13 https://doi.org/10.1002/rsa.20208
14 https://doi.org/10.1016/j.ygeno.2010.03.001
15 https://doi.org/10.1073/pnas.1121464109
16 https://doi.org/10.1073/pnas.171285098
17 https://doi.org/10.1093/bioinformatics/btq697
18 https://doi.org/10.1093/bioinformatics/btr216
19 https://doi.org/10.1093/bioinformatics/btt020
20 https://doi.org/10.1126/science.277.5331.1453
21 schema:datePublished 2014-01
22 schema:datePublishedReg 2014-01-01
23 schema:description BACKGROUND: De Brujin graphs are widely used in bioinformatics for processing next-generation sequencing data. Due to a very large size of NGS datasets, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently. RESULTS: In this work, we show how to reduce the memory required by the data structure of Chikhi and Rizk (WABI'12) that represents de Brujin graphs using Bloom filters. Our method requires 30% to 40% less memory with respect to their method, with insignificant impact on construction time. At the same time, our experiments showed a better query time compared to the method of Chikhi and Rizk. CONCLUSION: The proposed data structure constitutes, to our knowledge, currently the most efficient practical representation of de Bruijn graphs.
24 schema:genre research_article
25 schema:inLanguage en
26 schema:isAccessibleForFree true
27 schema:isPartOf Ncdce4b1224c248a3a69b99d948c5b2eb
28 Nf31ed2b309aa44b898545b54fbdbff3c
29 sg:journal.1036449
30 schema:name Using cascading Bloom filters to improve the memory usage for de Brujin graphs
31 schema:pagination 2
32 schema:productId N19c30dddcf414f2fa0d4dd4b1763e46b
33 N4df584969a1d4229b212b0410144a25c
34 N7bccb74dd05c47ee8ede6fca0ae97ebe
35 Nce4bde3b8b7e4c999d75e4a44e9abc66
36 Nec74e0f4aba049e69b7b88404fd1bb21
37 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016211432
38 https://doi.org/10.1186/1748-7188-9-2
39 schema:sdDatePublished 2019-04-10T17:31
40 schema:sdLicense https://scigraph.springernature.com/explorer/license/
41 schema:sdPublisher N54a3964497dc4a8baae726a361955c53
42 schema:url http://link.springer.com/10.1186%2F1748-7188-9-2
43 sgo:license sg:explorer/license/
44 sgo:sdDataset articles
45 rdf:type schema:ScholarlyArticle
46 N0510d9567dbf48889803faafa32c6e78 rdf:first sg:person.013163701366.40
47 rdf:rest rdf:nil
48 N19c30dddcf414f2fa0d4dd4b1763e46b schema:name readcube_id
49 schema:value 4d3e7fda575eb3a333583d6ecc4015ba8f02b4bb1b9713484a93dc7425a227fd
50 rdf:type schema:PropertyValue
51 N3c5c33c766e64028a51ba9bc64da4ce6 rdf:first sg:person.0716477076.63
52 rdf:rest Nd5aa68ad19c24b58a9514e2e40331120
53 N4df584969a1d4229b212b0410144a25c schema:name dimensions_id
54 schema:value pub.1016211432
55 rdf:type schema:PropertyValue
56 N54a3964497dc4a8baae726a361955c53 schema:name Springer Nature - SN SciGraph project
57 rdf:type schema:Organization
58 N7bccb74dd05c47ee8ede6fca0ae97ebe schema:name nlm_unique_id
59 schema:value 101265088
60 rdf:type schema:PropertyValue
61 Ncdce4b1224c248a3a69b99d948c5b2eb schema:volumeNumber 9
62 rdf:type schema:PublicationVolume
63 Nce4bde3b8b7e4c999d75e4a44e9abc66 schema:name doi
64 schema:value 10.1186/1748-7188-9-2
65 rdf:type schema:PropertyValue
66 Nd5aa68ad19c24b58a9514e2e40331120 rdf:first sg:person.0754735253.22
67 rdf:rest N0510d9567dbf48889803faafa32c6e78
68 Nec74e0f4aba049e69b7b88404fd1bb21 schema:name pubmed_id
69 schema:value 24565280
70 rdf:type schema:PropertyValue
71 Nf31ed2b309aa44b898545b54fbdbff3c schema:issueNumber 1
72 rdf:type schema:PublicationIssue
73 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
74 schema:name Information and Computing Sciences
75 rdf:type schema:DefinedTerm
76 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
77 schema:name Information Systems
78 rdf:type schema:DefinedTerm
79 sg:grant.3787435 http://pending.schema.org/fundedItem sg:pub.10.1186/1748-7188-9-2
80 rdf:type schema:MonetaryGrant
81 sg:journal.1036449 schema:issn 1748-7188
82 schema:name Algorithms for Molecular Biology
83 rdf:type schema:Periodical
84 sg:person.013163701366.40 schema:affiliation https://www.grid.ac/institutes/grid.462940.d
85 schema:familyName Kucherov
86 schema:givenName Gregory
87 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013163701366.40
88 rdf:type schema:Person
89 sg:person.0716477076.63 schema:affiliation https://www.grid.ac/institutes/grid.14476.30
90 schema:familyName Salikhov
91 schema:givenName Kamil
92 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0716477076.63
93 rdf:type schema:Person
94 sg:person.0754735253.22 schema:affiliation https://www.grid.ac/institutes/grid.462854.9
95 schema:familyName Sacomoto
96 schema:givenName Gustavo
97 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0754735253.22
98 rdf:type schema:Person
99 sg:pub.10.1007/978-3-642-03351-3_25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017860647
100 https://doi.org/10.1007/978-3-642-03351-3_25
101 rdf:type schema:CreativeWork
102 sg:pub.10.1007/978-3-642-33122-0_18 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027417410
103 https://doi.org/10.1007/978-3-642-33122-0_18
104 rdf:type schema:CreativeWork
105 sg:pub.10.1038/nbt.1883 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015803168
106 https://doi.org/10.1038/nbt.1883
107 rdf:type schema:CreativeWork
108 sg:pub.10.1038/ng.1028 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017955952
109 https://doi.org/10.1038/ng.1028
110 rdf:type schema:CreativeWork
111 sg:pub.10.1186/1471-2105-13-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009770106
112 https://doi.org/10.1186/1471-2105-13-1
113 rdf:type schema:CreativeWork
114 sg:pub.10.1186/1471-2105-13-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045096830
115 https://doi.org/10.1186/1471-2105-13-5
116 rdf:type schema:CreativeWork
117 sg:pub.10.1186/1471-2105-13-s6-s1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049942846
118 https://doi.org/10.1186/1471-2105-13-s6-s1
119 rdf:type schema:CreativeWork
120 sg:pub.10.1186/1471-2105-13-s6-s5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1078553734
121 https://doi.org/10.1186/1471-2105-13-s6-s5
122 rdf:type schema:CreativeWork
123 sg:pub.10.1186/1748-7188-8-22 schema:sameAs https://app.dimensions.ai/details/publication/pub.1034512490
124 https://doi.org/10.1186/1748-7188-8-22
125 rdf:type schema:CreativeWork
126 https://doi.org/10.1002/rsa.20208 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005483402
127 rdf:type schema:CreativeWork
128 https://doi.org/10.1016/j.ygeno.2010.03.001 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052838811
129 rdf:type schema:CreativeWork
130 https://doi.org/10.1073/pnas.1121464109 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006108084
131 rdf:type schema:CreativeWork
132 https://doi.org/10.1073/pnas.171285098 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010138766
133 rdf:type schema:CreativeWork
134 https://doi.org/10.1093/bioinformatics/btq697 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031150767
135 rdf:type schema:CreativeWork
136 https://doi.org/10.1093/bioinformatics/btr216 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040954657
137 rdf:type schema:CreativeWork
138 https://doi.org/10.1093/bioinformatics/btt020 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038110157
139 rdf:type schema:CreativeWork
140 https://doi.org/10.1126/science.277.5331.1453 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003267570
141 rdf:type schema:CreativeWork
142 https://www.grid.ac/institutes/grid.14476.30 schema:alternateName Moscow State University
143 schema:name Lomonosov Moscow State University, Moscow, Russia
144 rdf:type schema:Organization
145 https://www.grid.ac/institutes/grid.462854.9 schema:alternateName Biometry and Evolutionary Biology Laboratory
146 schema:name INRIA Grenoble Rhône-Alpes, Grenoble, France
147 Laboratoire Biométrie et Biologie Evolutive, Université Lyon 1, Lyon, France
148 rdf:type schema:Organization
149 https://www.grid.ac/institutes/grid.462940.d schema:alternateName Laboratoire d'Informatique Gaspard-Monge
150 schema:name Department of Computer Science, Ben-Gurion University of the Negev, Be’er Sheva, Israel
151 Laboratoire d’Informatique Gaspard Monge, Université Paris-Est & CNRS, Marne-la-Vallée, Paris, France
152 rdf:type schema:Organization
 




Preview window. Press ESC to close (or click here)


...