Genomic Data Clustering on FPGAs for Compression


Ontology type: schema:Chapter     


Chapter Info

DATE

2017

AUTHORS

Enrico Petraglio, Rick Wertenbroek, Flavio Capitao, Nicolas Guex, Christian Iseli, Yann Thoma

ABSTRACT

Current sequencing machine technology generates very large and redundant volumes of genomic data for each biological sample. Today, data and associated metadata are stored in very large text files in the FASTQ format. These files carry billions of genome fragments, referred to as “reads”, each a string of nucleotide bases ranging from a few tens to a few hundred bases in length. Compression is required to manage the sheer amount of data that will soon be generated, and compressing it implies finding redundant information in the raw sequences. While most reads can be mapped onto the human reference genome, which makes them well suited for compression, about 10% of them usually do not map to any reference [1]. For these orphan sequences, finding redundancy will help compression, but doing so requires clustering the reads, a very time-consuming process. Within this context, this paper presents an FPGA implementation of a clustering algorithm for genomic reads on Pico Computing EX-700 AC-510 hardware, offering a speedup of more than 1000× over a CPU implementation while reducing power consumption by a factor of 700.
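The abstract does not spell out the clustering algorithm. As a rough illustration only, the sketch below assumes FASTQ's usual four-line record layout (`@id`, sequence, `+`, quality) and a naive greedy scheme in which each read joins the first cluster whose representative differs from it in at most `max_mismatches` positions. The function names and the threshold are hypothetical, and this is not the FPGA design described in the chapter.

```python
from typing import Iterator, List, Tuple


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records.

    Blank lines are skipped; an incomplete trailing record is ignored.
    """
    lines = [line for line in text.splitlines() if line]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record #{i // 4 + 1}")
        yield header[1:], seq, qual


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(reads: List[str], max_mismatches: int) -> List[List[int]]:
    """Greedy clustering: each read joins the first cluster whose representative
    (its first member) has the same length and is within max_mismatches;
    otherwise the read opens a new cluster. Returns lists of read indices."""
    representatives: List[str] = []
    clusters: List[List[int]] = []
    for idx, read in enumerate(reads):
        for c, rep in enumerate(representatives):
            if len(rep) == len(read) and hamming(rep, read) <= max_mismatches:
                clusters[c].append(idx)
                break
        else:  # no representative was close enough
            representatives.append(read)
            clusters.append([idx])
    return clusters


FASTQ = """@r1
ACGTACGT
+
IIIIIIII
@r2
ACGTACGA
+
IIIIIIII
@r3
TTTTGGGG
+
IIIIIIII
"""

reads = [seq for _, seq, _ in parse_fastq(FASTQ)]
print(greedy_cluster(reads, max_mismatches=1))  # [[0, 1], [2]]
```

Each read is compared against every existing representative, so the cost grows with both the number of reads and the number of clusters, which gives some intuition for why the abstract calls clustering very time-consuming.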

PAGES

229-240

Book

TITLE

Applied Reconfigurable Computing

ISBN

978-3-319-56257-5
978-3-319-56258-2

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-319-56258-2_20

DOI

http://dx.doi.org/10.1007/978-3-319-56258-2_20

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1084682751



JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record in an external JSON-LD service such as the JSON-LD Playground or the Google Structured Data Testing Tool (SDTT).

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "School of Management and Engineering Vaud", 
          "id": "https://www.grid.ac/institutes/grid.435142.5", 
          "name": [
            "REDS Institute, HEIG-VD School of Business and Engineering Vaud HES-SO University of Applied Sciences Western Switzerland Yverdon-les-Bains Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Petraglio", 
        "givenName": "Enrico", 
        "id": "sg:person.012206665253.47", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012206665253.47"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "School of Management and Engineering Vaud", 
          "id": "https://www.grid.ac/institutes/grid.435142.5", 
          "name": [
            "REDS Institute, HEIG-VD School of Business and Engineering Vaud HES-SO University of Applied Sciences Western Switzerland Yverdon-les-Bains Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Wertenbroek", 
        "givenName": "Rick", 
        "id": "sg:person.011411304653.19", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011411304653.19"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "School of Management and Engineering Vaud", 
          "id": "https://www.grid.ac/institutes/grid.435142.5", 
          "name": [
            "REDS Institute, HEIG-VD School of Business and Engineering Vaud HES-SO University of Applied Sciences Western Switzerland Yverdon-les-Bains Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Capitao", 
        "givenName": "Flavio", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "Vital-IT SIB Swiss Institute of Bioinformatics Lausanne Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Guex", 
        "givenName": "Nicolas", 
        "id": "sg:person.01304017634.53", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01304017634.53"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "Vital-IT SIB Swiss Institute of Bioinformatics Lausanne Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Iseli", 
        "givenName": "Christian", 
        "id": "sg:person.01013522507.49", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01013522507.49"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "School of Management and Engineering Vaud", 
          "id": "https://www.grid.ac/institutes/grid.435142.5", 
          "name": [
            "REDS Institute, HEIG-VD School of Business and Engineering Vaud HES-SO University of Applied Sciences Western Switzerland Yverdon-les-Bains Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Thoma", 
        "givenName": "Yann", 
        "id": "sg:person.013201261375.17", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013201261375.17"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1093/nar/gkr1124", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1002413372"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.neunet.2009.08.007", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1007737944"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pbio.1002195", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1011952988"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.patrec.2009.09.011", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1019119600"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.114819.110", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1025065244"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/bts173", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1029888576"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btr014", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1053036888"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/ahs.2011.5963944", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1095039885"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/fpl.2013.6645501", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1095145814"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2017", 
    "datePublishedReg": "2017-01-01", 
    "description": "Current sequencing machine technology generates very large and redundant volumes of genomic data for each biological sample. Today data and associated metadata are formatted in very large text file assemblies called FASTQ carrying the information of billions of genome fragments referred to as \u201creads\u201d and composed of strings of nucleotide bases with lengths in the range of a few tenths to a few hundreds bases. Compressing such data is definitely required in order to manage the sheer amount of data soon to be generated. Doing so implies finding redundant information in the raw sequences. While most of it can be mapped onto the human reference genome and fits well for compression, about 10% of it usually does not map to any reference [1]. For these orphan sequences, finding redundancy will help compression. Doing so requires clustering these reads, a very time consuming process. Within this context this paper presents a FPGA implementation of a clustering algorithm for genomic reads, implemented on Pico Computing EX-700 AC-510 hardware, offering more than a \\(1000\\times \\) speed up over a CPU implementation while reducing power consumption by a 700 factor.", 
    "editor": [
      {
        "familyName": "Wong", 
        "givenName": "Stephan", 
        "type": "Person"
      }, 
      {
        "familyName": "Beck", 
        "givenName": "Antonio Carlos", 
        "type": "Person"
      }, 
      {
        "familyName": "Bertels", 
        "givenName": "Koen", 
        "type": "Person"
      }, 
      {
        "familyName": "Carro", 
        "givenName": "Luigi", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-319-56258-2_20", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-319-56257-5", 
        "978-3-319-56258-2"
      ], 
      "name": "Applied Reconfigurable Computing", 
      "type": "Book"
    }, 
    "name": "Genomic Data Clustering on FPGAs for Compression", 
    "pagination": "229-240", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-319-56258-2_20"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "24fccb0e6d661786f2f0cecb9d142ee1c25f689e381513d3862d711e87848ca0"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1084682751"
        ]
      }
    ], 
    "publisher": {
      "location": "Cham", 
      "name": "Springer International Publishing", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-319-56258-2_20", 
      "https://app.dimensions.ai/details/publication/pub.1084682751"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T21:11", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8690_00000331.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/978-3-319-56258-2_20"
  }
]
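Because JSON-LD is plain JSON, the record above can be read with an ordinary JSON parser. The sketch below, assuming the one-element array layout shown, pulls out the title, author names, DOI, pages, and book title. `summarize_chapter` and `SAMPLE` are hypothetical names, and the sample is a trimmed copy of the record rather than the full document. A full JSON-LD processor would also resolve the `@context`, which this sketch ignores.

```python
import json
from typing import Any, Dict, Optional


def summarize_chapter(jsonld_text: str) -> Dict[str, Any]:
    """Extract title, authors, DOI, pages and book title from a SciGraph-style JSON-LD array."""
    record = json.loads(jsonld_text)[0]  # the record is served as a one-element array
    authors = [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]
    # productId holds several identifiers; take the value of the entry named "doi".
    doi: Optional[str] = next(
        (v for p in record.get("productId", []) if p.get("name") == "doi"
         for v in p.get("value", [])),
        None,
    )
    return {
        "title": record.get("name"),
        "authors": authors,
        "doi": doi,
        "pages": record.get("pagination"),
        "book": record.get("isPartOf", {}).get("name"),
    }


# A trimmed copy of the record above (only the fields used here).
SAMPLE = """[{
  "name": "Genomic Data Clustering on FPGAs for Compression",
  "pagination": "229-240",
  "author": [
    {"givenName": "Enrico", "familyName": "Petraglio"},
    {"givenName": "Yann", "familyName": "Thoma"}
  ],
  "productId": [
    {"name": "dimensions_id", "value": ["pub.1084682751"]},
    {"name": "doi", "value": ["10.1007/978-3-319-56258-2_20"]}
  ],
  "isPartOf": {"name": "Applied Reconfigurable Computing"}
}]"""

summary = summarize_chapter(SAMPLE)
print(summary["doi"])  # 10.1007/978-3-319-56258-2_20
```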

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-56258-2_20'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-56258-2_20'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-56258-2_20'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-56258-2_20'
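The curl commands rely on HTTP content negotiation: the same URL returns a different RDF serialization depending on the `Accept` header. A minimal Python equivalent using only the standard library might look like this, assuming the endpoint still answers as described. `MEDIA_TYPES`, its keys, `build_request`, and `fetch` are hypothetical names; the media types are the ones used in the curl examples.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-319-56258-2_20"

# Media types taken from the curl examples; the dictionary keys are arbitrary labels.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build a GET request asking the server for one RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt: str, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text (assumes UTF-8)."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch("n-triples")` should return the same data as the corresponding curl command. Building the request separately from sending it keeps the header logic easy to inspect without network access.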


 

This table displays all metadata directly associated with this object as RDF triples.

145 triples, 23 predicates, 36 URIs, 20 literals, 8 blank nodes

Subject Predicate Object
1 sg:pub.10.1007/978-3-319-56258-2_20 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author N19505bc8b0204272a0da65e99446da20
4 schema:citation https://doi.org/10.1016/j.neunet.2009.08.007
5 https://doi.org/10.1016/j.patrec.2009.09.011
6 https://doi.org/10.1093/bioinformatics/btr014
7 https://doi.org/10.1093/bioinformatics/bts173
8 https://doi.org/10.1093/nar/gkr1124
9 https://doi.org/10.1101/gr.114819.110
10 https://doi.org/10.1109/ahs.2011.5963944
11 https://doi.org/10.1109/fpl.2013.6645501
12 https://doi.org/10.1371/journal.pbio.1002195
13 schema:datePublished 2017
14 schema:datePublishedReg 2017-01-01
15 schema:description Current sequencing machine technology generates very large and redundant volumes of genomic data for each biological sample. Today data and associated metadata are formatted in very large text file assemblies called FASTQ carrying the information of billions of genome fragments referred to as “reads” and composed of strings of nucleotide bases with lengths in the range of a few tenths to a few hundreds bases. Compressing such data is definitely required in order to manage the sheer amount of data soon to be generated. Doing so implies finding redundant information in the raw sequences. While most of it can be mapped onto the human reference genome and fits well for compression, about 10% of it usually does not map to any reference [1]. For these orphan sequences, finding redundancy will help compression. Doing so requires clustering these reads, a very time consuming process. Within this context this paper presents a FPGA implementation of a clustering algorithm for genomic reads, implemented on Pico Computing EX-700 AC-510 hardware, offering more than a \(1000\times \) speed up over a CPU implementation while reducing power consumption by a 700 factor.
16 schema:editor Nc429d3578015468196caaadbfd051e91
17 schema:genre chapter
18 schema:inLanguage en
19 schema:isAccessibleForFree false
20 schema:isPartOf Nba6bbff58db848f0be20721d6a562219
21 schema:name Genomic Data Clustering on FPGAs for Compression
22 schema:pagination 229-240
23 schema:productId N3299a525292344d3bd436d852661cd3b
24 Nb2bf4b0e4d88400285ba2cc95b25922c
25 Nf0b1787b2afa4c988ff39526da1eac14
26 schema:publisher N57b294af28444318bf36e14a852be8c5
27 schema:sameAs https://app.dimensions.ai/details/publication/pub.1084682751
28 https://doi.org/10.1007/978-3-319-56258-2_20
29 schema:sdDatePublished 2019-04-15T21:11
30 schema:sdLicense https://scigraph.springernature.com/explorer/license/
31 schema:sdPublisher N93d3fc5194a146b184ca3311cde33f43
32 schema:url http://link.springer.com/10.1007/978-3-319-56258-2_20
33 sgo:license sg:explorer/license/
34 sgo:sdDataset chapters
35 rdf:type schema:Chapter
36 N002d0a4b8047444fb56649186345b120 schema:affiliation https://www.grid.ac/institutes/grid.435142.5
37 schema:familyName Capitao
38 schema:givenName Flavio
39 rdf:type schema:Person
40 N020c6d8067ee4e34859a6275db4bc997 schema:familyName Bertels
41 schema:givenName Koen
42 rdf:type schema:Person
43 N154fa96eee61421cbcf9ae23c6ba1b8f schema:familyName Beck
44 schema:givenName Antonio Carlos
45 rdf:type schema:Person
46 N19505bc8b0204272a0da65e99446da20 rdf:first sg:person.012206665253.47
47 rdf:rest N6be66925201642deb3df77d8c7f17c9e
48 N3299a525292344d3bd436d852661cd3b schema:name dimensions_id
49 schema:value pub.1084682751
50 rdf:type schema:PropertyValue
51 N556d00523de74b8093537c45d4d44fb5 rdf:first N154fa96eee61421cbcf9ae23c6ba1b8f
52 rdf:rest Naaf839db7edd4c2a9999b662c7814047
53 N57b294af28444318bf36e14a852be8c5 schema:location Cham
54 schema:name Springer International Publishing
55 rdf:type schema:Organisation
56 N6be66925201642deb3df77d8c7f17c9e rdf:first sg:person.011411304653.19
57 rdf:rest Nd0a89dc571f64469884f9d5dc40ee206
58 N79414262be0f437eaf4f1d98a1decf5e rdf:first sg:person.01304017634.53
59 rdf:rest Na472d0d044c74b27b8c7a551e14c179b
60 N8af078d280c746b4b125efa68060e165 schema:familyName Wong
61 schema:givenName Stephan
62 rdf:type schema:Person
63 N93d3fc5194a146b184ca3311cde33f43 schema:name Springer Nature - SN SciGraph project
64 rdf:type schema:Organization
65 N94155de6223f469ea597b56370a80b68 schema:name Vital-IT SIB Swiss Institute of Bioinformatics Lausanne Switzerland
66 rdf:type schema:Organization
67 Na472d0d044c74b27b8c7a551e14c179b rdf:first sg:person.01013522507.49
68 rdf:rest Ne82c6f4db37d468795ab00069e36b55a
69 Naaf839db7edd4c2a9999b662c7814047 rdf:first N020c6d8067ee4e34859a6275db4bc997
70 rdf:rest Nc929d5ab80bf49be9dba1f8c437923af
71 Nb2bf4b0e4d88400285ba2cc95b25922c schema:name readcube_id
72 schema:value 24fccb0e6d661786f2f0cecb9d142ee1c25f689e381513d3862d711e87848ca0
73 rdf:type schema:PropertyValue
74 Nba6bbff58db848f0be20721d6a562219 schema:isbn 978-3-319-56257-5
75 978-3-319-56258-2
76 schema:name Applied Reconfigurable Computing
77 rdf:type schema:Book
78 Nbef9ff10dc634ecc99edbb53d65df9cc schema:name Vital-IT SIB Swiss Institute of Bioinformatics Lausanne Switzerland
79 rdf:type schema:Organization
80 Nc429d3578015468196caaadbfd051e91 rdf:first N8af078d280c746b4b125efa68060e165
81 rdf:rest N556d00523de74b8093537c45d4d44fb5
82 Nc5d605c6f4bc4da5b400f7b23bf6b4c0 schema:familyName Carro
83 schema:givenName Luigi
84 rdf:type schema:Person
85 Nc929d5ab80bf49be9dba1f8c437923af rdf:first Nc5d605c6f4bc4da5b400f7b23bf6b4c0
86 rdf:rest rdf:nil
87 Nd0a89dc571f64469884f9d5dc40ee206 rdf:first N002d0a4b8047444fb56649186345b120
88 rdf:rest N79414262be0f437eaf4f1d98a1decf5e
89 Ne82c6f4db37d468795ab00069e36b55a rdf:first sg:person.013201261375.17
90 rdf:rest rdf:nil
91 Nf0b1787b2afa4c988ff39526da1eac14 schema:name doi
92 schema:value 10.1007/978-3-319-56258-2_20
93 rdf:type schema:PropertyValue
94 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
95 schema:name Information and Computing Sciences
96 rdf:type schema:DefinedTerm
97 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
98 schema:name Artificial Intelligence and Image Processing
99 rdf:type schema:DefinedTerm
100 sg:person.01013522507.49 schema:affiliation N94155de6223f469ea597b56370a80b68
101 schema:familyName Iseli
102 schema:givenName Christian
103 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01013522507.49
104 rdf:type schema:Person
105 sg:person.011411304653.19 schema:affiliation https://www.grid.ac/institutes/grid.435142.5
106 schema:familyName Wertenbroek
107 schema:givenName Rick
108 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011411304653.19
109 rdf:type schema:Person
110 sg:person.012206665253.47 schema:affiliation https://www.grid.ac/institutes/grid.435142.5
111 schema:familyName Petraglio
112 schema:givenName Enrico
113 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012206665253.47
114 rdf:type schema:Person
115 sg:person.01304017634.53 schema:affiliation Nbef9ff10dc634ecc99edbb53d65df9cc
116 schema:familyName Guex
117 schema:givenName Nicolas
118 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01304017634.53
119 rdf:type schema:Person
120 sg:person.013201261375.17 schema:affiliation https://www.grid.ac/institutes/grid.435142.5
121 schema:familyName Thoma
122 schema:givenName Yann
123 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013201261375.17
124 rdf:type schema:Person
125 https://doi.org/10.1016/j.neunet.2009.08.007 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007737944
126 rdf:type schema:CreativeWork
127 https://doi.org/10.1016/j.patrec.2009.09.011 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019119600
128 rdf:type schema:CreativeWork
129 https://doi.org/10.1093/bioinformatics/btr014 schema:sameAs https://app.dimensions.ai/details/publication/pub.1053036888
130 rdf:type schema:CreativeWork
131 https://doi.org/10.1093/bioinformatics/bts173 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029888576
132 rdf:type schema:CreativeWork
133 https://doi.org/10.1093/nar/gkr1124 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002413372
134 rdf:type schema:CreativeWork
135 https://doi.org/10.1101/gr.114819.110 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025065244
136 rdf:type schema:CreativeWork
137 https://doi.org/10.1109/ahs.2011.5963944 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095039885
138 rdf:type schema:CreativeWork
139 https://doi.org/10.1109/fpl.2013.6645501 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095145814
140 rdf:type schema:CreativeWork
141 https://doi.org/10.1371/journal.pbio.1002195 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011952988
142 rdf:type schema:CreativeWork
143 https://www.grid.ac/institutes/grid.435142.5 schema:alternateName School of Management and Engineering Vaud
144 schema:name REDS Institute, HEIG-VD School of Business and Engineering Vaud HES-SO University of Applied Sciences Western Switzerland Yverdon-les-Bains Switzerland
145 rdf:type schema:Organization
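In the table above, author order is not stored on the person nodes themselves. It is encoded as an RDF collection: each blank node holds the current item in `rdf:first` and the next node in `rdf:rest`, and the list ends at `rdf:nil`. The sketch below walks such a list using the author-list triples copied from the table. The dictionary layout and the names `TRIPLES`, `FAMILY_NAMES`, and `walk_rdf_list` are assumptions for illustration; a real application would typically load the N-Triples output into an RDF library instead.

```python
from typing import Dict, List, Tuple

# (subject, predicate) -> object, copied from the author-list rows of the table.
TRIPLES: Dict[Tuple[str, str], str] = {
    ("N19505bc8b0204272a0da65e99446da20", "rdf:first"): "sg:person.012206665253.47",
    ("N19505bc8b0204272a0da65e99446da20", "rdf:rest"): "N6be66925201642deb3df77d8c7f17c9e",
    ("N6be66925201642deb3df77d8c7f17c9e", "rdf:first"): "sg:person.011411304653.19",
    ("N6be66925201642deb3df77d8c7f17c9e", "rdf:rest"): "Nd0a89dc571f64469884f9d5dc40ee206",
    ("Nd0a89dc571f64469884f9d5dc40ee206", "rdf:first"): "N002d0a4b8047444fb56649186345b120",
    ("Nd0a89dc571f64469884f9d5dc40ee206", "rdf:rest"): "N79414262be0f437eaf4f1d98a1decf5e",
    ("N79414262be0f437eaf4f1d98a1decf5e", "rdf:first"): "sg:person.01304017634.53",
    ("N79414262be0f437eaf4f1d98a1decf5e", "rdf:rest"): "Na472d0d044c74b27b8c7a551e14c179b",
    ("Na472d0d044c74b27b8c7a551e14c179b", "rdf:first"): "sg:person.01013522507.49",
    ("Na472d0d044c74b27b8c7a551e14c179b", "rdf:rest"): "Ne82c6f4db37d468795ab00069e36b55a",
    ("Ne82c6f4db37d468795ab00069e36b55a", "rdf:first"): "sg:person.013201261375.17",
    ("Ne82c6f4db37d468795ab00069e36b55a", "rdf:rest"): "rdf:nil",
}

# schema:familyName of each list item, also taken from the table.
FAMILY_NAMES = {
    "sg:person.012206665253.47": "Petraglio",
    "sg:person.011411304653.19": "Wertenbroek",
    "N002d0a4b8047444fb56649186345b120": "Capitao",
    "sg:person.01304017634.53": "Guex",
    "sg:person.01013522507.49": "Iseli",
    "sg:person.013201261375.17": "Thoma",
}


def walk_rdf_list(head: str, triples: Dict[Tuple[str, str], str]) -> List[str]:
    """Follow rdf:first/rdf:rest links from head until rdf:nil, in order."""
    items: List[str] = []
    seen = set()
    node = head
    while node != "rdf:nil":
        if node in seen:  # guard against malformed, cyclic lists
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        items.append(triples[(node, "rdf:first")])
        node = triples[(node, "rdf:rest")]
    return items


authors = walk_rdf_list("N19505bc8b0204272a0da65e99446da20", TRIPLES)
print([FAMILY_NAMES[a] for a in authors])
# ['Petraglio', 'Wertenbroek', 'Capitao', 'Guex', 'Iseli', 'Thoma']
```

The recovered order matches the author line at the top of the page; the editor list can be walked the same way starting from its own head node.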
...