Ontology type: schema:ScholarlyArticle Open Access: True
2019-12
AUTHORSLavinia Egidi, Felipe A. Louza, Giovanni Manzini, Guilherme P. Telles
ABSTRACTBackground: Sequencing technologies produce larger and larger collections of biosequences that have to be stored in compressed indices supporting fast search operations. Many compressed indices are based on the Burrows-Wheeler Transform (BWT) and the longest common prefix (LCP) array. Because of the sheer size of the input it is important to build these data structures in external memory and time using in the best possible way the available RAM. Results: We propose a space-efficient algorithm to compute the BWT and LCP array for a collection of sequences in the external or semi-external memory setting. Our algorithm splits the input collection into subcollections sufficiently small that it can compute their BWT in RAM using an optimal linear time algorithm. Next, it merges the partial BWTs in external or semi-external memory and in the process it also computes the LCP values. Our algorithm can be modified to output two additional arrays that, combined with the BWT and LCP array, provide simple, scan-based, external memory algorithms for three well known problems in bioinformatics: the computation of maximal repeats, the all pairs suffix-prefix overlaps, and the construction of succinct de Bruijn graphs. Conclusions: We prove that our algorithm performs O ( n maxlcp ) sequential I/Os, where n is the total length of the collection and maxlcp is the maximum LCP value. The experimental results show that our algorithm is only slightly slower than the state of the art for short sequences but it is up to 40 times faster for longer sequences or when the available RAM is at least equal to the size of the input. More... »
PAGES6
http://scigraph.springernature.com/pub.10.1186/s13015-019-0140-0
DOIhttp://dx.doi.org/10.1186/s13015-019-0140-0
DIMENSIONShttps://app.dimensions.ai/details/publication/pub.1112645996
PUBMEDhttps://www.ncbi.nlm.nih.gov/pubmed/30899322
JSON-LD is the canonical representation for SciGraph data.
TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT
[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
"about": [
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0802",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Computation Theory and Mathematics",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information and Computing Sciences",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "University of Eastern Piedmont Amadeo Avogadro",
"id": "https://www.grid.ac/institutes/grid.16563.37",
"name": [
"DiSIT, University of Eastern Piedmont, Viale Michel, 11, 15121, Alessandria, Italy"
],
"type": "Organization"
},
"familyName": "Egidi",
"givenName": "Lavinia",
"type": "Person"
},
{
"affiliation": {
"alternateName": "University of Sao Paulo",
"id": "https://www.grid.ac/institutes/grid.11899.38",
"name": [
"Department of Computing and Mathematics, University of S\u00e3o Paulo, Av. Bandeirantes, 3900, 14040-901, Ribeir\u00e3o Preto, Brazil"
],
"type": "Organization"
},
"familyName": "Louza",
"givenName": "Felipe A.",
"type": "Person"
},
{
"affiliation": {
"alternateName": "Institute of Informatics and Telematics",
"id": "https://www.grid.ac/institutes/grid.473659.a",
"name": [
"DiSIT, University of Eastern Piedmont, Viale Michel, 11, 15121, Alessandria, Italy",
"IIT CNR, Via Moruzzi, 1, 56124, Pisa, Italy"
],
"type": "Organization"
},
"familyName": "Manzini",
"givenName": "Giovanni",
"type": "Person"
},
{
"affiliation": {
"alternateName": "State University of Campinas",
"id": "https://www.grid.ac/institutes/grid.411087.b",
"name": [
"Institute of Computing, University of Campinas, Av. Albert Einstein, 1251, 13083-852, Campinas, Brazil"
],
"type": "Organization"
},
"familyName": "Telles",
"givenName": "Guilherme P.",
"type": "Person"
}
],
"citation": [
{
"id": "https://doi.org/10.1002/spe.844",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1001233322"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/2444016.2461327",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1012935447"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/1216370.1216372",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1013726603"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-540-27810-8_32",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1018610105",
"https://doi.org/10.1007/978-3-540-27810-8_32"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-540-27810-8_32",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1018610105",
"https://doi.org/10.1007/978-3-540-27810-8_32"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-642-16321-0_36",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1020531886",
"https://doi.org/10.1007/978-3-642-16321-0_36"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-642-16321-0_36",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1020531886",
"https://doi.org/10.1007/978-3-642-16321-0_36"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-319-02432-5_5",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1022178748",
"https://doi.org/10.1007/978-3-319-02432-5_5"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/j.tcs.2012.02.002",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1022562794"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/j.jda.2016.03.003",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1025682963"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-642-33122-0_18",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1027417410",
"https://doi.org/10.1007/978-3-642-33122-0_18"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/2591796.2591885",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1029999221"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/j.ipl.2009.10.015",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1032471155"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-662-49529-2_13",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1033118390",
"https://doi.org/10.1007/978-3-662-49529-2_13"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/s00453-016-0165-4",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1033175832",
"https://doi.org/10.1007/s00453-016-0165-4"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/2493175.2493180",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1033671098"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/2851491",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1035474229"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/384192.384193",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1040182656"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/3-540-45749-6_61",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1042900552",
"https://doi.org/10.1007/3-540-45749-6_61"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-642-12200-2_60",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1043252920",
"https://doi.org/10.1007/978-3-642-12200-2_60"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-642-12200-2_60",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1043252920",
"https://doi.org/10.1007/978-3-642-12200-2_60"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/j.tcs.2007.07.014",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1043471141"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-662-44753-6_23",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1046567866",
"https://doi.org/10.1007/978-3-662-44753-6_23"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/2649387.2649431",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1046847037"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1093/bioinformatics/btu584",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1049597276"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/j.jda.2016.04.002",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1049848652"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/3-540-44888-8_5",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1051255923",
"https://doi.org/10.1007/3-540-44888-8_5"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/0020-0190(92)90176-v",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1052344476"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1109/tcbb.2011.127",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1061540869"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1137/0222058",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1062842461"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/s11786-016-0281-1",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1083757671",
"https://doi.org/10.1007/s11786-016-0281-1"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/s11786-016-0281-1",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1083757671",
"https://doi.org/10.1007/s11786-016-0281-1"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1093/bioinformatics/btx067",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1083840416"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/j.tcs.2017.03.039",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1084535347"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-319-67428-5_15",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1091476846",
"https://doi.org/10.1007/978-3-319-67428-5_15"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-319-67428-5_15",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1091476846",
"https://doi.org/10.1007/978-3-319-67428-5_15"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1109/dcc.2015.70",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1094119687"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1137/1.9781611974782.26",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1098555285"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1017/cbo9781139940023",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1098672764"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1017/cbo9780511574931",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1098674485"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/s13015-017-0117-9",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1099695383",
"https://doi.org/10.1186/s13015-017-0117-9"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1371/journal.pcbi.1005944",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1100667150"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-030-00479-8_13",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1106966951",
"https://doi.org/10.1007/978-3-030-00479-8_13"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-030-00479-8_13",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1106966951",
"https://doi.org/10.1007/978-3-030-00479-8_13"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-030-00479-8_23",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1106966962",
"https://doi.org/10.1007/978-3-030-00479-8_23"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-030-00479-8_23",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1106966962",
"https://doi.org/10.1007/978-3-030-00479-8_23"
],
"type": "CreativeWork"
}
],
"datePublished": "2019-12",
"datePublishedReg": "2019-12-01",
"description": "Background: Sequencing technologies produce larger and larger collections of biosequences that have to be stored in compressed indices supporting fast search operations. Many compressed indices are based on the Burrows-Wheeler Transform (BWT) and the longest common prefix (LCP) array. Because of the sheer size of the input it is important to build these data structures in external memory and time using in the best possible way the available RAM.\nResults: We propose a space-efficient algorithm to compute the BWT and LCP array for a collection of sequences in the external or semi-external memory setting. Our algorithm splits the input collection into subcollections sufficiently small that it can compute their BWT in RAM using an optimal linear time algorithm. Next, it merges the partial BWTs in external or semi-external memory and in the process it also computes the LCP values. Our algorithm can be modified to output two additional arrays that, combined with the BWT and LCP array, provide simple, scan-based, external memory algorithms for three well known problems in bioinformatics: the computation of maximal repeats, the all pairs suffix-prefix overlaps, and the construction of succinct de Bruijn graphs.\nConclusions: We prove that our algorithm performs O ( n maxlcp ) sequential I/Os, where n is the total length of the collection and maxlcp is the maximum LCP value. The experimental results show that our algorithm is only slightly slower than the state of the art for short sequences but it is up to 40 times faster for longer sequences or when the available RAM is at least equal to the size of the input.",
"genre": "research_article",
"id": "sg:pub.10.1186/s13015-019-0140-0",
"inLanguage": [
"en"
],
"isAccessibleForFree": true,
"isFundedItemOf": [
{
"id": "sg:grant.6852716",
"type": "MonetaryGrant"
},
{
"id": "sg:grant.4557966",
"type": "MonetaryGrant"
},
{
"id": "sg:grant.4491249",
"type": "MonetaryGrant"
}
],
"isPartOf": [
{
"id": "sg:journal.1036449",
"issn": [
"1748-7188"
],
"name": "Algorithms for Molecular Biology",
"type": "Periodical"
},
{
"issueNumber": "1",
"type": "PublicationIssue"
},
{
"type": "PublicationVolume",
"volumeNumber": "14"
}
],
"name": "External memory BWT and LCP computation for sequence collections with applications",
"pagination": "6",
"productId": [
{
"name": "readcube_id",
"type": "PropertyValue",
"value": [
"b95092b3a0e74be7932a9bf8040bb719d524c54f2045912ec7dc489b4807b872"
]
},
{
"name": "pubmed_id",
"type": "PropertyValue",
"value": [
"30899322"
]
},
{
"name": "nlm_unique_id",
"type": "PropertyValue",
"value": [
"101265088"
]
},
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1186/s13015-019-0140-0"
]
},
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1112645996"
]
}
],
"sameAs": [
"https://doi.org/10.1186/s13015-019-0140-0",
"https://app.dimensions.ai/details/publication/pub.1112645996"
],
"sdDataset": "articles",
"sdDatePublished": "2019-04-11T13:17",
"sdLicense": "https://scigraph.springernature.com/explorer/license/",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000368_0000000368/records_78934_00000001.jsonl",
"type": "ScholarlyArticle",
"url": "https://link.springer.com/10.1186%2Fs13015-019-0140-0"
}
]
Download the RDF metadata as: json-ld nt turtle xml License info
JSON-LD is a popular format for linked data which is fully compatible with JSON.
curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s13015-019-0140-0'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s13015-019-0140-0'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s13015-019-0140-0'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s13015-019-0140-0'
This table displays all metadata directly associated to this object as RDF triples.
233 TRIPLES
21 PREDICATES
68 URIs
21 LITERALS
9 BLANK NODES