Ontology type: schema:ScholarlyArticle
Open Access: True
2012-12-05
AUTHORS: Steven Lewis, Attila Csordas, Sarah Killcoyne, Henning Hermjakob, Michael R Hoopmann, Robert L Moritz, Eric W Deutsch, John Boyle
ABSTRACT
Background: In shotgun mass spectrometry-based proteomics, the most computationally expensive step is matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Solutions that improve our ability to perform these searches are therefore needed.
Results: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm and, for the same input files, generates output comparable to that of the original implementation. We demonstrate the scalability of the system and discuss the architecture required to develop such distributed processing.
Conclusion: The software scales to handle a large peptide database, numerous modifications, and large numbers of spectra. Performance scales with the number of processors in the cluster, so throughput can grow with the available resources.
PAGES: 324
http://scigraph.springernature.com/pub.10.1186/1471-2105-13-324
DOI: http://dx.doi.org/10.1186/1471-2105-13-324
DIMENSIONS: https://app.dimensions.ai/details/publication/pub.1032496631
PUBMED: https://www.ncbi.nlm.nih.gov/pubmed/23216909
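The abstract says the search is distributed over Hadoop MapReduce but gives no details of how. As a rough illustration of why peptide/spectrum matching partitions well, and not as a description of Hydra's actual design, the sketch below mimics the Hadoop Streaming model in plain POSIX shell and awk: a map step keys each peptide and spectrum by an integer mass bin, `sort` plays the role of the shuffle, and a reduce step pairs every spectrum with every peptide that shares its bin. The tab-separated input format, the 1 Da bin width, and the function names `map_step`, `reduce_step`, and `candidate_pairs` are all assumptions made for this example. A real engine would also search neighbouring bins to respect a mass tolerance and would then score each candidate pair (for example with K-score); both steps are omitted here.

```shell
#!/bin/sh
# Toy map/shuffle/reduce pipeline in the style of Hadoop Streaming.
# Hypothetical input records, tab-separated:
#   P <TAB> peptide_id <TAB> mass     theoretical peptide mass in daltons
#   S <TAB> spectrum_id <TAB> mass    spectrum precursor mass in daltons

# Map: emit "bin <TAB> type <TAB> id", where bin is the integer part of the mass
# (hypothetical 1 Da bins).
map_step() {
  awk -F'\t' '{ printf "%d\t%s\t%s\n", int($3), $1, $2 }'
}

# Reduce: input arrives sorted by bin; for each bin, pair every spectrum
# with every peptide seen in that same bin.
reduce_step() {
  awk -F'\t' -v OFS='\t' '
    # Emit all spectrum/peptide pairs for the current bin, then reset.
    function flush(   s, p) {
      for (s in spectra) for (p in peptides) print s, p
      delete spectra
      delete peptides
    }
    $1 != key { flush(); key = $1 }   # a new mass bin starts here
    $2 == "P" { peptides[$3] = 1 }
    $2 == "S" { spectra[$3] = 1 }
    END { flush() }
  '
}

# Full pipeline: sort on the numeric bin key stands in for the shuffle phase.
candidate_pairs() {
  tab=$(printf '\t')
  map_step | LC_ALL=C sort -t "$tab" -k1,1n | reduce_step
}
```

For example, `printf 'P\tPEPTIDEA\t799.36\nS\tscan1\t799.40\n' | candidate_pairs` prints the single tab-separated pair `scan1  PEPTIDEA`. Because all records with the same key reach the same reducer, a framework like Hadoop can spread different mass bins across machines; in this toy model that is what would let throughput grow with the number of processors. The order of pairs within a bin is unspecified, since awk does not define `for (k in array)` iteration order.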
JSON-LD is the canonical representation for SciGraph data.
[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
"about": [
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information and Computing Sciences",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information Systems",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Algorithms",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Databases, Factual",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Mass Spectrometry",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Peptides",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Protein Processing, Post-Translational",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Proteomics",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Search Engine",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Sequence Analysis, Protein",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Software",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "Institute for Systems Biology, Seattle, WA, USA",
"id": "http://www.grid.ac/institutes/grid.64212.33",
"name": [
"Institute for Systems Biology, Seattle, WA, USA"
],
"type": "Organization"
},
"familyName": "Lewis",
"givenName": "Steven",
"id": "sg:person.01174342510.74",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01174342510.74"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "PRIDE Group Proteomics Services Team EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK",
"id": "http://www.grid.ac/institutes/None",
"name": [
"PRIDE Group Proteomics Services Team EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK"
],
"type": "Organization"
},
"familyName": "Csordas",
"givenName": "Attila",
"id": "sg:person.0630623044.12",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0630623044.12"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Luxembourg, Germany",
"id": "http://www.grid.ac/institutes/grid.16008.3f",
"name": [
"Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Luxembourg, Germany"
],
"type": "Organization"
},
"familyName": "Killcoyne",
"givenName": "Sarah",
"id": "sg:person.01274460411.86",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01274460411.86"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "PRIDE Group Proteomics Services Team EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK",
"id": "http://www.grid.ac/institutes/None",
"name": [
"PRIDE Group Proteomics Services Team EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK"
],
"type": "Organization"
},
"familyName": "Hermjakob",
"givenName": "Henning",
"id": "sg:person.01070655672.90",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01070655672.90"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Institute for Systems Biology, Seattle, WA, USA",
"id": "http://www.grid.ac/institutes/grid.64212.33",
"name": [
"Institute for Systems Biology, Seattle, WA, USA"
],
"type": "Organization"
},
"familyName": "Hoopmann",
"givenName": "Michael R",
"id": "sg:person.01273652030.51",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01273652030.51"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Institute for Systems Biology, Seattle, WA, USA",
"id": "http://www.grid.ac/institutes/grid.64212.33",
"name": [
"Institute for Systems Biology, Seattle, WA, USA"
],
"type": "Organization"
},
"familyName": "Moritz",
"givenName": "Robert L",
"id": "sg:person.0727763527.91",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0727763527.91"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Institute for Systems Biology, Seattle, WA, USA",
"id": "http://www.grid.ac/institutes/grid.64212.33",
"name": [
"Institute for Systems Biology, Seattle, WA, USA"
],
"type": "Organization"
},
"familyName": "Deutsch",
"givenName": "Eric W",
"id": "sg:person.01031111573.40",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01031111573.40"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Institute for Systems Biology, Seattle, WA, USA",
"id": "http://www.grid.ac/institutes/grid.64212.33",
"name": [
"Institute for Systems Biology, Seattle, WA, USA"
],
"type": "Organization"
},
"familyName": "Boyle",
"givenName": "John",
"id": "sg:person.01110033460.10",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01110033460.10"
],
"type": "Person"
}
],
"citation": [
{
"id": "sg:pub.10.1016/1044-0305(94)80016-2",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1018629105",
"https://doi.org/10.1016/1044-0305(94)80016-2"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1038/nbt.2112",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1051476522",
"https://doi.org/10.1038/nbt.2112"
],
"type": "CreativeWork"
}
],
"datePublished": "2012-12-05",
"datePublishedReg": "2012-12-05",
"description": "BackgroundFor shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed.ResultsWe present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed.ConclusionThe software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.",
"genre": "article",
"id": "sg:pub.10.1186/1471-2105-13-324",
"inLanguage": "en",
"isAccessibleForFree": true,
"isFundedItemOf": [
{
"id": "sg:grant.3104186",
"type": "MonetaryGrant"
},
{
"id": "sg:grant.2480242",
"type": "MonetaryGrant"
},
{
"id": "sg:grant.2669381",
"type": "MonetaryGrant"
},
{
"id": "sg:grant.2440532",
"type": "MonetaryGrant"
},
{
"id": "sg:grant.3784916",
"type": "MonetaryGrant"
},
{
"id": "sg:grant.2520375",
"type": "MonetaryGrant"
}
],
"isPartOf": [
{
"id": "sg:journal.1023786",
"issn": [
"1471-2105"
],
"name": "BMC Bioinformatics",
"publisher": "Springer Nature",
"type": "Periodical"
},
{
"issueNumber": "1",
"type": "PublicationIssue"
},
{
"type": "PublicationVolume",
"volumeNumber": "13"
}
],
"keywords": [
"search engines",
"same input files",
"sequence database search engines",
"number of processors",
"proteomics search engine",
"Hadoop MapReduce",
"database search engines",
"large database",
"original implementation",
"input files",
"expensive step",
"available resources",
"engine",
"comparable outputs",
"MapReduce",
"Hadoop",
"framework",
"scalability",
"large number",
"database",
"processors",
"architecture",
"algorithm",
"software",
"files",
"throughput",
"peptide database",
"implementation",
"processing",
"resources",
"search",
"performance",
"system",
"number",
"solution",
"output",
"scope",
"clusters",
"step",
"data",
"ability",
"numerous modifications",
"sequence",
"development",
"modification",
"rate",
"proteomics",
"spectra",
"shotgun mass spectrometry",
"ResultsWe",
"high rate",
"post-translational modifications",
"mass",
"spectrometer",
"mass spectrometer",
"mass spectrometry",
"spectrometry"
],
"name": "Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework",
"pagination": "324",
"productId": [
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1032496631"
]
},
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1186/1471-2105-13-324"
]
},
{
"name": "pubmed_id",
"type": "PropertyValue",
"value": [
"23216909"
]
}
],
"sameAs": [
"https://doi.org/10.1186/1471-2105-13-324",
"https://app.dimensions.ai/details/publication/pub.1032496631"
],
"sdDataset": "articles",
"sdDatePublished": "2022-06-01T22:09",
"sdLicense": "https://scigraph.springernature.com/explorer/license/",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-springernature-scigraph/baseset/20220601/entities/gbq_results/article/article_575.jsonl",
"type": "ScholarlyArticle",
"url": "https://doi.org/10.1186/1471-2105-13-324"
}
]
The RDF metadata can be downloaded as JSON-LD, N-Triples, Turtle, or RDF/XML by requesting the record URL with the corresponding Accept header.
JSON-LD is a popular linked data format that is fully compatible with JSON.
curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-324'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-324'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-324'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-324'
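The four curl commands above differ only in the Accept header, so they can be wrapped in a small helper. The sketch below is a minimal example under stated assumptions: the function names `media_type_for` and `fetch_format`, the short format names, and the output file names (`record.jsonld`, `record.nt`, `record.ttl`, `record.rdf`) are hypothetical. The record URL and media types are the ones listed above, and the sketch assumes the endpoint still serves them.

```shell
#!/bin/sh
# Record URL taken from the curl commands above.
RECORD_URL='https://scigraph.springernature.com/pub.10.1186/1471-2105-13-324'

# Print "<media type> <file extension>" for a (hypothetical) short format name;
# fail with a message on stderr for anything else.
media_type_for() {
  case "$1" in
    jsonld) echo 'application/ld+json jsonld' ;;
    nt)     echo 'application/n-triples nt' ;;
    turtle) echo 'text/turtle ttl' ;;
    rdfxml) echo 'application/rdf+xml rdf' ;;
    *)      echo "unknown format: $1" >&2; return 1 ;;
  esac
}

# Download one serialization into record.<ext> via content negotiation.
# -f makes curl return an error on HTTP failures; -sS hides progress but shows errors.
fetch_format() {
  spec=$(media_type_for "$1") || return 1
  mime=${spec% *}   # text before the space: the media type
  ext=${spec#* }    # text after the space: the file extension
  curl -fsS -H "Accept: $mime" -o "record.$ext" "$RECORD_URL"
}
```

Usage: `for f in jsonld nt turtle rdfxml; do fetch_format "$f"; done` would write all four serializations to the current directory. Keeping the format table in one `case` statement means a new serialization needs only one extra line.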
The RDF representation of all metadata directly associated with this object contains:
229 TRIPLES
22 PREDICATES
94 URIs
84 LITERALS
16 BLANK NODES
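Because N-Triples puts one triple per line, part of this summary can be recomputed with standard text tools. The sketch below uses a hypothetical function name, `ntriples_stats`, and counts triples and distinct predicates in N-Triples read from standard input. It relies on the N-Triples grammar: subjects (IRIs or blank-node labels) and predicates (IRIs) never contain whitespace, so the predicate is always the second whitespace-separated field. It does not attempt the URI, literal, or blank-node counts, which would need a real RDF parser.

```shell
#!/bin/sh
# Count triples and distinct predicates in an N-Triples document on stdin.
ntriples_stats() {
  awk '
    /^[ \t]*(#|$)/ { next }            # skip blank lines and comment lines
    { triples++; preds[$2] = 1 }       # field 2 is always the predicate IRI
    END {
      n = 0
      for (p in preds) n++
      printf "%d triples, %d predicates\n", triples, n
    }
  '
}
```

For example, piping the output of the `application/n-triples` curl command above into `ntriples_stats` gives counts that can be compared with the summary figures, which may differ if the record has changed since this snapshot was taken.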