Ontology type: schema:ScholarlyArticle Open Access: True
2018-02-19
AUTHORSMaryam Etemadi, Mehri Bagherian, Zhi-Zhong Chen, Lusheng Wang
ABSTRACTBackgroundThe haplotype assembly problem for diploid is to find a pair of haplotypes from a given set of aligned Single Nucleotide Polymorphism (SNP) fragments (reads). It has many applications in association studies, drug design, and genetic research. Since this problem is computationally hard, both heuristic and exact algorithms have been designed for it. Although exact algorithms are much slower, they are still of great interest because they usually output significantly better solutions than heuristic algorithms in terms of popular measures such as the Minimum Error Correction (MEC) score, the number of switch errors, and the QAN50 score. Exact algorithms are also valuable because they can be used to witness how good a heuristic algorithm is. The best known exact algorithm is based on integer linear programming (ILP) and it is known that ILP can also be used to improve the output quality of every heuristic algorithm with a little decline in speed. Therefore, faster ILP models for the problem are highly demanded.ResultsAs in previous studies, we consider not only the general case of the problem but also its all-heterozygous case where we assume that if a column of the input read matrix contains at least one 0 and one 1, then it corresponds to a heterozygous SNP site. For both cases, we design new ILP models for the haplotype assembly problem which aim at minimizing the MEC score. The new models are theoretically better because they contain significantly fewer constraints. More importantly, our experimental results show that for both simulated and real datasets, the new model for the all-heterozygous (respectively, general) case can usually be solved via CPLEX (an ILP solver) at least 5 times (respectively, twice) faster than the previous bests. Indeed, the running time can sometimes be 41 times better.ConclusionsThis paper proposes a new ILP model for the haplotype assembly problem and its all-heterozygous case, respectively. Experiments with both real and simulated datasets show that the new models can be solved within much shorter time by CPLEX than the previous bests. We believe that the models can be used to improve heuristic algorithms as well. More... »
PAGES52
http://scigraph.springernature.com/pub.10.1186/s12859-018-2012-x
DOIhttp://dx.doi.org/10.1186/s12859-018-2012-x
DIMENSIONShttps://app.dimensions.ai/details/publication/pub.1101116794
PUBMEDhttps://www.ncbi.nlm.nih.gov/pubmed/29504891
JSON-LD is the canonical representation for SciGraph data.
TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT
[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
"about": [
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information and Computing Sciences",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0802",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Computation Theory and Mathematics",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Algorithms",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Haplotypes",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Heterozygote",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Humans",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Models, Genetic",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Polymorphism, Single Nucleotide",
"type": "DefinedTerm"
},
{
"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
"name": "Programming, Linear",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "Department of Applied Mathematics, Faculty of Mathematical Sciences, University of Guilan, 41938-33697, Rasht, Iran",
"id": "http://www.grid.ac/institutes/grid.411872.9",
"name": [
"Department of Applied Mathematics, Faculty of Mathematical Sciences, University of Guilan, 41938-33697, Rasht, Iran"
],
"type": "Organization"
},
"familyName": "Etemadi",
"givenName": "Maryam",
"type": "Person"
},
{
"affiliation": {
"alternateName": "Department of Applied Mathematics, Faculty of Mathematical Sciences, University of Guilan, 41938-33697, Rasht, Iran",
"id": "http://www.grid.ac/institutes/grid.411872.9",
"name": [
"Department of Applied Mathematics, Faculty of Mathematical Sciences, University of Guilan, 41938-33697, Rasht, Iran"
],
"type": "Organization"
},
"familyName": "Bagherian",
"givenName": "Mehri",
"id": "sg:person.010227623665.36",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010227623665.36"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Division of Information System Design, Tokyo Denki University, 350-0394, Saitama, Japan",
"id": "http://www.grid.ac/institutes/grid.412773.4",
"name": [
"Division of Information System Design, Tokyo Denki University, 350-0394, Saitama, Japan"
],
"type": "Organization"
},
"familyName": "Chen",
"givenName": "Zhi-Zhong",
"id": "sg:person.015654265661.98",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015654265661.98"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "City University of Hong Kong Shenzhen Research Institute, ShenzhenHi-TechIndustrialPark, Nanshan District, Shenzhen, People\u2019s Republic of China",
"id": "http://www.grid.ac/institutes/grid.464255.4",
"name": [
"Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong",
"City University of Hong Kong Shenzhen Research Institute, ShenzhenHi-TechIndustrialPark, Nanshan District, Shenzhen, People\u2019s Republic of China"
],
"type": "Organization"
},
"familyName": "Wang",
"givenName": "Lusheng",
"id": "sg:person.01105113721.52",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01105113721.52"
],
"type": "Person"
}
],
"citation": [
{
"id": "sg:pub.10.1007/11557067_11",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1022773764",
"https://doi.org/10.1007/11557067_11"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/3-540-44676-1_15",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1028578523",
"https://doi.org/10.1007/3-540-44676-1_15"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/s12864-015-1408-5",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1005621335",
"https://doi.org/10.1186/s12864-015-1408-5"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1038/ng.2673",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1038401461",
"https://doi.org/10.1038/ng.2673"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/gb-2009-10-4-r42",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1024097093",
"https://doi.org/10.1186/gb-2009-10-4-r42"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-540-30219-3_23",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1030350549",
"https://doi.org/10.1007/978-3-540-30219-3_23"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1038/nbt.3200",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1003980483",
"https://doi.org/10.1038/nbt.3200"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1038/10290",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1008973203",
"https://doi.org/10.1038/10290"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/s13258-015-0342-x",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1008043767",
"https://doi.org/10.1007/s13258-015-0342-x"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1038/10297",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1011881979",
"https://doi.org/10.1038/10297"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/s12859-015-0651-8",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1027581361",
"https://doi.org/10.1186/s12859-015-0651-8"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-319-19929-0_9",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1041097376",
"https://doi.org/10.1007/978-3-319-19929-0_9"
],
"type": "CreativeWork"
}
],
"datePublished": "2018-02-19",
"datePublishedReg": "2018-02-19",
"description": "BackgroundThe haplotype assembly problem for diploid is to find a pair of haplotypes from a given set of aligned Single Nucleotide Polymorphism (SNP) fragments (reads). It has many applications in association studies, drug design, and genetic research. Since this problem is computationally hard, both heuristic and exact algorithms have been designed for it. Although exact algorithms are much slower, they are still of great interest because they usually output significantly better solutions than heuristic algorithms in terms of popular measures such as the Minimum Error Correction (MEC) score, the number of switch errors, and the QAN50 score. Exact algorithms are also valuable because they can be used to witness how good a heuristic algorithm is. The best known exact algorithm is based on integer linear programming (ILP) and it is known that ILP can also be used to improve the output quality of every heuristic algorithm with a little decline in speed. Therefore, faster ILP models for the problem are highly demanded.ResultsAs in previous studies, we consider not only the general case of the problem but also its all-heterozygous case where we assume that if a column of the input read matrix contains at least one 0 and one 1, then it corresponds to a heterozygous SNP site. For both cases, we design new ILP models for the haplotype assembly problem which aim at minimizing the MEC score. The new models are theoretically better because they contain significantly fewer constraints. More importantly, our experimental results show that for both simulated and real datasets, the new model for the all-heterozygous (respectively, general) case can usually be solved via CPLEX (an ILP solver) at least 5 times (respectively, twice) faster than the previous bests. Indeed, the running time can sometimes be 41 times better.ConclusionsThis paper proposes a new ILP model for the haplotype assembly problem and its all-heterozygous case, respectively. Experiments with both real and simulated datasets show that the new models can be solved within much shorter time by CPLEX than the previous bests. We believe that the models can be used to improve heuristic algorithms as well.",
"genre": "article",
"id": "sg:pub.10.1186/s12859-018-2012-x",
"inLanguage": "en",
"isAccessibleForFree": true,
"isFundedItemOf": [
{
"id": "sg:grant.8293592",
"type": "MonetaryGrant"
},
{
"id": "sg:grant.7426977",
"type": "MonetaryGrant"
}
],
"isPartOf": [
{
"id": "sg:journal.1023786",
"issn": [
"1471-2105"
],
"name": "BMC Bioinformatics",
"publisher": "Springer Nature",
"type": "Periodical"
},
{
"issueNumber": "Suppl 1",
"type": "PublicationIssue"
},
{
"type": "PublicationVolume",
"volumeNumber": "19"
}
],
"keywords": [
"haplotype assembly problem",
"integer linear programming",
"heuristic algorithm",
"ILP model",
"new ILP model",
"exact algorithm",
"previous bests",
"assembly problem",
"single nucleotide polymorphism (SNP) fragments",
"real datasets",
"running time",
"pair of haplotypes",
"best solution",
"algorithm",
"output quality",
"haplotype assembly",
"simulated dataset",
"linear programming",
"new model",
"experimental results",
"dataset",
"CPLEX",
"switch errors",
"programming",
"general case",
"model",
"popular measure",
"short time",
"constraints",
"set",
"applications",
"input",
"time",
"error",
"BEST",
"MEC score",
"design",
"speed",
"great interest",
"solution",
"quality",
"drug design",
"experiments",
"research",
"interest",
"terms",
"number",
"cases",
"results",
"pairs",
"matrix",
"ConclusionsThis paper",
"measures",
"correction scores",
"SNP sites",
"scores",
"heterozygous cases",
"previous studies",
"study",
"assembly",
"little decline",
"genetic research",
"association studies",
"sites",
"fragments",
"column",
"haplotypes",
"problem",
"decline",
"paper",
"diploid",
"polymorphism fragments"
],
"name": "Better ILP models for haplotype assembly",
"pagination": "52",
"productId": [
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1101116794"
]
},
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1186/s12859-018-2012-x"
]
},
{
"name": "pubmed_id",
"type": "PropertyValue",
"value": [
"29504891"
]
}
],
"sameAs": [
"https://doi.org/10.1186/s12859-018-2012-x",
"https://app.dimensions.ai/details/publication/pub.1101116794"
],
"sdDataset": "articles",
"sdDatePublished": "2022-05-20T07:34",
"sdLicense": "https://scigraph.springernature.com/explorer/license/",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-springernature-scigraph/baseset/20220519/entities/gbq_results/article/article_769.jsonl",
"type": "ScholarlyArticle",
"url": "https://doi.org/10.1186/s12859-018-2012-x"
}
]
Download the RDF metadata as: json-ld nt turtle xml License info
JSON-LD is a popular format for linked data which is fully compatible with JSON.
curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s12859-018-2012-x'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s12859-018-2012-x'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s12859-018-2012-x'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s12859-018-2012-x'
This table displays all metadata directly associated to this object as RDF triples.
240 TRIPLES
22 PREDICATES
117 URIs
97 LITERALS
14 BLANK NODES