Efficient algorithms for polyploid haplotype phasing


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2018-05-09

AUTHORS

Dan He, Subrata Saha, Richard Finkers, Laxmi Parida

ABSTRACT

Background: Inference of haplotypes, the sequences of alleles along the same chromosome, is a fundamental problem in genetics and a key component of many analyses, including admixture mapping, identification of regions of identity by descent, and imputation. Haplotype phasing based on sequencing reads has attracted considerable attention. Diploid haplotype phasing, where the two haplotypes are complementary, has been studied extensively. In this work, we focused on polyploid haplotype phasing, where we aim to phase more than two haplotypes at the same time from sequencing data. The problem is much more complicated because the search space becomes much larger and the haplotypes no longer need to be complementary.

Results: We proposed two algorithms: (1) Poly-Harsh, a Gibbs-sampling-based algorithm that alternately samples haplotypes and read assignments to minimize the mismatches between the reads and the phased haplotypes; and (2) an efficient algorithm that concatenates haplotype blocks into contiguous haplotypes.

Conclusions: Our experiments showed that our method improves the quality of the phased haplotypes over state-of-the-art methods. To our knowledge, our algorithm for haplotype block concatenation is the first that leverages information shared across multiple individuals to construct contiguous haplotypes. Our experiments showed that it is both efficient and effective.
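The alternation between read assignments and haplotypes described in the abstract can be illustrated with a toy sketch. This is not Poly-Harsh: the abstract gives no implementation details, so the code below swaps Gibbs sampling for a deterministic hard-assignment scheme (similar in spirit to k-means), assumes biallelic sites coded 0/1, and uses hypothetical function names and made-up example reads.

```python
from typing import List, Optional, Tuple

# A read holds one allele (0 or 1) per SNP site, or None where it does not cover the site.
Read = List[Optional[int]]
Hap = List[int]


def mismatches(read: Read, hap: Hap) -> int:
    """Count covered sites where the read disagrees with the haplotype."""
    return sum(1 for r, h in zip(read, hap) if r is not None and r != h)


def assign_reads(reads: List[Read], haps: List[Hap]) -> List[int]:
    """Give each read to the haplotype with the fewest mismatches (ties: lowest index)."""
    return [min(range(len(haps)), key=lambda k: mismatches(read, haps[k]))
            for read in reads]


def update_haplotypes(reads: List[Read], assignment: List[int],
                      haps: List[Hap]) -> List[Hap]:
    """Per-site majority vote among each haplotype's reads; keep the old allele on ties."""
    new_haps = []
    for k, hap in enumerate(haps):
        members = [r for r, a in zip(reads, assignment) if a == k]
        new_hap = []
        for site, old in enumerate(hap):
            ones = sum(1 for r in members if r[site] == 1)
            zeros = sum(1 for r in members if r[site] == 0)
            new_hap.append(1 if ones > zeros else 0 if zeros > ones else old)
        new_haps.append(new_hap)
    return new_haps


def phase(reads: List[Read], init_haps: List[Hap],
          max_iters: int = 20) -> Tuple[List[Hap], List[int]]:
    """Alternate the two steps until neither haplotypes nor assignments change."""
    haps = [list(h) for h in init_haps]
    assignment = assign_reads(reads, haps)
    for _ in range(max_iters):
        new_haps = update_haplotypes(reads, assignment, haps)
        new_assignment = assign_reads(reads, new_haps)
        if new_haps == haps and new_assignment == assignment:
            break
        haps, assignment = new_haps, new_assignment
    return haps, assignment


# Made-up triploid example: six error-free reads over four sites, noisy starting guess.
reads = [
    [0, 0, 0, None], [None, 0, 0, 0],
    [1, 1, 1, None], [None, 1, 1, 1],
    [0, 1, 0, None], [None, 1, 0, 1],
]
init = [[0, 0, 1, 0], [1, 0, 1, 1], [1, 1, 0, 1]]
haps, assignment = phase(reads, init)
total = sum(mismatches(r, haps[a]) for r, a in zip(reads, assignment))
print(haps, assignment, total)
```

A hard-assignment scheme like this can stall in a local optimum depending on the starting haplotypes; sampling-based methods such as the one the abstract describes are one way to reduce that risk.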

PAGES

110

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/s12864-018-4464-9

DOI

http://dx.doi.org/10.1186/s12864-018-4464-9

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1103910461

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/29764364



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genomics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Haplotypes", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Polyploidy", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, DNA", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "College of Computer Science and Software, Shenzhen University, 518060, Shenzhen, China", 
          "id": "http://www.grid.ac/institutes/grid.263488.3", 
          "name": [
            "College of Computer Science and Software, Shenzhen University, 518060, Shenzhen, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "He", 
        "givenName": "Dan", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "IBM T.J. Watson Research Center, 1101 Kitchawan Rd, 10598, Yorktown Heights, NY, USA", 
          "id": "http://www.grid.ac/institutes/grid.481554.9", 
          "name": [
            "IBM T.J. Watson Research Center, 1101 Kitchawan Rd, 10598, Yorktown Heights, NY, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Saha", 
        "givenName": "Subrata", 
        "id": "sg:person.01300161230.10", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01300161230.10"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Wageningen University & Research, 6708 PB, Wageningen, Netherlands", 
          "id": "http://www.grid.ac/institutes/grid.4818.5", 
          "name": [
            "Wageningen University & Research, 6708 PB, Wageningen, Netherlands"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Finkers", 
        "givenName": "Richard", 
        "id": "sg:person.01157055063.55", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01157055063.55"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "IBM T.J. Watson Research Center, 1101 Kitchawan Rd, 10598, Yorktown Heights, NY, USA", 
          "id": "http://www.grid.ac/institutes/grid.481554.9", 
          "name": [
            "IBM T.J. Watson Research Center, 1101 Kitchawan Rd, 10598, Yorktown Heights, NY, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Parida", 
        "givenName": "Laxmi", 
        "id": "sg:person.01336557015.68", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01336557015.68"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1007/3-540-44676-1_15", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028578523", 
          "https://doi.org/10.1007/3-540-44676-1_15"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2164-16-1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010608153", 
          "https://doi.org/10.1186/1471-2164-16-1"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ng2088", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1046979341", 
          "https://doi.org/10.1038/ng2088"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2164-14-s2-s2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1027487400", 
          "https://doi.org/10.1186/1471-2164-14-s2-s2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1752-0509-6-s2-s8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1029324021", 
          "https://doi.org/10.1186/1752-0509-6-s2-s8"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ng.548", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1016055940", 
          "https://doi.org/10.1038/ng.548"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s12864-015-1408-5", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005621335", 
          "https://doi.org/10.1186/s12864-015-1408-5"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2018-05-09", 
    "datePublishedReg": "2018-05-09", 
    "description": "BackgroundInference of haplotypes, or the sequence of alleles along the same chromosomes, is a fundamental problem in genetics and is a key component for many analyses including admixture mapping, identifying regions of identity by descent and imputation. Haplotype phasing based on sequencing reads has attracted lots of attentions. Diploid haplotype phasing where the two haplotypes are complimentary have been studied extensively. In this work, we focused on Polyploid haplotype phasing where we aim to phase more than two haplotypes at the same time from sequencing data. The problem is much more complicated as the search space becomes much larger and the haplotypes do not need to be complimentary any more.ResultsWe proposed two algorithms, (1) Poly-Harsh, a Gibbs Sampling based algorithm which alternatively samples haplotypes and the read assignments to minimize the mismatches between the reads and the phased haplotypes, (2) An efficient algorithm to concatenate haplotype blocks into contiguous haplotypes.ConclusionsOur experiments showed that our method is able to improve the quality of the phased haplotypes over the state-of-the-art methods. To our knowledge, our algorithm for haplotype blocks concatenation is the first algorithm that leverages the shared information across multiple individuals to construct contiguous haplotypes. Our experiments showed that it is both efficient and effective.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/s12864-018-4464-9", 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023790", 
        "issn": [
          "1471-2164"
        ], 
        "name": "BMC Genomics", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "Suppl 2", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "19"
      }
    ], 
    "keywords": [
      "efficient algorithm", 
      "haplotype phasing", 
      "search space", 
      "art methods", 
      "first algorithm", 
      "algorithm", 
      "fundamental problem", 
      "polyploid haplotypes", 
      "Gibbs sampling", 
      "sequencing reads", 
      "same time", 
      "multiple individuals", 
      "key component", 
      "concatenation", 
      "reads", 
      "information", 
      "imputation", 
      "experiments", 
      "method", 
      "mapping", 
      "space", 
      "assignment", 
      "work", 
      "block", 
      "quality", 
      "knowledge", 
      "data", 
      "descent", 
      "ConclusionsOur experiments", 
      "attention", 
      "time", 
      "read assignment", 
      "sequence", 
      "components", 
      "state", 
      "mismatch", 
      "sampling", 
      "analysis", 
      "identity", 
      "phasing", 
      "sequence of alleles", 
      "same chromosome", 
      "regions of identity", 
      "admixture mapping", 
      "haplotype blocks", 
      "individuals", 
      "haplotypes", 
      "region", 
      "chromosomes", 
      "ResultsWe", 
      "alleles", 
      "genetics", 
      "problem"
    ], 
    "name": "Efficient algorithms for polyploid haplotype phasing", 
    "pagination": "110", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1103910461"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/s12864-018-4464-9"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "29764364"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/s12864-018-4464-9", 
      "https://app.dimensions.ai/details/publication/pub.1103910461"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-09-02T16:02", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220902/entities/gbq_results/article/article_775.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/s12864-018-4464-9"
  }
]
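As an illustration of how a record like the one above might be consumed, here is a minimal Python sketch that pulls the title, authors, and identifiers out of the JSON-LD. The excerpt is trimmed from the record shown above; the `summarize` helper and the shape of its return value are hypothetical, not part of any SciGraph tooling.

```python
import json

# Trimmed excerpt of the record above (only the fields used below).
RECORD_EXCERPT = """
[
  {
    "name": "Efficient algorithms for polyploid haplotype phasing",
    "datePublished": "2018-05-09",
    "author": [
      {"familyName": "He", "givenName": "Dan", "type": "Person"},
      {"familyName": "Saha", "givenName": "Subrata", "type": "Person"},
      {"familyName": "Finkers", "givenName": "Richard", "type": "Person"},
      {"familyName": "Parida", "givenName": "Laxmi", "type": "Person"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1103910461"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s12864-018-4464-9"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["29764364"]}
    ]
  }
]
"""


def summarize(record_json: str) -> dict:
    """Return title, date, author names, and identifiers from a one-record JSON-LD array."""
    record = json.loads(record_json)[0]
    # productId is a list of {name, value: [..]} pairs; flatten it to name -> first value.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    return {
        "title": record["name"],
        "date": record.get("datePublished"),
        "authors": authors,
        "ids": ids,
    }


summary = summarize(RECORD_EXCERPT)
print(summary["title"])
print(", ".join(summary["authors"]))
print(summary["ids"]["doi"])
```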
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-4464-9'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-4464-9'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-4464-9'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s12864-018-4464-9'
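The four curl commands above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The `build_request` and `fetch` helpers and the short format keys are hypothetical names chosen for this example, and whether the endpoint still responds has not been verified here.

```python
import urllib.request

# Media types taken from the curl examples above; the short keys are illustrative.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s12864-018-4464-9"


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build a GET request asking the server for the given RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 30.0) -> str:
    """Perform the request and return the response body as text (requires network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network; only fetch() does.
req = build_request("turtle")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling `fetch("json-ld")` would then be the counterpart of the first curl command.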


 

This table displays all metadata directly associated with this object as RDF triples.

191 TRIPLES      21 PREDICATES      91 URIs      76 LITERALS      13 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/s12864-018-4464-9 schema:about N208b273ef0ea4f809b9bd3d1dfd443a5
2 N3b16deec4261410fa54877a0f91aabf2
3 N6342c44055e449c08075ec0230496115
4 N878dfdcdd2094424b1423d9a49369b7d
5 N8a81f9707e5a4e0b9552d7d9bc102f16
6 Nc07a4823ed4341e08c5654cf9f237d18
7 anzsrc-for:06
8 anzsrc-for:0604
9 schema:author N51f3745f544048b0b0cb429b81fb239b
10 schema:citation sg:pub.10.1007/3-540-44676-1_15
11 sg:pub.10.1038/ng.548
12 sg:pub.10.1038/ng2088
13 sg:pub.10.1186/1471-2164-14-s2-s2
14 sg:pub.10.1186/1471-2164-16-1
15 sg:pub.10.1186/1752-0509-6-s2-s8
16 sg:pub.10.1186/s12864-015-1408-5
17 schema:datePublished 2018-05-09
18 schema:datePublishedReg 2018-05-09
19 schema:description BackgroundInference of haplotypes, or the sequence of alleles along the same chromosomes, is a fundamental problem in genetics and is a key component for many analyses including admixture mapping, identifying regions of identity by descent and imputation. Haplotype phasing based on sequencing reads has attracted lots of attentions. Diploid haplotype phasing where the two haplotypes are complimentary have been studied extensively. In this work, we focused on Polyploid haplotype phasing where we aim to phase more than two haplotypes at the same time from sequencing data. The problem is much more complicated as the search space becomes much larger and the haplotypes do not need to be complimentary any more.ResultsWe proposed two algorithms, (1) Poly-Harsh, a Gibbs Sampling based algorithm which alternatively samples haplotypes and the read assignments to minimize the mismatches between the reads and the phased haplotypes, (2) An efficient algorithm to concatenate haplotype blocks into contiguous haplotypes.ConclusionsOur experiments showed that our method is able to improve the quality of the phased haplotypes over the state-of-the-art methods. To our knowledge, our algorithm for haplotype blocks concatenation is the first algorithm that leverages the shared information across multiple individuals to construct contiguous haplotypes. Our experiments showed that it is both efficient and effective.
20 schema:genre article
21 schema:isAccessibleForFree true
22 schema:isPartOf Naefeeadfa0f742fbab9655e79f5056b1
23 Neb86bc6910c646c2a9f332d870712011
24 sg:journal.1023790
25 schema:keywords ConclusionsOur experiments
26 Gibbs sampling
27 ResultsWe
28 admixture mapping
29 algorithm
30 alleles
31 analysis
32 art methods
33 assignment
34 attention
35 block
36 chromosomes
37 components
38 concatenation
39 data
40 descent
41 efficient algorithm
42 experiments
43 first algorithm
44 fundamental problem
45 genetics
46 haplotype blocks
47 haplotype phasing
48 haplotypes
49 identity
50 imputation
51 individuals
52 information
53 key component
54 knowledge
55 mapping
56 method
57 mismatch
58 multiple individuals
59 phasing
60 polyploid haplotypes
61 problem
62 quality
63 read assignment
64 reads
65 region
66 regions of identity
67 same chromosome
68 same time
69 sampling
70 search space
71 sequence
72 sequence of alleles
73 sequencing reads
74 space
75 state
76 time
77 work
78 schema:name Efficient algorithms for polyploid haplotype phasing
79 schema:pagination 110
80 schema:productId N33d25f8a1ef0444cbc4d056757038127
81 N5a1e3778fef1490983781002a51014a9
82 N94ccce6ae3644063a2473f638009a48e
83 schema:sameAs https://app.dimensions.ai/details/publication/pub.1103910461
84 https://doi.org/10.1186/s12864-018-4464-9
85 schema:sdDatePublished 2022-09-02T16:02
86 schema:sdLicense https://scigraph.springernature.com/explorer/license/
87 schema:sdPublisher N948950132911465aab508307e8410fe6
88 schema:url https://doi.org/10.1186/s12864-018-4464-9
89 sgo:license sg:explorer/license/
90 sgo:sdDataset articles
91 rdf:type schema:ScholarlyArticle
92 N208b273ef0ea4f809b9bd3d1dfd443a5 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
93 schema:name Haplotypes
94 rdf:type schema:DefinedTerm
95 N33d25f8a1ef0444cbc4d056757038127 schema:name dimensions_id
96 schema:value pub.1103910461
97 rdf:type schema:PropertyValue
98 N3b16deec4261410fa54877a0f91aabf2 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
99 schema:name Sequence Analysis, DNA
100 rdf:type schema:DefinedTerm
101 N470b7ee4c9454f3da454f860ccfd876c schema:affiliation grid-institutes:grid.263488.3
102 schema:familyName He
103 schema:givenName Dan
104 rdf:type schema:Person
105 N51f3745f544048b0b0cb429b81fb239b rdf:first N470b7ee4c9454f3da454f860ccfd876c
106 rdf:rest Nc3d60d3e3b0c4ffea4d6b17216799e71
107 N5a1e3778fef1490983781002a51014a9 schema:name pubmed_id
108 schema:value 29764364
109 rdf:type schema:PropertyValue
110 N6342c44055e449c08075ec0230496115 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
111 schema:name Genomics
112 rdf:type schema:DefinedTerm
113 N878dfdcdd2094424b1423d9a49369b7d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
114 schema:name Polyploidy
115 rdf:type schema:DefinedTerm
116 N8a81f9707e5a4e0b9552d7d9bc102f16 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
117 schema:name Algorithms
118 rdf:type schema:DefinedTerm
119 N948950132911465aab508307e8410fe6 schema:name Springer Nature - SN SciGraph project
120 rdf:type schema:Organization
121 N94ccce6ae3644063a2473f638009a48e schema:name doi
122 schema:value 10.1186/s12864-018-4464-9
123 rdf:type schema:PropertyValue
124 Naefeeadfa0f742fbab9655e79f5056b1 schema:issueNumber Suppl 2
125 rdf:type schema:PublicationIssue
126 Nb307a28c7b3f4ac2952c6e3fc6c2bb83 rdf:first sg:person.01336557015.68
127 rdf:rest rdf:nil
128 Nc07a4823ed4341e08c5654cf9f237d18 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
129 schema:name Genome
130 rdf:type schema:DefinedTerm
131 Nc3d60d3e3b0c4ffea4d6b17216799e71 rdf:first sg:person.01300161230.10
132 rdf:rest Nf9eceb27d2d94b9f8b0998890269a70a
133 Neb86bc6910c646c2a9f332d870712011 schema:volumeNumber 19
134 rdf:type schema:PublicationVolume
135 Nf9eceb27d2d94b9f8b0998890269a70a rdf:first sg:person.01157055063.55
136 rdf:rest Nb307a28c7b3f4ac2952c6e3fc6c2bb83
137 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
138 schema:name Biological Sciences
139 rdf:type schema:DefinedTerm
140 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
141 schema:name Genetics
142 rdf:type schema:DefinedTerm
143 sg:journal.1023790 schema:issn 1471-2164
144 schema:name BMC Genomics
145 schema:publisher Springer Nature
146 rdf:type schema:Periodical
147 sg:person.01157055063.55 schema:affiliation grid-institutes:grid.4818.5
148 schema:familyName Finkers
149 schema:givenName Richard
150 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01157055063.55
151 rdf:type schema:Person
152 sg:person.01300161230.10 schema:affiliation grid-institutes:grid.481554.9
153 schema:familyName Saha
154 schema:givenName Subrata
155 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01300161230.10
156 rdf:type schema:Person
157 sg:person.01336557015.68 schema:affiliation grid-institutes:grid.481554.9
158 schema:familyName Parida
159 schema:givenName Laxmi
160 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01336557015.68
161 rdf:type schema:Person
162 sg:pub.10.1007/3-540-44676-1_15 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028578523
163 https://doi.org/10.1007/3-540-44676-1_15
164 rdf:type schema:CreativeWork
165 sg:pub.10.1038/ng.548 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016055940
166 https://doi.org/10.1038/ng.548
167 rdf:type schema:CreativeWork
168 sg:pub.10.1038/ng2088 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046979341
169 https://doi.org/10.1038/ng2088
170 rdf:type schema:CreativeWork
171 sg:pub.10.1186/1471-2164-14-s2-s2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027487400
172 https://doi.org/10.1186/1471-2164-14-s2-s2
173 rdf:type schema:CreativeWork
174 sg:pub.10.1186/1471-2164-16-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010608153
175 https://doi.org/10.1186/1471-2164-16-1
176 rdf:type schema:CreativeWork
177 sg:pub.10.1186/1752-0509-6-s2-s8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029324021
178 https://doi.org/10.1186/1752-0509-6-s2-s8
179 rdf:type schema:CreativeWork
180 sg:pub.10.1186/s12864-015-1408-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005621335
181 https://doi.org/10.1186/s12864-015-1408-5
182 rdf:type schema:CreativeWork
183 grid-institutes:grid.263488.3 schema:alternateName College of Computer Science and Software, Shenzhen University, 518060, Shenzhen, China
184 schema:name College of Computer Science and Software, Shenzhen University, 518060, Shenzhen, China
185 rdf:type schema:Organization
186 grid-institutes:grid.481554.9 schema:alternateName IBM T.J. Watson Research Center, 1101 Kitchawan Rd, 10598, Yorktown Heights, NY, USA
187 schema:name IBM T.J. Watson Research Center, 1101 Kitchawan Rd, 10598, Yorktown Heights, NY, USA
188 rdf:type schema:Organization
189 grid-institutes:grid.4818.5 schema:alternateName Wageningen University & Research, 6708 PB, Wageningen, Netherlands
190 schema:name Wageningen University & Research, 6708 PB, Wageningen, Netherlands
191 rdf:type schema:Organization
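In the triples above, the ordered author list is stored as an RDF collection: schema:author points at a list node, and each node carries an rdf:first (the item) and an rdf:rest (the next node, or rdf:nil at the end). The sketch below walks that chain using plain dictionaries transcribed from the table; the dictionary layout and the `walk_rdf_list` helper are illustrative assumptions, not a SciGraph API.

```python
RDF_NIL = "rdf:nil"

# rdf:first and rdf:rest values transcribed from the triples above.
FIRST = {
    "N51f3745f544048b0b0cb429b81fb239b": "N470b7ee4c9454f3da454f860ccfd876c",
    "Nc3d60d3e3b0c4ffea4d6b17216799e71": "sg:person.01300161230.10",
    "Nf9eceb27d2d94b9f8b0998890269a70a": "sg:person.01157055063.55",
    "Nb307a28c7b3f4ac2952c6e3fc6c2bb83": "sg:person.01336557015.68",
}
REST = {
    "N51f3745f544048b0b0cb429b81fb239b": "Nc3d60d3e3b0c4ffea4d6b17216799e71",
    "Nc3d60d3e3b0c4ffea4d6b17216799e71": "Nf9eceb27d2d94b9f8b0998890269a70a",
    "Nf9eceb27d2d94b9f8b0998890269a70a": "Nb307a28c7b3f4ac2952c6e3fc6c2bb83",
    "Nb307a28c7b3f4ac2952c6e3fc6c2bb83": RDF_NIL,
}
# schema:familyName of each person node, also from the table.
FAMILY_NAME = {
    "N470b7ee4c9454f3da454f860ccfd876c": "He",
    "sg:person.01300161230.10": "Saha",
    "sg:person.01157055063.55": "Finkers",
    "sg:person.01336557015.68": "Parida",
}


def walk_rdf_list(head: str, first: dict, rest: dict) -> list:
    """Follow rdf:rest links from head, collecting each rdf:first, until rdf:nil."""
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


# The object of schema:author in the table is the head of the list.
author_nodes = walk_rdf_list("N51f3745f544048b0b0cb429b81fb239b", FIRST, REST)
author_names = [FAMILY_NAME[n] for n in author_nodes]
print(author_names)
```

The recovered order matches the AUTHORS line at the top of the record, which is the point of encoding the list this way: plain RDF triples are unordered, so order has to be carried by the rdf:first/rdf:rest chain.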
 



