Fast Phylogenetic Tree Reconstruction Using Locality-Sensitive Hashing View Full Text


Ontology type: schema:Chapter     


Chapter Info

DATE

2012

AUTHORS

Daniel G. Brown , Jakub Truszkowski

ABSTRACT

We present the first sub-quadratic time algorithm that with high probability correctly reconstructs phylogenetic trees for short sequences generated by a Markov model of evolution. Due to rapid expansion in sequence databases, such very fast algorithms are necessary. Other fast heuristics have been developed for building trees from large alignments [18,1], but they lack theoretical performance guarantees. Our new algorithm runs in O(n 1 + γ(g)log2 n) time, where γ is an increasing function of an upper bound on the branch lengths in the phylogeny, the upper bound g must be below \(1/2-\sqrt{1/8} \approx 0.15\), and γ(g) < 1 for all g. For phylogenies with very short branches, the running time of our algorithm is near-linear. For example, if all branches have mutation probability less than 0.02, the running time of our algorithm is roughly O(n 1.2log2 n). Our preliminary experiments show that many large phylogenies can be reconstructed more accurately than allowed by current methods, in comparable running times. More... »

PAGES

14-29

References to SciGraph publications

Book

TITLE

Algorithms in Bioinformatics

ISBN

978-3-642-33121-3
978-3-642-33122-0

Author Affiliations

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-33122-0_2

DOI

http://dx.doi.org/10.1007/978-3-642-33122-0_2

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1009821975


Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
Incoming Citations Browse incoming citations for this publication using opencitations.net

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0104", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Statistics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Mathematical Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brown", 
        "givenName": "Daniel G.", 
        "id": "sg:person.0642727740.54", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Truszkowski", 
        "givenName": "Jakub", 
        "id": "sg:person.01320220640.40", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01320220640.40"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1145/276698.276876", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004824658"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/11523468_102", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004861745", 
          "https://doi.org/10.1007/11523468_102"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/11523468_102", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004861745", 
          "https://doi.org/10.1007/11523468_102"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s00453-012-9644-4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1016011815", 
          "https://doi.org/10.1007/s00453-012-9644-4"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/369133.369172", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1018080948"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/369133.369172", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1018080948"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s00440-009-0246-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1024644090", 
          "https://doi.org/10.1007/s00440-009-0246-2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s00440-009-0246-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1024644090", 
          "https://doi.org/10.1007/s00440-009-0246-2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/molbev/msp077", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028540883"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/molbev/msp077", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028540883"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0304-3975(99)00028-6", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030334849"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1128/aem.03006-05", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1034568952"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s11538-010-9505-8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035649324", 
          "https://doi.org/10.1007/s11538-010-9505-8"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s11538-010-9505-8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035649324", 
          "https://doi.org/10.1007/s11538-010-9505-8"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1090/s0002-9947-03-03382-8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1047570264"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-02008-7_32", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1050043191", 
          "https://doi.org/10.1007/978-3-642-02008-7_32"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-02008-7_32", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1050043191", 
          "https://doi.org/10.1007/978-3-642-02008-7_32"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-23038-7_2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1050915819", 
          "https://doi.org/10.1007/978-3-642-23038-7_2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-23038-7_2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1050915819", 
          "https://doi.org/10.1007/978-3-642-23038-7_2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1126/science.1171243", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1053629872"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1089/10665270252935467", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1059204929"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1214/aoap/1019487349", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1064397484"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2012", 
    "datePublishedReg": "2012-01-01", 
    "description": "We present the first sub-quadratic time algorithm that with high probability correctly reconstructs phylogenetic trees for short sequences generated by a Markov model of evolution. Due to rapid expansion in sequence databases, such very fast algorithms are necessary. Other fast heuristics have been developed for building trees from large alignments [18,1], but they lack theoretical performance guarantees. Our new algorithm runs in O(n 1\u2009+\u2009\u03b3(g)log2 n) time, where \u03b3 is an increasing function of an upper bound on the branch lengths in the phylogeny, the upper bound g must be below \\(1/2-\\sqrt{1/8} \\approx 0.15\\), and \u03b3(g)\u2009<\u20091 for all g. For phylogenies with very short branches, the running time of our algorithm is near-linear. For example, if all branches have mutation probability less than 0.02, the running time of our algorithm is roughly O(n 1.2log2 n). Our preliminary experiments show that many large phylogenies can be reconstructed more accurately than allowed by current methods, in comparable running times.", 
    "editor": [
      {
        "familyName": "Raphael", 
        "givenName": "Ben", 
        "type": "Person"
      }, 
      {
        "familyName": "Tang", 
        "givenName": "Jijun", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-33122-0_2", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-642-33121-3", 
        "978-3-642-33122-0"
      ], 
      "name": "Algorithms in Bioinformatics", 
      "type": "Book"
    }, 
    "name": "Fast Phylogenetic Tree Reconstruction Using Locality-Sensitive Hashing", 
    "pagination": "14-29", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-33122-0_2"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "2ecaca77f024032ba96220a8c2ad52e482c25fcab4c8136c33fcf71523b048d3"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1009821975"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-33122-0_2", 
      "https://app.dimensions.ai/details/publication/pub.1009821975"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T15:19", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8672_00000249.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/978-3-642-33122-0_2"
  }
]
 

Download the RDF metadata as:  json-ld nt turtle xml License info

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-33122-0_2'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-33122-0_2'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-33122-0_2'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-33122-0_2'


 

This table displays all metadata directly associated to this object as RDF triples.

128 TRIPLES      23 PREDICATES      42 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-33122-0_2 schema:about anzsrc-for:01
2 anzsrc-for:0104
3 schema:author N61c3dddc5e7746799f255ab14e4ce19b
4 schema:citation sg:pub.10.1007/11523468_102
5 sg:pub.10.1007/978-3-642-02008-7_32
6 sg:pub.10.1007/978-3-642-23038-7_2
7 sg:pub.10.1007/s00440-009-0246-2
8 sg:pub.10.1007/s00453-012-9644-4
9 sg:pub.10.1007/s11538-010-9505-8
10 https://doi.org/10.1016/s0304-3975(99)00028-6
11 https://doi.org/10.1089/10665270252935467
12 https://doi.org/10.1090/s0002-9947-03-03382-8
13 https://doi.org/10.1093/molbev/msp077
14 https://doi.org/10.1126/science.1171243
15 https://doi.org/10.1128/aem.03006-05
16 https://doi.org/10.1145/276698.276876
17 https://doi.org/10.1145/369133.369172
18 https://doi.org/10.1214/aoap/1019487349
19 schema:datePublished 2012
20 schema:datePublishedReg 2012-01-01
21 schema:description We present the first sub-quadratic time algorithm that with high probability correctly reconstructs phylogenetic trees for short sequences generated by a Markov model of evolution. Due to rapid expansion in sequence databases, such very fast algorithms are necessary. Other fast heuristics have been developed for building trees from large alignments [18,1], but they lack theoretical performance guarantees. Our new algorithm runs in O(n 1 + γ(g)log2 n) time, where γ is an increasing function of an upper bound on the branch lengths in the phylogeny, the upper bound g must be below \(1/2-\sqrt{1/8} \approx 0.15\), and γ(g) < 1 for all g. For phylogenies with very short branches, the running time of our algorithm is near-linear. For example, if all branches have mutation probability less than 0.02, the running time of our algorithm is roughly O(n 1.2log2 n). Our preliminary experiments show that many large phylogenies can be reconstructed more accurately than allowed by current methods, in comparable running times.
22 schema:editor N4f14cb98f1c54b4bb255f04079406be2
23 schema:genre chapter
24 schema:inLanguage en
25 schema:isAccessibleForFree false
26 schema:isPartOf Na3d58a5f7efe4db1bece0514be0f0f57
27 schema:name Fast Phylogenetic Tree Reconstruction Using Locality-Sensitive Hashing
28 schema:pagination 14-29
29 schema:productId N0997799c78de4cf3a2bd1835bf946e4b
30 N1b00091e35814204a1b203648d17580f
31 N5c33fe88e88f4994ab836a01ad3596b6
32 schema:publisher N43007a05f26e47a59f4be94faffcfe35
33 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009821975
34 https://doi.org/10.1007/978-3-642-33122-0_2
35 schema:sdDatePublished 2019-04-15T15:19
36 schema:sdLicense https://scigraph.springernature.com/explorer/license/
37 schema:sdPublisher N360bd7cce3c14e82a61e10b29671fd90
38 schema:url http://link.springer.com/10.1007/978-3-642-33122-0_2
39 sgo:license sg:explorer/license/
40 sgo:sdDataset chapters
41 rdf:type schema:Chapter
42 N0997799c78de4cf3a2bd1835bf946e4b schema:name dimensions_id
43 schema:value pub.1009821975
44 rdf:type schema:PropertyValue
45 N1b00091e35814204a1b203648d17580f schema:name readcube_id
46 schema:value 2ecaca77f024032ba96220a8c2ad52e482c25fcab4c8136c33fcf71523b048d3
47 rdf:type schema:PropertyValue
48 N360bd7cce3c14e82a61e10b29671fd90 schema:name Springer Nature - SN SciGraph project
49 rdf:type schema:Organization
50 N43007a05f26e47a59f4be94faffcfe35 schema:location Berlin, Heidelberg
51 schema:name Springer Berlin Heidelberg
52 rdf:type schema:Organisation
53 N4f14cb98f1c54b4bb255f04079406be2 rdf:first N6e629394f8af45acbaf3462b8567c28e
54 rdf:rest N648670ea4190438b9e951e00ad7608cf
55 N5c33fe88e88f4994ab836a01ad3596b6 schema:name doi
56 schema:value 10.1007/978-3-642-33122-0_2
57 rdf:type schema:PropertyValue
58 N61c3dddc5e7746799f255ab14e4ce19b rdf:first sg:person.0642727740.54
59 rdf:rest Nc4f7c52203b64c02a24d17fdbcfb882e
60 N648670ea4190438b9e951e00ad7608cf rdf:first Nc1ff310f82e64643abfa0b3613b60f8f
61 rdf:rest rdf:nil
62 N6e629394f8af45acbaf3462b8567c28e schema:familyName Raphael
63 schema:givenName Ben
64 rdf:type schema:Person
65 Na3d58a5f7efe4db1bece0514be0f0f57 schema:isbn 978-3-642-33121-3
66 978-3-642-33122-0
67 schema:name Algorithms in Bioinformatics
68 rdf:type schema:Book
69 Nc1ff310f82e64643abfa0b3613b60f8f schema:familyName Tang
70 schema:givenName Jijun
71 rdf:type schema:Person
72 Nc4f7c52203b64c02a24d17fdbcfb882e rdf:first sg:person.01320220640.40
73 rdf:rest rdf:nil
74 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
75 schema:name Mathematical Sciences
76 rdf:type schema:DefinedTerm
77 anzsrc-for:0104 schema:inDefinedTermSet anzsrc-for:
78 schema:name Statistics
79 rdf:type schema:DefinedTerm
80 sg:person.01320220640.40 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
81 schema:familyName Truszkowski
82 schema:givenName Jakub
83 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01320220640.40
84 rdf:type schema:Person
85 sg:person.0642727740.54 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
86 schema:familyName Brown
87 schema:givenName Daniel G.
88 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54
89 rdf:type schema:Person
90 sg:pub.10.1007/11523468_102 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004861745
91 https://doi.org/10.1007/11523468_102
92 rdf:type schema:CreativeWork
93 sg:pub.10.1007/978-3-642-02008-7_32 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050043191
94 https://doi.org/10.1007/978-3-642-02008-7_32
95 rdf:type schema:CreativeWork
96 sg:pub.10.1007/978-3-642-23038-7_2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050915819
97 https://doi.org/10.1007/978-3-642-23038-7_2
98 rdf:type schema:CreativeWork
99 sg:pub.10.1007/s00440-009-0246-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024644090
100 https://doi.org/10.1007/s00440-009-0246-2
101 rdf:type schema:CreativeWork
102 sg:pub.10.1007/s00453-012-9644-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016011815
103 https://doi.org/10.1007/s00453-012-9644-4
104 rdf:type schema:CreativeWork
105 sg:pub.10.1007/s11538-010-9505-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035649324
106 https://doi.org/10.1007/s11538-010-9505-8
107 rdf:type schema:CreativeWork
108 https://doi.org/10.1016/s0304-3975(99)00028-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030334849
109 rdf:type schema:CreativeWork
110 https://doi.org/10.1089/10665270252935467 schema:sameAs https://app.dimensions.ai/details/publication/pub.1059204929
111 rdf:type schema:CreativeWork
112 https://doi.org/10.1090/s0002-9947-03-03382-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047570264
113 rdf:type schema:CreativeWork
114 https://doi.org/10.1093/molbev/msp077 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028540883
115 rdf:type schema:CreativeWork
116 https://doi.org/10.1126/science.1171243 schema:sameAs https://app.dimensions.ai/details/publication/pub.1053629872
117 rdf:type schema:CreativeWork
118 https://doi.org/10.1128/aem.03006-05 schema:sameAs https://app.dimensions.ai/details/publication/pub.1034568952
119 rdf:type schema:CreativeWork
120 https://doi.org/10.1145/276698.276876 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004824658
121 rdf:type schema:CreativeWork
122 https://doi.org/10.1145/369133.369172 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018080948
123 rdf:type schema:CreativeWork
124 https://doi.org/10.1214/aoap/1019487349 schema:sameAs https://app.dimensions.ai/details/publication/pub.1064397484
125 rdf:type schema:CreativeWork
126 https://www.grid.ac/institutes/grid.46078.3d schema:alternateName University of Waterloo
127 schema:name David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
128 rdf:type schema:Organization
 




Preview window. Press ESC to close (or click here)


...