Fast Phylogenetic Tree Reconstruction Using Locality-Sensitive Hashing View Full Text


Ontology type: schema:Chapter     


Chapter Info

DATE

2012

AUTHORS

Daniel G. Brown , Jakub Truszkowski

ABSTRACT

We present the first sub-quadratic time algorithm that with high probability correctly reconstructs phylogenetic trees for short sequences generated by a Markov model of evolution. Due to rapid expansion in sequence databases, such very fast algorithms are necessary. Other fast heuristics have been developed for building trees from large alignments [18,1], but they lack theoretical performance guarantees. Our new algorithm runs in O(n1 + γ(g)log2n) time, where γ is an increasing function of an upper bound on the branch lengths in the phylogeny, the upper bound g must be below \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$1/2-\sqrt{1/8} \approx 0.15$\end{document}, and γ(g) < 1 for all g. For phylogenies with very short branches, the running time of our algorithm is near-linear. For example, if all branches have mutation probability less than 0.02, the running time of our algorithm is roughly O(n1.2log2n). Our preliminary experiments show that many large phylogenies can be reconstructed more accurately than allowed by current methods, in comparable running times. More... »

PAGES

14-29

Book

TITLE

Algorithms in Bioinformatics

ISBN

978-3-642-33121-3
978-3-642-33122-0

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-33122-0_2

DOI

http://dx.doi.org/10.1007/978-3-642-33122-0_2

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1009821975


Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
Incoming Citations Browse incoming citations for this publication using opencitations.net

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Mathematical Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0102", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Applied Mathematics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "David R. Cheriton School of Computer Science, University of Waterloo, N2L 3G1, Waterloo, ON, Canada", 
          "id": "http://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "David R. Cheriton School of Computer Science, University of Waterloo, N2L 3G1, Waterloo, ON, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brown", 
        "givenName": "Daniel G.", 
        "id": "sg:person.0642727740.54", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "David R. Cheriton School of Computer Science, University of Waterloo, N2L 3G1, Waterloo, ON, Canada", 
          "id": "http://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "David R. Cheriton School of Computer Science, University of Waterloo, N2L 3G1, Waterloo, ON, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Truszkowski", 
        "givenName": "Jakub", 
        "id": "sg:person.01320220640.40", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01320220640.40"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2012", 
    "datePublishedReg": "2012-01-01", 
    "description": "We present the first sub-quadratic time algorithm that with high probability correctly reconstructs phylogenetic trees for short sequences generated by a Markov model of evolution. Due to rapid expansion in sequence databases, such very fast algorithms are necessary. Other fast heuristics have been developed for building trees from large alignments [18,1], but they lack theoretical performance guarantees. Our new algorithm runs in O(n1\u2009+\u2009\u03b3(g)log2n) time, where \u03b3 is an increasing function of an upper bound on the branch lengths in the phylogeny, the upper bound g must be below \\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$1/2-\\sqrt{1/8} \\approx 0.15$\\end{document}, and \u03b3(g)\u2009<\u20091 for all g. For phylogenies with very short branches, the running time of our algorithm is near-linear. For example, if all branches have mutation probability less than 0.02, the running time of our algorithm is roughly O(n1.2log2n). Our preliminary experiments show that many large phylogenies can be reconstructed more accurately than allowed by current methods, in comparable running times.", 
    "editor": [
      {
        "familyName": "Raphael", 
        "givenName": "Ben", 
        "type": "Person"
      }, 
      {
        "familyName": "Tang", 
        "givenName": "Jijun", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-33122-0_2", 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-642-33121-3", 
        "978-3-642-33122-0"
      ], 
      "name": "Algorithms in Bioinformatics", 
      "type": "Book"
    }, 
    "keywords": [
      "theoretical performance guarantees", 
      "fast algorithm", 
      "performance guarantees", 
      "fast heuristic", 
      "mutation probability", 
      "running time", 
      "time algorithm", 
      "algorithm runs", 
      "large phylogenies", 
      "Markov model", 
      "algorithm", 
      "phylogenetic tree reconstruction", 
      "large alignments", 
      "probability", 
      "high probability", 
      "Locality Sensitive Hashing", 
      "tree reconstruction", 
      "current methods", 
      "branches", 
      "guarantees", 
      "building trees", 
      "heuristics", 
      "new algorithm runs", 
      "model", 
      "function", 
      "expansion", 
      "time", 
      "evolution", 
      "short branches", 
      "short sequences", 
      "trees", 
      "hashing", 
      "experiments", 
      "run", 
      "reconstruction", 
      "preliminary experiments", 
      "sequence", 
      "phylogenetic tree", 
      "alignment", 
      "phylogeny", 
      "sequence databases", 
      "database", 
      "rapid expansion", 
      "example", 
      "method"
    ], 
    "name": "Fast Phylogenetic Tree Reconstruction Using Locality-Sensitive Hashing", 
    "pagination": "14-29", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1009821975"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-33122-0_2"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-33122-0_2", 
      "https://app.dimensions.ai/details/publication/pub.1009821975"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-12-01T06:54", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221201/entities/gbq_results/chapter/chapter_473.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-642-33122-0_2"
  }
]
 

Download the RDF metadata as:  json-ld nt turtle xml License info

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-33122-0_2'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-33122-0_2'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-33122-0_2'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-33122-0_2'


 

This table displays all metadata directly associated to this object as RDF triples.

116 TRIPLES      22 PREDICATES      70 URIs      63 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-33122-0_2 schema:about anzsrc-for:01
2 anzsrc-for:0102
3 schema:author Nbdcb57751190470faf169c8404f20935
4 schema:datePublished 2012
5 schema:datePublishedReg 2012-01-01
6 schema:description We present the first sub-quadratic time algorithm that with high probability correctly reconstructs phylogenetic trees for short sequences generated by a Markov model of evolution. Due to rapid expansion in sequence databases, such very fast algorithms are necessary. Other fast heuristics have been developed for building trees from large alignments [18,1], but they lack theoretical performance guarantees. Our new algorithm runs in O(n1 + γ(g)log2n) time, where γ is an increasing function of an upper bound on the branch lengths in the phylogeny, the upper bound g must be below \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$1/2-\sqrt{1/8} \approx 0.15$\end{document}, and γ(g) < 1 for all g. For phylogenies with very short branches, the running time of our algorithm is near-linear. For example, if all branches have mutation probability less than 0.02, the running time of our algorithm is roughly O(n1.2log2n). Our preliminary experiments show that many large phylogenies can be reconstructed more accurately than allowed by current methods, in comparable running times.
7 schema:editor Nc56fbe11062747d78501e40e6c7bf278
8 schema:genre chapter
9 schema:isAccessibleForFree false
10 schema:isPartOf Nfdfbc9a3994b43698bf7859cf77889d0
11 schema:keywords Locality Sensitive Hashing
12 Markov model
13 algorithm
14 algorithm runs
15 alignment
16 branches
17 building trees
18 current methods
19 database
20 evolution
21 example
22 expansion
23 experiments
24 fast algorithm
25 fast heuristic
26 function
27 guarantees
28 hashing
29 heuristics
30 high probability
31 large alignments
32 large phylogenies
33 method
34 model
35 mutation probability
36 new algorithm runs
37 performance guarantees
38 phylogenetic tree
39 phylogenetic tree reconstruction
40 phylogeny
41 preliminary experiments
42 probability
43 rapid expansion
44 reconstruction
45 run
46 running time
47 sequence
48 sequence databases
49 short branches
50 short sequences
51 theoretical performance guarantees
52 time
53 time algorithm
54 tree reconstruction
55 trees
56 schema:name Fast Phylogenetic Tree Reconstruction Using Locality-Sensitive Hashing
57 schema:pagination 14-29
58 schema:productId N1cf9a6fe57d44550a5f7d01a80c2d37e
59 N32578432d2c94906b92d3bfe6068a7a2
60 schema:publisher N49b28467851a4cfd99fef6c7240a79e8
61 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009821975
62 https://doi.org/10.1007/978-3-642-33122-0_2
63 schema:sdDatePublished 2022-12-01T06:54
64 schema:sdLicense https://scigraph.springernature.com/explorer/license/
65 schema:sdPublisher N2c627bdb0c354c249fa561b2f69279e7
66 schema:url https://doi.org/10.1007/978-3-642-33122-0_2
67 sgo:license sg:explorer/license/
68 sgo:sdDataset chapters
69 rdf:type schema:Chapter
70 N07769ae3839b42b680f3c10b41b9359f schema:familyName Tang
71 schema:givenName Jijun
72 rdf:type schema:Person
73 N1cf9a6fe57d44550a5f7d01a80c2d37e schema:name dimensions_id
74 schema:value pub.1009821975
75 rdf:type schema:PropertyValue
76 N2c627bdb0c354c249fa561b2f69279e7 schema:name Springer Nature - SN SciGraph project
77 rdf:type schema:Organization
78 N32578432d2c94906b92d3bfe6068a7a2 schema:name doi
79 schema:value 10.1007/978-3-642-33122-0_2
80 rdf:type schema:PropertyValue
81 N49b28467851a4cfd99fef6c7240a79e8 schema:name Springer Nature
82 rdf:type schema:Organisation
83 N6cae2c62fd1143ea833b56a31a372526 rdf:first N07769ae3839b42b680f3c10b41b9359f
84 rdf:rest rdf:nil
85 Nba0e191d00e942d088084bd890cd5b17 schema:familyName Raphael
86 schema:givenName Ben
87 rdf:type schema:Person
88 Nbdcb57751190470faf169c8404f20935 rdf:first sg:person.0642727740.54
89 rdf:rest Ned4dcc88d4d9489f9c9411c54782a6d6
90 Nc56fbe11062747d78501e40e6c7bf278 rdf:first Nba0e191d00e942d088084bd890cd5b17
91 rdf:rest N6cae2c62fd1143ea833b56a31a372526
92 Ned4dcc88d4d9489f9c9411c54782a6d6 rdf:first sg:person.01320220640.40
93 rdf:rest rdf:nil
94 Nfdfbc9a3994b43698bf7859cf77889d0 schema:isbn 978-3-642-33121-3
95 978-3-642-33122-0
96 schema:name Algorithms in Bioinformatics
97 rdf:type schema:Book
98 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
99 schema:name Mathematical Sciences
100 rdf:type schema:DefinedTerm
101 anzsrc-for:0102 schema:inDefinedTermSet anzsrc-for:
102 schema:name Applied Mathematics
103 rdf:type schema:DefinedTerm
104 sg:person.01320220640.40 schema:affiliation grid-institutes:grid.46078.3d
105 schema:familyName Truszkowski
106 schema:givenName Jakub
107 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01320220640.40
108 rdf:type schema:Person
109 sg:person.0642727740.54 schema:affiliation grid-institutes:grid.46078.3d
110 schema:familyName Brown
111 schema:givenName Daniel G.
112 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54
113 rdf:type schema:Person
114 grid-institutes:grid.46078.3d schema:alternateName David R. Cheriton School of Computer Science, University of Waterloo, N2L 3G1, Waterloo, ON, Canada
115 schema:name David R. Cheriton School of Computer Science, University of Waterloo, N2L 3G1, Waterloo, ON, Canada
116 rdf:type schema:Organization
 




Preview window. Press ESC to close (or click here)


...