Towards a practical O(n logn) phylogeny algorithm View Full Text


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2012-11-26

AUTHORS

Jakub Truszkowski, Yanqi Hao, Daniel G Brown

ABSTRACT

BackgroundRecently, we have identified a randomized quartet phylogeny algorithm that has O(n logn) runtime with high probability, which is asymptotically optimal. Our algorithm has high probability of returning the correct phylogeny when quartet errors are independent and occur with known probability, and when the algorithm uses a guide tree on O(loglogn) taxa that is correct with high probability. In practice, none of these assumptions is correct: quartet errors are positively correlated and occur with unknown probability, and the guide tree is often error prone. Here, we bring our work out of the purely theoretical setting. We present a variety of extensions which, while only slowing the algorithm down by a constant factor, make its performance nearly comparable to that of Neighbour Joining , which requires Θ(n3) runtime in existing implementations. Our results suggest a new direction for quartet-based phylogenetic reconstruction that may yield striking speed improvements at minimal accuracy cost. An early prototype implementation of our software is available athttp://www.cs.uwaterloo.ca/jmtruszk/qtree.tar.gz. More... »

PAGES

32

References to SciGraph publications

  • 2009. Phylogenies without Branch Bounds: Contracting the Short, Pruning the Deep in RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY
  • 2011. Fast Error-Tolerant Quartet Phylogeny Algorithms in COMBINATORIAL PATTERN MATCHING
  • 2005-04-27. Scoredist: A simple and robust protein sequence distance estimator in BMC BIOINFORMATICS
  • 2009. Large-Scale Neighbor-Joining with NINJA in ALGORITHMS IN BIOINFORMATICS
  • 2005. Fast Neighbor Joining in AUTOMATA, LANGUAGES AND PROGRAMMING
  • 2003-10-10. Computing the Quartet Distance between Evolutionary Trees in Time O(n log n) in ALGORITHMICA
  • 2011. Towards a Practical O(n logn) Phylogeny Algorithm in ALGORITHMS IN BIOINFORMATICS
  • Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/1748-7188-7-32

    DOI

    http://dx.doi.org/10.1186/1748-7188-7-32

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1002393530

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/23181935


    Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
    Incoming Citations Browse incoming citations for this publication using opencitations.net

    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0803", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Computer Software", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo ON N2L 3G1, Ontario, Canada", 
              "id": "http://www.grid.ac/institutes/grid.46078.3d", 
              "name": [
                "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo ON N2L 3G1, Ontario, Canada"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Truszkowski", 
            "givenName": "Jakub", 
            "id": "sg:person.01320220640.40", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01320220640.40"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo ON N2L 3G1, Ontario, Canada", 
              "id": "http://www.grid.ac/institutes/grid.46078.3d", 
              "name": [
                "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo ON N2L 3G1, Ontario, Canada"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Hao", 
            "givenName": "Yanqi", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo ON N2L 3G1, Ontario, Canada", 
              "id": "http://www.grid.ac/institutes/grid.46078.3d", 
              "name": [
                "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo ON N2L 3G1, Ontario, Canada"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Brown", 
            "givenName": "Daniel G", 
            "id": "sg:person.0642727740.54", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/978-3-642-23038-7_2", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050915819", 
              "https://doi.org/10.1007/978-3-642-23038-7_2"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/11523468_102", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1004861745", 
              "https://doi.org/10.1007/11523468_102"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s00453-003-1065-y", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1021604411", 
              "https://doi.org/10.1007/s00453-003-1065-y"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-642-02008-7_32", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050043191", 
              "https://doi.org/10.1007/978-3-642-02008-7_32"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-6-108", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037048502", 
              "https://doi.org/10.1186/1471-2105-6-108"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-642-21458-5_14", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1008082437", 
              "https://doi.org/10.1007/978-3-642-21458-5_14"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-642-04241-6_31", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1032768275", 
              "https://doi.org/10.1007/978-3-642-04241-6_31"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2012-11-26", 
        "datePublishedReg": "2012-11-26", 
        "description": "BackgroundRecently, we have identified a randomized quartet phylogeny algorithm that has O(n logn) runtime with high probability, which is asymptotically optimal. Our algorithm has high probability of returning the correct phylogeny when quartet errors are independent and occur with known probability, and when the algorithm uses a guide tree on O(loglogn) taxa that is correct with high probability. In practice, none of these assumptions is correct: quartet errors are positively correlated and occur with unknown probability, and the guide tree is often error prone. Here, we bring our work out of the purely theoretical setting. We present a variety of extensions which, while only slowing the algorithm down by a constant factor, make its performance nearly comparable to that of Neighbour Joining , which requires \u0398(n3) runtime in existing implementations. Our results suggest a new direction for quartet-based phylogenetic reconstruction that may yield striking speed improvements at minimal accuracy cost. An early prototype implementation of our software is available athttp://www.cs.uwaterloo.ca/jmtruszk/qtree.tar.gz.", 
        "genre": "article", 
        "id": "sg:pub.10.1186/1748-7188-7-32", 
        "isAccessibleForFree": true, 
        "isPartOf": [
          {
            "id": "sg:journal.1036449", 
            "issn": [
              "1748-7188"
            ], 
            "name": "Algorithms for Molecular Biology", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "7"
          }
        ], 
        "keywords": [
          "phylogeny algorithms", 
          "quartet errors", 
          "variety of extensions", 
          "unknown probabilities", 
          "high probability", 
          "theoretical setting", 
          "constant factor", 
          "guide tree", 
          "algorithm", 
          "probability", 
          "speed improvement", 
          "error", 
          "correct phylogeny", 
          "accuracy cost", 
          "runtime", 
          "assumption", 
          "extension", 
          "implementation", 
          "direction", 
          "new directions", 
          "performance", 
          "trees", 
          "software", 
          "work", 
          "results", 
          "cost", 
          "reconstruction", 
          "variety", 
          "prototype implementation", 
          "phylogenetic reconstruction", 
          "improvement", 
          "setting", 
          "early prototype implementation", 
          "joining", 
          "factors", 
          "practice", 
          "phylogeny", 
          "taxa", 
          "Neighbour Joining"
        ], 
        "name": "Towards a practical O(n logn) phylogeny algorithm", 
        "pagination": "32", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1002393530"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/1748-7188-7-32"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "23181935"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/1748-7188-7-32", 
          "https://app.dimensions.ai/details/publication/pub.1002393530"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-12-01T06:30", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20221201/entities/gbq_results/article/article_575.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1186/1748-7188-7-32"
      }
    ]
     

    Download the RDF metadata as:  json-ld nt turtle xml License info

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1748-7188-7-32'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1748-7188-7-32'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1748-7188-7-32'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1748-7188-7-32'


     

    This table displays all metadata directly associated to this object as RDF triples.

    140 TRIPLES      21 PREDICATES      71 URIs      56 LITERALS      7 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1186/1748-7188-7-32 schema:about anzsrc-for:08
    2 anzsrc-for:0803
    3 schema:author N9831baeceeb0473db50081dab99b64e6
    4 schema:citation sg:pub.10.1007/11523468_102
    5 sg:pub.10.1007/978-3-642-02008-7_32
    6 sg:pub.10.1007/978-3-642-04241-6_31
    7 sg:pub.10.1007/978-3-642-21458-5_14
    8 sg:pub.10.1007/978-3-642-23038-7_2
    9 sg:pub.10.1007/s00453-003-1065-y
    10 sg:pub.10.1186/1471-2105-6-108
    11 schema:datePublished 2012-11-26
    12 schema:datePublishedReg 2012-11-26
    13 schema:description BackgroundRecently, we have identified a randomized quartet phylogeny algorithm that has O(n logn) runtime with high probability, which is asymptotically optimal. Our algorithm has high probability of returning the correct phylogeny when quartet errors are independent and occur with known probability, and when the algorithm uses a guide tree on O(loglogn) taxa that is correct with high probability. In practice, none of these assumptions is correct: quartet errors are positively correlated and occur with unknown probability, and the guide tree is often error prone. Here, we bring our work out of the purely theoretical setting. We present a variety of extensions which, while only slowing the algorithm down by a constant factor, make its performance nearly comparable to that of Neighbour Joining , which requires Θ(n3) runtime in existing implementations. Our results suggest a new direction for quartet-based phylogenetic reconstruction that may yield striking speed improvements at minimal accuracy cost. An early prototype implementation of our software is available athttp://www.cs.uwaterloo.ca/jmtruszk/qtree.tar.gz.
    14 schema:genre article
    15 schema:isAccessibleForFree true
    16 schema:isPartOf N1d593aac876b4eb9b7d2876a963e71d7
    17 Ncd84ecd7cb264a908baaf3ca7c71bc1f
    18 sg:journal.1036449
    19 schema:keywords Neighbour Joining
    20 accuracy cost
    21 algorithm
    22 assumption
    23 constant factor
    24 correct phylogeny
    25 cost
    26 direction
    27 early prototype implementation
    28 error
    29 extension
    30 factors
    31 guide tree
    32 high probability
    33 implementation
    34 improvement
    35 joining
    36 new directions
    37 performance
    38 phylogenetic reconstruction
    39 phylogeny
    40 phylogeny algorithms
    41 practice
    42 probability
    43 prototype implementation
    44 quartet errors
    45 reconstruction
    46 results
    47 runtime
    48 setting
    49 software
    50 speed improvement
    51 taxa
    52 theoretical setting
    53 trees
    54 unknown probabilities
    55 variety
    56 variety of extensions
    57 work
    58 schema:name Towards a practical O(n logn) phylogeny algorithm
    59 schema:pagination 32
    60 schema:productId N8f9df9e82052476cb4d4f9aba76c36ee
    61 Na75d4a09b644475eb54baca7a2783d97
    62 Nb19f6206e03a445dab46a7e05affc404
    63 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002393530
    64 https://doi.org/10.1186/1748-7188-7-32
    65 schema:sdDatePublished 2022-12-01T06:30
    66 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    67 schema:sdPublisher N04f7b96a40cd41618286c824b3148ba4
    68 schema:url https://doi.org/10.1186/1748-7188-7-32
    69 sgo:license sg:explorer/license/
    70 sgo:sdDataset articles
    71 rdf:type schema:ScholarlyArticle
    72 N04f7b96a40cd41618286c824b3148ba4 schema:name Springer Nature - SN SciGraph project
    73 rdf:type schema:Organization
    74 N1d593aac876b4eb9b7d2876a963e71d7 schema:issueNumber 1
    75 rdf:type schema:PublicationIssue
    76 N30ba621b6d5d486bb2ee499141c67d74 rdf:first sg:person.0642727740.54
    77 rdf:rest rdf:nil
    78 N8f9df9e82052476cb4d4f9aba76c36ee schema:name doi
    79 schema:value 10.1186/1748-7188-7-32
    80 rdf:type schema:PropertyValue
    81 N9831baeceeb0473db50081dab99b64e6 rdf:first sg:person.01320220640.40
    82 rdf:rest Ne988c5f5aeff4ec2898da1be342e4f8f
    83 Na75d4a09b644475eb54baca7a2783d97 schema:name dimensions_id
    84 schema:value pub.1002393530
    85 rdf:type schema:PropertyValue
    86 Nb19f6206e03a445dab46a7e05affc404 schema:name pubmed_id
    87 schema:value 23181935
    88 rdf:type schema:PropertyValue
    89 Ncd84ecd7cb264a908baaf3ca7c71bc1f schema:volumeNumber 7
    90 rdf:type schema:PublicationVolume
    91 Ne988c5f5aeff4ec2898da1be342e4f8f rdf:first Nfcb595031877426595c0aac4b250d0a7
    92 rdf:rest N30ba621b6d5d486bb2ee499141c67d74
    93 Nfcb595031877426595c0aac4b250d0a7 schema:affiliation grid-institutes:grid.46078.3d
    94 schema:familyName Hao
    95 schema:givenName Yanqi
    96 rdf:type schema:Person
    97 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    98 schema:name Information and Computing Sciences
    99 rdf:type schema:DefinedTerm
    100 anzsrc-for:0803 schema:inDefinedTermSet anzsrc-for:
    101 schema:name Computer Software
    102 rdf:type schema:DefinedTerm
    103 sg:journal.1036449 schema:issn 1748-7188
    104 schema:name Algorithms for Molecular Biology
    105 schema:publisher Springer Nature
    106 rdf:type schema:Periodical
    107 sg:person.01320220640.40 schema:affiliation grid-institutes:grid.46078.3d
    108 schema:familyName Truszkowski
    109 schema:givenName Jakub
    110 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01320220640.40
    111 rdf:type schema:Person
    112 sg:person.0642727740.54 schema:affiliation grid-institutes:grid.46078.3d
    113 schema:familyName Brown
    114 schema:givenName Daniel G
    115 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54
    116 rdf:type schema:Person
    117 sg:pub.10.1007/11523468_102 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004861745
    118 https://doi.org/10.1007/11523468_102
    119 rdf:type schema:CreativeWork
    120 sg:pub.10.1007/978-3-642-02008-7_32 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050043191
    121 https://doi.org/10.1007/978-3-642-02008-7_32
    122 rdf:type schema:CreativeWork
    123 sg:pub.10.1007/978-3-642-04241-6_31 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032768275
    124 https://doi.org/10.1007/978-3-642-04241-6_31
    125 rdf:type schema:CreativeWork
    126 sg:pub.10.1007/978-3-642-21458-5_14 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008082437
    127 https://doi.org/10.1007/978-3-642-21458-5_14
    128 rdf:type schema:CreativeWork
    129 sg:pub.10.1007/978-3-642-23038-7_2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050915819
    130 https://doi.org/10.1007/978-3-642-23038-7_2
    131 rdf:type schema:CreativeWork
    132 sg:pub.10.1007/s00453-003-1065-y schema:sameAs https://app.dimensions.ai/details/publication/pub.1021604411
    133 https://doi.org/10.1007/s00453-003-1065-y
    134 rdf:type schema:CreativeWork
    135 sg:pub.10.1186/1471-2105-6-108 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037048502
    136 https://doi.org/10.1186/1471-2105-6-108
    137 rdf:type schema:CreativeWork
    138 grid-institutes:grid.46078.3d schema:alternateName David R. Cheriton School of Computer Science, University of Waterloo, Waterloo ON N2L 3G1, Ontario, Canada
    139 schema:name David R. Cheriton School of Computer Science, University of Waterloo, Waterloo ON N2L 3G1, Ontario, Canada
    140 rdf:type schema:Organization
     




    Preview window. Press ESC to close (or click here)


    ...