ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels.


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2019-12

AUTHORS

Gurjit S Randhawa, Kathleen A Hill, Lila Kari

ABSTRACT

BACKGROUND: Although software tools abound for the comparison, analysis, identification, and classification of genomic sequences, taxonomic classification remains challenging due to the magnitude of the datasets and the intrinsic problems associated with classification. The need exists for an approach and software tool that addresses the limitations of existing alignment-based methods, as well as the challenges of recently proposed alignment-free methods.

RESULTS: We propose a novel combination of supervised Machine Learning with Digital Signal Processing, resulting in ML-DSP: an alignment-free software tool for ultrafast, accurate, and scalable genome classification at all taxonomic levels. We test ML-DSP by classifying 7396 full mitochondrial genomes at various taxonomic levels, from kingdom to genus, with an average classification accuracy of >97%. A quantitative comparison with state-of-the-art classification software tools is performed, on two small benchmark datasets and one large 4322 vertebrate mtDNA genomes dataset. Our results show that ML-DSP overwhelmingly outperforms the alignment-based software MEGA7 (alignment with MUSCLE or CLUSTALW) in terms of processing time, while having comparable classification accuracies for small datasets and superior accuracies for the large dataset. Compared with the alignment-free software FFP (Feature Frequency Profile), ML-DSP has significantly better classification accuracy, and is overall faster. We also provide preliminary experiments indicating the potential of ML-DSP to be used for other datasets, by classifying 4271 complete dengue virus genomes into subtypes with 100% accuracy, and 4,710 bacterial genomes into phyla with 95.5% accuracy. Lastly, our analysis shows that the "Purine/Pyrimidine", "Just-A" and "Real" numerical representations of DNA sequences outperform ten other such numerical representations used in the Digital Signal Processing literature for DNA classification purposes.

CONCLUSIONS: Due to its superior classification accuracy, speed, and scalability to large datasets, ML-DSP is highly relevant in the classification of newly discovered organisms, in distinguishing genomic signatures and identifying their mechanistic determinants, and in evaluating genome integrity.
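The abstract describes the ML-DSP pipeline only at a high level: DNA sequences are mapped to numbers (for example the "Purine/Pyrimidine", "Just-A" and "Real" representations), processed with Digital Signal Processing, and passed to a supervised classifier. The sketch below illustrates one plausible reading of that idea and is not the ML-DSP implementation. The encoding values, the zero-padding to a common length, the DFT magnitude spectrum as the DSP feature, the (1 - r)/2 Pearson-correlation distance and the 1-nearest-neighbour rule are all assumptions of this example, and every function name is hypothetical:

```python
import cmath
import math

# Hypothetical encoding tables. The numeric values are assumptions based on
# common definitions of these representations; the full paper is authoritative.
ENCODINGS = {
    "purine_pyrimidine": {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0},
    "just_a": {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    "real": {"A": -1.5, "T": 1.5, "C": 0.5, "G": -0.5},
}


def encode(sequence, scheme="purine_pyrimidine"):
    """Map nucleotides to numbers; characters outside ACGT (e.g. N) are skipped."""
    table = ENCODINGS[scheme]
    return [table[base] for base in sequence.upper() if base in table]


def resize(signal, length):
    """Zero-pad or truncate to a common length (a simplifying assumption)."""
    return (signal + [0.0] * length)[:length]


def magnitude_spectrum(signal):
    """Naive O(n^2) discrete Fourier transform, returning |X[k]| for every k."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


def pearson_distance(u, v):
    """Map Pearson's r in [-1, 1] to the distance (1 - r) / 2 in [0, 1]."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0.0 or sv == 0.0:
        return 0.5  # r is undefined for a constant vector; treat it as r = 0
    return (1.0 - cov / (su * sv)) / 2.0


def features(sequence, length, scheme="purine_pyrimidine"):
    """Encode, resize and transform one sequence into its magnitude spectrum."""
    return magnitude_spectrum(resize(encode(sequence, scheme), length))


def classify(query, training, length, scheme="purine_pyrimidine"):
    """1-nearest-neighbour rule, a stand-in for the supervised classifiers."""
    q = features(query, length, scheme)
    _, label = min(
        training,
        key=lambda item: pearson_distance(q, features(item[0], length, scheme)),
    )
    return label


# Toy labelled sequences, invented for illustration only.
training = [
    ("ACGTACGTACGTACGT", "alternating"),
    ("AAAAAAAACCCCCCCC", "block"),
]
print(classify("GGGGGGGGTTTTTTTT", training, 16))
```

Swapping the nearest-neighbour rule for any classifier that accepts distance-based feature vectors keeps the same structure. The quadratic DFT is only for self-containment; a realistic pipeline would use an FFT.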

PAGES

267


Journal

TITLE

BMC Genomics

ISSUE

1

VOLUME

20

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/s12864-019-5571-y

DOI

http://dx.doi.org/10.1186/s12864-019-5571-y

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1113181741

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/30943897



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Western University", 
          "id": "https://www.grid.ac/institutes/grid.39381.30", 
          "name": [
            "Department of Computer Science, University of Western Ontario, London, ON, Canada. grandha8@uwo.ca."
          ], 
          "type": "Organization"
        }, 
        "familyName": "Randhawa", 
        "givenName": "Gurjit S", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Western University", 
          "id": "https://www.grid.ac/institutes/grid.39381.30", 
          "name": [
            "Department of Biology, University of Western Ontario, London, ON, Canada."
          ], 
          "type": "Organization"
        }, 
        "familyName": "Hill", 
        "givenName": "Kathleen A", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "School of Computer Science, University of Waterloo, Waterloo, ON, Canada."
          ], 
          "type": "Organization"
        }, 
        "familyName": "Kari", 
        "givenName": "Lila", 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1186/s13040-015-0073-1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001343002", 
          "https://doi.org/10.1186/s13040-015-0073-1"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.jtbi.2014.05.043", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1002252738"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-319-33618-3_25", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004984300", 
          "https://doi.org/10.1007/978-3-319-33618-3_25"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/bti607", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005222175"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pcbi.1000581", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005613156"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.compbiomed.2015.05.022", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1007088740"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btm404", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1007683223"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bib/bbt067", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1007818694"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/3-540-47961-9_27", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008406570", 
          "https://doi.org/10.1007/3-540-47961-9_27"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-14-s10-s1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009261916", 
          "https://doi.org/10.1186/1471-2105-14-s10-s1"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bib/bbt072", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009442051"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/17.5.429", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009721349"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/molbev/msw054", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012564990"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bib/bbt070", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013026493"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.ygeno.2016.08.002", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013649729"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gku739", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013715418"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pone.0119815", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017078210"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s13040-016-0116-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017250924", 
          "https://doi.org/10.1186/s13040-016-0116-2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.0813249106", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017699788"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bib/bbt068", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1018549458"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-319-27400-3_25", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1018946422", 
          "https://doi.org/10.1007/978-3-319-27400-3_25"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1080/10408340500526766", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1019373831"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-15-321", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1019437002", 
          "https://doi.org/10.1186/1471-2105-15-321"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1111/j.1582-4934.2002.tb00196.x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1019512218"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1002/jcc.20922", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1021253586"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gkh340", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1025846396"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.jtbi.2015.06.033", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028628380"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btu177", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1029212748"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pbio.1001130", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1034537579"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0165-1684(02)00477-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035477892"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.neucom.2016.09.077", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1038640165"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.jtbi.2011.01.038", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1038725900"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.jtbi.2015.08.007", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039416914"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.bse.2016.07.012", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039787609"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pbio.1001127", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039826087"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/22.22.4673", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1042438223"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btg005", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1043080454"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.4137/ebo.s7364", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1044611911"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.ins.2014.04.029", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045237460"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pone.0110954", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045535857"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.jtbi.2015.02.026", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1046824945"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bib/bbt052", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1059413022"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.2174/157489309787158134", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1069217454"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.5815/ijitcs.2012.08.03", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1073149925"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.6026/97320630004463", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1073594092"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/s41598-016-0028-x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1079403223", 
          "https://doi.org/10.1038/s41598-016-0028-x"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pone.0173288", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1084296733"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s12859-017-1602-3", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1084740427", 
          "https://doi.org/10.1186/s12859-017-1602-3"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btx367", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1085899231"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s13059-017-1319-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1092065837", 
          "https://doi.org/10.1186/s13059-017-1319-7"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/eit.2009.5189632", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1093255351"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.cogsys.2018.01.006", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1100860857"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/apsipa.2017.8282195", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1100942749"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pone.0206409", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1109897754"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2019-12", 
    "datePublishedReg": "2019-12-01", 
    "description": "BACKGROUND: Although software tools abound for the comparison, analysis, identification, and classification of genomic sequences, taxonomic classification remains challenging due to the magnitude of the datasets and the intrinsic problems associated with classification. The need exists for an approach and software tool that addresses the limitations of existing alignment-based methods, as well as the challenges of recently proposed alignment-free methods.\nRESULTS: We propose a novel combination of supervised Machine Learning with Digital Signal Processing, resulting in ML-DSP: an alignment-free software tool for ultrafast, accurate, and scalable genome classification at all taxonomic levels. We test ML-DSP by classifying 7396 full mitochondrial genomes at various taxonomic levels, from kingdom to genus, with an average classification accuracy of >97%. A quantitative comparison with state-of-the-art classification software tools is performed, on two small benchmark datasets and one large 4322 vertebrate mtDNA genomes dataset. Our results show that ML-DSP overwhelmingly outperforms the alignment-based software MEGA7 (alignment with MUSCLE or CLUSTALW) in terms of processing time, while having comparable classification accuracies for small datasets and superior accuracies for the large dataset. Compared with the alignment-free software FFP (Feature Frequency Profile), ML-DSP has significantly better classification accuracy, and is overall faster. We also provide preliminary experiments indicating the potential of ML-DSP to be used for other datasets, by classifying 4271 complete dengue virus genomes into subtypes with 100% accuracy, and 4,710 bacterial genomes into phyla with 95.5% accuracy. Lastly, our analysis shows that the \"Purine/Pyrimidine\", \"Just-A\" and \"Real\" numerical representations of DNA sequences outperform ten other such numerical representations used in the Digital Signal Processing literature for DNA classification purposes.\nCONCLUSIONS: Due to its superior classification accuracy, speed, and scalability to large datasets, ML-DSP is highly relevant in the classification of newly discovered organisms, in distinguishing genomic signatures and identifying their mechanistic determinants, and in evaluating genome integrity.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/s12864-019-5571-y", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023790", 
        "issn": [
          "1471-2164"
        ], 
        "name": "BMC Genomics", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "20"
      }
    ], 
    "name": "ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels.", 
    "pagination": "267", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/s12864-019-5571-y"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1113181741"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "100965258"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "30943897"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/s12864-019-5571-y", 
      "https://app.dimensions.ai/details/publication/pub.1113181741"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-15T09:01", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000375_0000000375/records_91447_00000001.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-5571-y"
  }
]
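Because JSON-LD is ordinary JSON, a record like the one above can be read with Python's standard json module, without a linked-data library. A minimal sketch, assuming the record text is already held in a string: summarize is a hypothetical helper, and the sample is a trimmed excerpt that reuses field names and values visible in the record. Citation ids are deduplicated defensively in case an array repeats an entry:

```python
import json


def summarize(record_json):
    """Pull a few fields out of a SciGraph-style JSON-LD array (hypothetical helper)."""
    record = json.loads(record_json)[0]  # the record is a one-element JSON array
    doi = next(
        (p["value"][0] for p in record.get("productId", []) if p.get("name") == "doi"),
        None,
    )
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    # dict.fromkeys keeps the first occurrence of each id and preserves order.
    citations = list(dict.fromkeys(c["id"] for c in record.get("citation", [])))
    return {"doi": doi, "authors": authors, "citations": citations}


# Trimmed excerpt of the record, for illustration only.
sample = """[{
  "author": [{"givenName": "Lila", "familyName": "Kari", "type": "Person"}],
  "citation": [{"id": "sg:pub.10.1186/s13040-016-0116-2"},
               {"id": "sg:pub.10.1186/s13040-016-0116-2"}],
  "productId": [{"name": "doi", "value": ["10.1186/s12864-019-5571-y"]}]
}]"""

print(summarize(sample))
```

Returning None when no DOI entry is present keeps the helper usable on partial records instead of raising StopIteration.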
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular linked-data format that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s12864-019-5571-y'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s12864-019-5571-y'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s12864-019-5571-y'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s12864-019-5571-y'
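The four curl commands differ only in their Accept header: the same URL is requested each time, and the server picks the serialization through HTTP content negotiation. A minimal standard-library Python equivalent follows; build_request and the short format keys are hypothetical names, while the URL and MIME types are copied from the commands. The request is only constructed here, and the commented-out lines show how it could be sent, assuming the endpoint still answers:

```python
import urllib.request

# URL copied from the curl examples.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s12864-019-5571-y"

# MIME types copied from the curl examples; the short keys are hypothetical.
FORMATS = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt):
    """Build (but do not send) a GET request asking for one RDF serialization."""
    return urllib.request.Request(RECORD_URL, headers={"Accept": FORMATS[fmt]})


# Sending the request needs network access and a live endpoint:
# with urllib.request.urlopen(build_request("turtle"), timeout=30) as resp:
#     print(resp.read().decode("utf-8")[:500])
```

Keeping the MIME types in one table reduces the format choice to a single parameter, as in the curl examples.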


 

This table displays all metadata directly associated with this object as RDF triples.

251 TRIPLES      21 PREDICATES      82 URIs      20 LITERALS      8 BLANK NODES
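In the table below, schema:author points to a blank node that heads an RDF collection. Each rdf:first names one author node and each rdf:rest names the next list cell, until rdf:nil ends the list. This linked-list encoding is how author order survives in an otherwise unordered set of triples. A minimal sketch that rebuilds the ordered list from plain (subject, predicate, object) tuples; walk_list is a hypothetical helper, no RDF library is assumed, and the blank-node ids and family names are copied from the table:

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"


def walk_list(triples, head):
    """Follow rdf:first/rdf:rest links from a list head until rdf:nil."""
    # Assumes each (subject, predicate) pair has one object, true for list cells.
    index = {(s, p): o for s, p, o in triples}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:
            raise ValueError("cyclic rdf:rest chain")
        seen.add(node)
        items.append(index[(node, RDF_FIRST)])
        node = index[(node, RDF_REST)]
    return items


# Triples copied from the table (author list cells and family names only).
TRIPLES = [
    ("Nc090bdfc692f4cf49f6cb7d66c773bef", RDF_FIRST, "Nfd8632ba27de4c0ea10b3d5c59446da8"),
    ("Nc090bdfc692f4cf49f6cb7d66c773bef", RDF_REST, "N1a2ffd075dae49e281b964fcacd22598"),
    ("N1a2ffd075dae49e281b964fcacd22598", RDF_FIRST, "N9bfb44179ef84ed588aca6f2bf96503f"),
    ("N1a2ffd075dae49e281b964fcacd22598", RDF_REST, "N3674e9effc61404ba62de86f432bf780"),
    ("N3674e9effc61404ba62de86f432bf780", RDF_FIRST, "N9724774a2a75434c930fd9f98655154f"),
    ("N3674e9effc61404ba62de86f432bf780", RDF_REST, RDF_NIL),
    ("Nfd8632ba27de4c0ea10b3d5c59446da8", "schema:familyName", "Randhawa"),
    ("N9bfb44179ef84ed588aca6f2bf96503f", "schema:familyName", "Hill"),
    ("N9724774a2a75434c930fd9f98655154f", "schema:familyName", "Kari"),
]

names = {s: o for s, p, o in TRIPLES if p == "schema:familyName"}
authors = [names[n] for n in walk_list(TRIPLES, "Nc090bdfc692f4cf49f6cb7d66c773bef")]
print(authors)
```

The seen set guards against a malformed, cyclic chain, which would otherwise loop forever.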

Subject Predicate Object
1 sg:pub.10.1186/s12864-019-5571-y schema:about anzsrc-for:06
2 anzsrc-for:0604
3 schema:author Nc090bdfc692f4cf49f6cb7d66c773bef
4 schema:citation sg:pub.10.1007/3-540-47961-9_27
5 sg:pub.10.1007/978-3-319-27400-3_25
6 sg:pub.10.1007/978-3-319-33618-3_25
7 sg:pub.10.1038/s41598-016-0028-x
8 sg:pub.10.1186/1471-2105-14-s10-s1
9 sg:pub.10.1186/1471-2105-15-321
10 sg:pub.10.1186/s12859-017-1602-3
11 sg:pub.10.1186/s13040-015-0073-1
12 sg:pub.10.1186/s13040-016-0116-2
13 sg:pub.10.1186/s13059-017-1319-7
14 https://doi.org/10.1002/jcc.20922
15 https://doi.org/10.1016/j.bse.2016.07.012
16 https://doi.org/10.1016/j.cogsys.2018.01.006
17 https://doi.org/10.1016/j.compbiomed.2015.05.022
18 https://doi.org/10.1016/j.ins.2014.04.029
19 https://doi.org/10.1016/j.jtbi.2011.01.038
20 https://doi.org/10.1016/j.jtbi.2014.05.043
21 https://doi.org/10.1016/j.jtbi.2015.02.026
22 https://doi.org/10.1016/j.jtbi.2015.06.033
23 https://doi.org/10.1016/j.jtbi.2015.08.007
24 https://doi.org/10.1016/j.neucom.2016.09.077
25 https://doi.org/10.1016/j.ygeno.2016.08.002
26 https://doi.org/10.1016/s0165-1684(02)00477-2
27 https://doi.org/10.1073/pnas.0813249106
28 https://doi.org/10.1080/10408340500526766
29 https://doi.org/10.1093/bib/bbt052
30 https://doi.org/10.1093/bib/bbt067
31 https://doi.org/10.1093/bib/bbt068
32 https://doi.org/10.1093/bib/bbt070
33 https://doi.org/10.1093/bib/bbt072
34 https://doi.org/10.1093/bioinformatics/17.5.429
35 https://doi.org/10.1093/bioinformatics/btg005
36 https://doi.org/10.1093/bioinformatics/bti607
37 https://doi.org/10.1093/bioinformatics/btm404
38 https://doi.org/10.1093/bioinformatics/btu177
39 https://doi.org/10.1093/bioinformatics/btx367
40 https://doi.org/10.1093/molbev/msw054
41 https://doi.org/10.1093/nar/22.22.4673
42 https://doi.org/10.1093/nar/gkh340
43 https://doi.org/10.1093/nar/gku739
44 https://doi.org/10.1109/apsipa.2017.8282195
45 https://doi.org/10.1109/eit.2009.5189632
46 https://doi.org/10.1111/j.1582-4934.2002.tb00196.x
47 https://doi.org/10.1371/journal.pbio.1001127
48 https://doi.org/10.1371/journal.pbio.1001130
49 https://doi.org/10.1371/journal.pcbi.1000581
50 https://doi.org/10.1371/journal.pone.0110954
51 https://doi.org/10.1371/journal.pone.0119815
52 https://doi.org/10.1371/journal.pone.0173288
53 https://doi.org/10.1371/journal.pone.0206409
54 https://doi.org/10.2174/157489309787158134
55 https://doi.org/10.4137/ebo.s7364
56 https://doi.org/10.5815/ijitcs.2012.08.03
57 https://doi.org/10.6026/97320630004463
58 schema:datePublished 2019-12
59 schema:datePublishedReg 2019-12-01
60 schema:description BACKGROUND: Although software tools abound for the comparison, analysis, identification, and classification of genomic sequences, taxonomic classification remains challenging due to the magnitude of the datasets and the intrinsic problems associated with classification. The need exists for an approach and software tool that addresses the limitations of existing alignment-based methods, as well as the challenges of recently proposed alignment-free methods. RESULTS: We propose a novel combination of supervised Machine Learning with Digital Signal Processing, resulting in ML-DSP: an alignment-free software tool for ultrafast, accurate, and scalable genome classification at all taxonomic levels. We test ML-DSP by classifying 7396 full mitochondrial genomes at various taxonomic levels, from kingdom to genus, with an average classification accuracy of >97%. A quantitative comparison with state-of-the-art classification software tools is performed, on two small benchmark datasets and one large 4322 vertebrate mtDNA genomes dataset. Our results show that ML-DSP overwhelmingly outperforms the alignment-based software MEGA7 (alignment with MUSCLE or CLUSTALW) in terms of processing time, while having comparable classification accuracies for small datasets and superior accuracies for the large dataset. Compared with the alignment-free software FFP (Feature Frequency Profile), ML-DSP has significantly better classification accuracy, and is overall faster. We also provide preliminary experiments indicating the potential of ML-DSP to be used for other datasets, by classifying 4271 complete dengue virus genomes into subtypes with 100% accuracy, and 4,710 bacterial genomes into phyla with 95.5% accuracy. Lastly, our analysis shows that the "Purine/Pyrimidine", "Just-A" and "Real" numerical representations of DNA sequences outperform ten other such numerical representations used in the Digital Signal Processing literature for DNA classification purposes. 
CONCLUSIONS: Due to its superior classification accuracy, speed, and scalability to large datasets, ML-DSP is highly relevant in the classification of newly discovered organisms, in distinguishing genomic signatures and identifying their mechanistic determinants, and in evaluating genome integrity.
61 schema:genre research_article
62 schema:inLanguage en
63 schema:isAccessibleForFree true
64 schema:isPartOf N7114a2496f9b40899ace95298c42337c
65 Nd647aca3c49642e88084e1a245a5ee71
66 sg:journal.1023790
67 schema:name ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels.
68 schema:pagination 267
69 schema:productId N01fabc9d99544a059aa7abc46371ce85
70 N769e222936324b9cbc8449ef1eccf948
71 Nb20bc9e00f094bf9b01c7d279123865c
72 Ne84f1ea7ce8f45d3a03ecb098fc7e13d
73 schema:sameAs https://app.dimensions.ai/details/publication/pub.1113181741
74 https://doi.org/10.1186/s12864-019-5571-y
75 schema:sdDatePublished 2019-04-15T09:01
76 schema:sdLicense https://scigraph.springernature.com/explorer/license/
77 schema:sdPublisher N07eef1d489434174a5b03f0f1571a8f2
78 schema:url https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-5571-y
79 sgo:license sg:explorer/license/
80 sgo:sdDataset articles
81 rdf:type schema:ScholarlyArticle
82 N01fabc9d99544a059aa7abc46371ce85 schema:name doi
83 schema:value 10.1186/s12864-019-5571-y
84 rdf:type schema:PropertyValue
85 N07eef1d489434174a5b03f0f1571a8f2 schema:name Springer Nature - SN SciGraph project
86 rdf:type schema:Organization
87 N1a2ffd075dae49e281b964fcacd22598 rdf:first N9bfb44179ef84ed588aca6f2bf96503f
88 rdf:rest N3674e9effc61404ba62de86f432bf780
89 N3674e9effc61404ba62de86f432bf780 rdf:first N9724774a2a75434c930fd9f98655154f
90 rdf:rest rdf:nil
91 N7114a2496f9b40899ace95298c42337c schema:issueNumber 1
92 rdf:type schema:PublicationIssue
93 N769e222936324b9cbc8449ef1eccf948 schema:name nlm_unique_id
94 schema:value 100965258
95 rdf:type schema:PropertyValue
96 N9724774a2a75434c930fd9f98655154f schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
97 schema:familyName Kari
98 schema:givenName Lila
99 rdf:type schema:Person
100 N9bfb44179ef84ed588aca6f2bf96503f schema:affiliation https://www.grid.ac/institutes/grid.39381.30
101 schema:familyName Hill
102 schema:givenName Kathleen A
103 rdf:type schema:Person
104 Nb20bc9e00f094bf9b01c7d279123865c schema:name dimensions_id
105 schema:value pub.1113181741
106 rdf:type schema:PropertyValue
107 Nc090bdfc692f4cf49f6cb7d66c773bef rdf:first Nfd8632ba27de4c0ea10b3d5c59446da8
108 rdf:rest N1a2ffd075dae49e281b964fcacd22598
109 Nd647aca3c49642e88084e1a245a5ee71 schema:volumeNumber 20
110 rdf:type schema:PublicationVolume
111 Ne84f1ea7ce8f45d3a03ecb098fc7e13d schema:name pubmed_id
112 schema:value 30943897
113 rdf:type schema:PropertyValue
114 Nfd8632ba27de4c0ea10b3d5c59446da8 schema:affiliation https://www.grid.ac/institutes/grid.39381.30
115 schema:familyName Randhawa
116 schema:givenName Gurjit S
117 rdf:type schema:Person
118 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
119 schema:name Biological Sciences
120 rdf:type schema:DefinedTerm
121 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
122 schema:name Genetics
123 rdf:type schema:DefinedTerm
124 sg:journal.1023790 schema:issn 1471-2164
125 schema:name BMC Genomics
126 rdf:type schema:Periodical
127 sg:pub.10.1007/3-540-47961-9_27 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008406570
128 https://doi.org/10.1007/3-540-47961-9_27
129 rdf:type schema:CreativeWork
130 sg:pub.10.1007/978-3-319-27400-3_25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018946422
131 https://doi.org/10.1007/978-3-319-27400-3_25
132 rdf:type schema:CreativeWork
133 sg:pub.10.1007/978-3-319-33618-3_25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004984300
134 https://doi.org/10.1007/978-3-319-33618-3_25
135 rdf:type schema:CreativeWork
136 sg:pub.10.1038/s41598-016-0028-x schema:sameAs https://app.dimensions.ai/details/publication/pub.1079403223
137 https://doi.org/10.1038/s41598-016-0028-x
138 rdf:type schema:CreativeWork
139 sg:pub.10.1186/1471-2105-14-s10-s1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009261916
140 https://doi.org/10.1186/1471-2105-14-s10-s1
141 rdf:type schema:CreativeWork
142 sg:pub.10.1186/1471-2105-15-321 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019437002
143 https://doi.org/10.1186/1471-2105-15-321
144 rdf:type schema:CreativeWork
145 sg:pub.10.1186/s12859-017-1602-3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1084740427
146 https://doi.org/10.1186/s12859-017-1602-3
147 rdf:type schema:CreativeWork
148 sg:pub.10.1186/s13040-015-0073-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001343002
149 https://doi.org/10.1186/s13040-015-0073-1
150 rdf:type schema:CreativeWork
151 sg:pub.10.1186/s13040-016-0116-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017250924
152 https://doi.org/10.1186/s13040-016-0116-2
153 rdf:type schema:CreativeWork
154 sg:pub.10.1186/s13059-017-1319-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1092065837
155 https://doi.org/10.1186/s13059-017-1319-7
156 rdf:type schema:CreativeWork
157 https://doi.org/10.1002/jcc.20922 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021253586
158 rdf:type schema:CreativeWork
159 https://doi.org/10.1016/j.bse.2016.07.012 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039787609
160 rdf:type schema:CreativeWork
161 https://doi.org/10.1016/j.cogsys.2018.01.006 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100860857
162 rdf:type schema:CreativeWork
163 https://doi.org/10.1016/j.compbiomed.2015.05.022 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007088740
164 rdf:type schema:CreativeWork
165 https://doi.org/10.1016/j.ins.2014.04.029 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045237460
166 rdf:type schema:CreativeWork
167 https://doi.org/10.1016/j.jtbi.2011.01.038 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038725900
168 rdf:type schema:CreativeWork
169 https://doi.org/10.1016/j.jtbi.2014.05.043 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002252738
170 rdf:type schema:CreativeWork
171 https://doi.org/10.1016/j.jtbi.2015.02.026 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046824945
172 rdf:type schema:CreativeWork
173 https://doi.org/10.1016/j.jtbi.2015.06.033 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028628380
174 rdf:type schema:CreativeWork
175 https://doi.org/10.1016/j.jtbi.2015.08.007 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039416914
176 rdf:type schema:CreativeWork
177 https://doi.org/10.1016/j.neucom.2016.09.077 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038640165
178 rdf:type schema:CreativeWork
179 https://doi.org/10.1016/j.ygeno.2016.08.002 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013649729
180 rdf:type schema:CreativeWork
181 https://doi.org/10.1016/s0165-1684(02)00477-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035477892
182 rdf:type schema:CreativeWork
183 https://doi.org/10.1073/pnas.0813249106 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017699788
184 rdf:type schema:CreativeWork
185 https://doi.org/10.1080/10408340500526766 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019373831
186 rdf:type schema:CreativeWork
187 https://doi.org/10.1093/bib/bbt052 schema:sameAs https://app.dimensions.ai/details/publication/pub.1059413022
188 rdf:type schema:CreativeWork
189 https://doi.org/10.1093/bib/bbt067 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007818694
190 rdf:type schema:CreativeWork
191 https://doi.org/10.1093/bib/bbt068 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018549458
192 rdf:type schema:CreativeWork
193 https://doi.org/10.1093/bib/bbt070 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013026493
194 rdf:type schema:CreativeWork
195 https://doi.org/10.1093/bib/bbt072 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009442051
196 rdf:type schema:CreativeWork
197 https://doi.org/10.1093/bioinformatics/17.5.429 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009721349
198 rdf:type schema:CreativeWork
199 https://doi.org/10.1093/bioinformatics/btg005 schema:sameAs https://app.dimensions.ai/details/publication/pub.1043080454
200 rdf:type schema:CreativeWork
201 https://doi.org/10.1093/bioinformatics/bti607 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005222175
202 rdf:type schema:CreativeWork
203 https://doi.org/10.1093/bioinformatics/btm404 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007683223
204 rdf:type schema:CreativeWork
205 https://doi.org/10.1093/bioinformatics/btu177 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029212748
206 rdf:type schema:CreativeWork
207 https://doi.org/10.1093/bioinformatics/btx367 schema:sameAs https://app.dimensions.ai/details/publication/pub.1085899231
208 rdf:type schema:CreativeWork
209 https://doi.org/10.1093/molbev/msw054 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012564990
210 rdf:type schema:CreativeWork
211 https://doi.org/10.1093/nar/22.22.4673 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042438223
212 rdf:type schema:CreativeWork
213 https://doi.org/10.1093/nar/gkh340 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025846396
214 rdf:type schema:CreativeWork
215 https://doi.org/10.1093/nar/gku739 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013715418
216 rdf:type schema:CreativeWork
217 https://doi.org/10.1109/apsipa.2017.8282195 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100942749
218 rdf:type schema:CreativeWork
219 https://doi.org/10.1109/eit.2009.5189632 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093255351
220 rdf:type schema:CreativeWork
221 https://doi.org/10.1111/j.1582-4934.2002.tb00196.x schema:sameAs https://app.dimensions.ai/details/publication/pub.1019512218
222 rdf:type schema:CreativeWork
223 https://doi.org/10.1371/journal.pbio.1001127 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039826087
224 rdf:type schema:CreativeWork
225 https://doi.org/10.1371/journal.pbio.1001130 schema:sameAs https://app.dimensions.ai/details/publication/pub.1034537579
226 rdf:type schema:CreativeWork
227 https://doi.org/10.1371/journal.pcbi.1000581 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005613156
228 rdf:type schema:CreativeWork
229 https://doi.org/10.1371/journal.pone.0110954 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045535857
230 rdf:type schema:CreativeWork
231 https://doi.org/10.1371/journal.pone.0119815 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017078210
232 rdf:type schema:CreativeWork
233 https://doi.org/10.1371/journal.pone.0173288 schema:sameAs https://app.dimensions.ai/details/publication/pub.1084296733
234 rdf:type schema:CreativeWork
235 https://doi.org/10.1371/journal.pone.0206409 schema:sameAs https://app.dimensions.ai/details/publication/pub.1109897754
236 rdf:type schema:CreativeWork
237 https://doi.org/10.2174/157489309787158134 schema:sameAs https://app.dimensions.ai/details/publication/pub.1069217454
238 rdf:type schema:CreativeWork
239 https://doi.org/10.4137/ebo.s7364 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044611911
240 rdf:type schema:CreativeWork
241 https://doi.org/10.5815/ijitcs.2012.08.03 schema:sameAs https://app.dimensions.ai/details/publication/pub.1073149925
242 rdf:type schema:CreativeWork
243 https://doi.org/10.6026/97320630004463 schema:sameAs https://app.dimensions.ai/details/publication/pub.1073594092
244 rdf:type schema:CreativeWork
245 https://www.grid.ac/institutes/grid.39381.30 schema:alternateName Western University
246 schema:name Department of Biology, University of Western Ontario, London, ON, Canada.
247 Department of Computer Science, University of Western Ontario, London, ON, Canada. grandha8@uwo.ca.
248 rdf:type schema:Organization
249 https://www.grid.ac/institutes/grid.46078.3d schema:alternateName University of Waterloo
250 schema:name School of Computer Science, University of Waterloo, Waterloo, ON, Canada.
251 rdf:type schema:Organization
 




...