Comparative study of singing voice detection based on deep neural networks and ensemble learning


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2018-12

AUTHORS

Shingchern D. You, Chien-Hung Liu, Woei-Kae Chen

ABSTRACT

This paper investigates various neural network architectures and several types of stacked ensembles for singing voice detection. The models studied include convolutional neural networks (CNNs), a long short-term memory (LSTM) model, a convolutional LSTM model, and a capsule network. The input features to the networks are MFCCs (mel-frequency cepstral coefficients), spectrograms computed with the short-time Fourier transform, or raw PCM samples. The simulation results show that the CNN model with spectrogram inputs yields higher detection accuracy, up to 91.8% on the Jamendo dataset. Among the stacked ensemble methods studied, the voting strategy performs comparably to the other methods at a much lower computational cost. With voting over five models, the accuracy reaches 94.2% on the Jamendo dataset.
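The abstract reports that voting over five models reaches 94.2% on Jamendo but does not spell out the voting rule. The Python sketch below assumes plain hard majority voting over per-frame binary decisions (1 = singing voice, 0 = no voice); the functions `majority_vote` and `accuracy` and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Fuse per-frame binary labels from several models by hard majority vote.

    predictions[m][t] is model m's label for frame t (1 = voice, 0 = no voice).
    With an odd number of models a two-class vote cannot tie.
    """
    if not predictions:
        raise ValueError("need at least one model")
    n_frames = len(predictions[0])
    if any(len(p) != n_frames for p in predictions):
        raise ValueError("all models must label the same frames")
    fused = []
    for frame_labels in zip(*predictions):
        # most_common(1) returns [(label, count)] for the most frequent label
        label, _count = Counter(frame_labels).most_common(1)[0]
        fused.append(label)
    return fused


def accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of frames whose predicted label matches the reference label."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


# Made-up labels: five models, four frames
models = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
]
fused = majority_vote(models)
print(fused)                          # [1, 0, 1, 1]
print(accuracy(fused, [1, 0, 1, 0]))  # 0.75
```

Hard voting needs only each model's final decisions and no trained combiner, which is one plausible reason it is cheap; this record does not say whether the paper votes on hard labels or on averaged probabilities.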

PAGES

34

References to SciGraph publications

  • 2017-12. Image recognition performance enhancements using image normalization in HUMAN-CENTRIC COMPUTING AND INFORMATION SCIENCES
  • 2016-12. Comparative study of singing voice detection methods in MULTIMEDIA TOOLS AND APPLICATIONS

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/s13673-018-0158-1

    DOI

    http://dx.doi.org/10.1186/s13673-018-0158-1

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1109976832



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Artificial Intelligence and Image Processing", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "National Taipei University of Technology", 
              "id": "https://www.grid.ac/institutes/grid.412087.8", 
              "name": [
                "Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan"
              ], 
              "type": "Organization"
            }, 
            "familyName": "You", 
            "givenName": "Shingchern D.", 
            "id": "sg:person.0747700074.22", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0747700074.22"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "National Taipei University of Technology", 
              "id": "https://www.grid.ac/institutes/grid.412087.8", 
              "name": [
                "Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Liu", 
            "givenName": "Chien-Hung", 
            "id": "sg:person.011522042751.32", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011522042751.32"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "National Taipei University of Technology", 
              "id": "https://www.grid.ac/institutes/grid.412087.8", 
              "name": [
                "Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Chen", 
            "givenName": "Woei-Kae", 
            "id": "sg:person.01064126474.18", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01064126474.18"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/s11042-015-2894-9", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1008727015", 
              "https://doi.org/10.1007/s11042-015-2894-9"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/1027527.1027602", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1047045082"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/msp.2013.2271648", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1061424004"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/tasl.2011.2182510", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1061516885"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/tassp.1980.1163420", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1061518701"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3745/jips.04.0029", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1085350041"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s13673-017-0114-5", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1091668071", 
              "https://doi.org/10.1186/s13673-017-0114-5"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icassp.2014.6854950", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1093627962"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icassp.2015.7177944", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1093812481"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icassp.2008.4518002", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094176642"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icassp.2004.1327263", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094282312"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icassp.2000.859068", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094311318"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/iscslp.2016.7918369", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094642060"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/aspaa.2001.969557", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1095596766"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3390/app8010150", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1100584413"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/isce.2017.8355533", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1103855356"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/isne.2018.8394727", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1105153192"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3837/tiis.2018.06.017", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1105529237"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2018-12", 
        "datePublishedReg": "2018-12-01", 
        "description": "This paper investigates various structures of neural network models and various types of stacked ensembles for singing voice detection. The studied models include convolutional neural networks (CNN), long short term memory (LSTM) model, convolutional LSTM model, and capsule net. The input features to the network models are MFCC (mel-frequency cepstrum coefficients), spectrogram from short-time Fourier transformation, or raw PCM samples. The simulation results show that CNN model with spectrogram inputs yields higher detection accuracy, up to 91.8% for Jamendo dataset. Among the studied stacked ensemble methods, performing voting strategy yields comparable performance as the other methods, but with much lower computational cost. By voting with five models, the accuracy reaches 94.2% for Jamendo dataset.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1186/s13673-018-0158-1", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": [
          {
            "id": "sg:journal.1136381", 
            "issn": [
              "2192-1962", 
              "2192-1962"
            ], 
            "name": "Human-centric Computing and Information Sciences", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "8"
          }
        ], 
        "name": "Comparative study of singing voice detection based on deep neural networks and ensemble learning", 
        "pagination": "34", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "4b148490c8a5dc9358cc02ff8ca290b8cb439ac266e88d7c51dd30067e3d055a"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/s13673-018-0158-1"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1109976832"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/s13673-018-0158-1", 
          "https://app.dimensions.ai/details/publication/pub.1109976832"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-11T08:13", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000273_0000000273/records_93980_00000000.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://link.springer.com/10.1186%2Fs13673-018-0158-1"
      }
    ]
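Because the record is ordinary JSON, it can be read with a standard JSON parser and no RDF tooling. The Python sketch below assumes a trimmed local copy of the record (the `RECORD_JSON` string and the `summarize` helper are hypothetical illustrations, not part of any SciGraph API) and extracts the title, date, DOI, and author names in array order.

```python
import json
from typing import Dict, List

# Trimmed copy of the JSON-LD record above, keeping only the fields read below
RECORD_JSON = """
[
  {
    "name": "Comparative study of singing voice detection based on deep neural networks and ensemble learning",
    "datePublished": "2018-12",
    "author": [
      {"givenName": "Shingchern D.", "familyName": "You"},
      {"givenName": "Chien-Hung", "familyName": "Liu"},
      {"givenName": "Woei-Kae", "familyName": "Chen"}
    ],
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s13673-018-0158-1"]},
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1109976832"]}
    ]
  }
]
"""


def summarize(record_text: str) -> Dict[str, object]:
    """Extract title, date, DOI, and author names from a SciGraph JSON-LD record."""
    # The record is a JSON array holding a single article object
    article = json.loads(record_text)[0]
    # productId is a list of {name, value: [...]} objects; index them by name
    ids = {p["name"]: p["value"][0] for p in article.get("productId", [])}
    authors: List[str] = [
        f'{a["givenName"]} {a["familyName"]}' for a in article.get("author", [])
    ]
    return {
        "title": article["name"],
        "date": article.get("datePublished"),
        "doi": ids.get("doi"),
        "authors": authors,
    }


info = summarize(RECORD_JSON)
print(info["doi"])      # 10.1186/s13673-018-0158-1
print(info["authors"])  # ['Shingchern D. You', 'Chien-Hung Liu', 'Woei-Kae Chen']
```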
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular linked data format that is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s13673-018-0158-1'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s13673-018-0158-1'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s13673-018-0158-1'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s13673-018-0158-1'
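The four curl commands differ only in the `Accept` header, i.e. HTTP content negotiation. A rough Python equivalent using only the standard library's `urllib.request` is sketched below; the helper names `build_request` and `fetch` and the short format keys are hypothetical, and it is not verified here that the SciGraph service still answers at this URL, so the example only builds a request and does not touch the network.

```python
import urllib.request

# Accept headers copied from the curl commands above
MEDIA_TYPES = {
    "jsonld": "application/ld+json",
    "ntriples": "application/n-triples",
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
}

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s13673-018-0158-1"


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build a GET request asking for the record in the given RDF serialization."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format {fmt!r}; choose from {sorted(MEDIA_TYPES)}")
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 30.0) -> str:
    """Download the record as text; raises urllib.error.URLError if unreachable."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


req = build_request("turtle")
print(req.get_method(), req.get_header("Accept"))  # GET text/turtle
```

Keeping request construction separate from `fetch` lets the header mapping be checked offline.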


     

    This table displays all metadata directly associated with this object as RDF triples.

    130 TRIPLES      21 PREDICATES      45 URIs      19 LITERALS      7 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1186/s13673-018-0158-1 schema:about anzsrc-for:08
    2 anzsrc-for:0801
    3 schema:author Nb2c1484d707343abb474f374727f772a
    4 schema:citation sg:pub.10.1007/s11042-015-2894-9
    5 sg:pub.10.1186/s13673-017-0114-5
    6 https://doi.org/10.1109/aspaa.2001.969557
    7 https://doi.org/10.1109/icassp.2000.859068
    8 https://doi.org/10.1109/icassp.2004.1327263
    9 https://doi.org/10.1109/icassp.2008.4518002
    10 https://doi.org/10.1109/icassp.2014.6854950
    11 https://doi.org/10.1109/icassp.2015.7177944
    12 https://doi.org/10.1109/isce.2017.8355533
    13 https://doi.org/10.1109/iscslp.2016.7918369
    14 https://doi.org/10.1109/isne.2018.8394727
    15 https://doi.org/10.1109/msp.2013.2271648
    16 https://doi.org/10.1109/tasl.2011.2182510
    17 https://doi.org/10.1109/tassp.1980.1163420
    18 https://doi.org/10.1145/1027527.1027602
    19 https://doi.org/10.3390/app8010150
    20 https://doi.org/10.3745/jips.04.0029
    21 https://doi.org/10.3837/tiis.2018.06.017
    22 schema:datePublished 2018-12
    23 schema:datePublishedReg 2018-12-01
    24 schema:description This paper investigates various structures of neural network models and various types of stacked ensembles for singing voice detection. The studied models include convolutional neural networks (CNN), long short term memory (LSTM) model, convolutional LSTM model, and capsule net. The input features to the network models are MFCC (mel-frequency cepstrum coefficients), spectrogram from short-time Fourier transformation, or raw PCM samples. The simulation results show that CNN model with spectrogram inputs yields higher detection accuracy, up to 91.8% for Jamendo dataset. Among the studied stacked ensemble methods, performing voting strategy yields comparable performance as the other methods, but with much lower computational cost. By voting with five models, the accuracy reaches 94.2% for Jamendo dataset.
    25 schema:genre research_article
    26 schema:inLanguage en
    27 schema:isAccessibleForFree true
    28 schema:isPartOf Nca5f2d0daf0c40bba07581d5b65c8f61
    29 Nd673bb7d941a41b29242360e522d81b9
    30 sg:journal.1136381
    31 schema:name Comparative study of singing voice detection based on deep neural networks and ensemble learning
    32 schema:pagination 34
    33 schema:productId N174f99658b2141efbc11ca898dcc4804
    34 N6489bfce826547a98528c2c33cea22a3
    35 N7c8b28cfd3e8445badebf8831632ccd1
    36 schema:sameAs https://app.dimensions.ai/details/publication/pub.1109976832
    37 https://doi.org/10.1186/s13673-018-0158-1
    38 schema:sdDatePublished 2019-04-11T08:13
    39 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    40 schema:sdPublisher N71cfadc662c34b17aa1fdcf7425916be
    41 schema:url https://link.springer.com/10.1186%2Fs13673-018-0158-1
    42 sgo:license sg:explorer/license/
    43 sgo:sdDataset articles
    44 rdf:type schema:ScholarlyArticle
    45 N174f99658b2141efbc11ca898dcc4804 schema:name readcube_id
    46 schema:value 4b148490c8a5dc9358cc02ff8ca290b8cb439ac266e88d7c51dd30067e3d055a
    47 rdf:type schema:PropertyValue
    48 N6489bfce826547a98528c2c33cea22a3 schema:name dimensions_id
    49 schema:value pub.1109976832
    50 rdf:type schema:PropertyValue
    51 N71cfadc662c34b17aa1fdcf7425916be schema:name Springer Nature - SN SciGraph project
    52 rdf:type schema:Organization
    53 N79d209fe77f74c5ab6b22582c1cbc83f rdf:first sg:person.011522042751.32
    54 rdf:rest N9d1076cb47a3421f83a28d974efd0ec3
    55 N7c8b28cfd3e8445badebf8831632ccd1 schema:name doi
    56 schema:value 10.1186/s13673-018-0158-1
    57 rdf:type schema:PropertyValue
    58 N9d1076cb47a3421f83a28d974efd0ec3 rdf:first sg:person.01064126474.18
    59 rdf:rest rdf:nil
    60 Nb2c1484d707343abb474f374727f772a rdf:first sg:person.0747700074.22
    61 rdf:rest N79d209fe77f74c5ab6b22582c1cbc83f
    62 Nca5f2d0daf0c40bba07581d5b65c8f61 schema:issueNumber 1
    63 rdf:type schema:PublicationIssue
    64 Nd673bb7d941a41b29242360e522d81b9 schema:volumeNumber 8
    65 rdf:type schema:PublicationVolume
    66 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    67 schema:name Information and Computing Sciences
    68 rdf:type schema:DefinedTerm
    69 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
    70 schema:name Artificial Intelligence and Image Processing
    71 rdf:type schema:DefinedTerm
    72 sg:journal.1136381 schema:issn 2192-1962
    73 schema:name Human-centric Computing and Information Sciences
    74 rdf:type schema:Periodical
    75 sg:person.01064126474.18 schema:affiliation https://www.grid.ac/institutes/grid.412087.8
    76 schema:familyName Chen
    77 schema:givenName Woei-Kae
    78 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01064126474.18
    79 rdf:type schema:Person
    80 sg:person.011522042751.32 schema:affiliation https://www.grid.ac/institutes/grid.412087.8
    81 schema:familyName Liu
    82 schema:givenName Chien-Hung
    83 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011522042751.32
    84 rdf:type schema:Person
    85 sg:person.0747700074.22 schema:affiliation https://www.grid.ac/institutes/grid.412087.8
    86 schema:familyName You
    87 schema:givenName Shingchern D.
    88 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0747700074.22
    89 rdf:type schema:Person
    90 sg:pub.10.1007/s11042-015-2894-9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008727015
    91 https://doi.org/10.1007/s11042-015-2894-9
    92 rdf:type schema:CreativeWork
    93 sg:pub.10.1186/s13673-017-0114-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091668071
    94 https://doi.org/10.1186/s13673-017-0114-5
    95 rdf:type schema:CreativeWork
    96 https://doi.org/10.1109/aspaa.2001.969557 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095596766
    97 rdf:type schema:CreativeWork
    98 https://doi.org/10.1109/icassp.2000.859068 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094311318
    99 rdf:type schema:CreativeWork
    100 https://doi.org/10.1109/icassp.2004.1327263 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094282312
    101 rdf:type schema:CreativeWork
    102 https://doi.org/10.1109/icassp.2008.4518002 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094176642
    103 rdf:type schema:CreativeWork
    104 https://doi.org/10.1109/icassp.2014.6854950 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093627962
    105 rdf:type schema:CreativeWork
    106 https://doi.org/10.1109/icassp.2015.7177944 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093812481
    107 rdf:type schema:CreativeWork
    108 https://doi.org/10.1109/isce.2017.8355533 schema:sameAs https://app.dimensions.ai/details/publication/pub.1103855356
    109 rdf:type schema:CreativeWork
    110 https://doi.org/10.1109/iscslp.2016.7918369 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094642060
    111 rdf:type schema:CreativeWork
    112 https://doi.org/10.1109/isne.2018.8394727 schema:sameAs https://app.dimensions.ai/details/publication/pub.1105153192
    113 rdf:type schema:CreativeWork
    114 https://doi.org/10.1109/msp.2013.2271648 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061424004
    115 rdf:type schema:CreativeWork
    116 https://doi.org/10.1109/tasl.2011.2182510 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061516885
    117 rdf:type schema:CreativeWork
    118 https://doi.org/10.1109/tassp.1980.1163420 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061518701
    119 rdf:type schema:CreativeWork
    120 https://doi.org/10.1145/1027527.1027602 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047045082
    121 rdf:type schema:CreativeWork
    122 https://doi.org/10.3390/app8010150 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100584413
    123 rdf:type schema:CreativeWork
    124 https://doi.org/10.3745/jips.04.0029 schema:sameAs https://app.dimensions.ai/details/publication/pub.1085350041
    125 rdf:type schema:CreativeWork
    126 https://doi.org/10.3837/tiis.2018.06.017 schema:sameAs https://app.dimensions.ai/details/publication/pub.1105529237
    127 rdf:type schema:CreativeWork
    128 https://www.grid.ac/institutes/grid.412087.8 schema:alternateName National Taipei University of Technology
    129 schema:name Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan
    130 rdf:type schema:Organization
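In this table a blank subject column repeats the previous subject, and a row with only an object repeats the previous predicate. Note that `schema:author` (row 3) points to a blank node rather than to a person: the authors form an RDF collection in which each node's `rdf:first` is one author and `rdf:rest` is the next node, ending at `rdf:nil` (rows 53-54 and 58-61). The Python sketch below walks that chain using those rows as plain tuples with the repeated subjects filled in; `rdf_list_items` and `FAMILY_NAMES` are hypothetical helpers, and no RDF library is assumed.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Author-related triples copied from the table above (rows 3, 53-54, 58-61)
TRIPLES: List[Triple] = [
    ("sg:pub.10.1186/s13673-018-0158-1", "schema:author", "Nb2c1484d707343abb474f374727f772a"),
    ("N79d209fe77f74c5ab6b22582c1cbc83f", "rdf:first", "sg:person.011522042751.32"),
    ("N79d209fe77f74c5ab6b22582c1cbc83f", "rdf:rest", "N9d1076cb47a3421f83a28d974efd0ec3"),
    ("N9d1076cb47a3421f83a28d974efd0ec3", "rdf:first", "sg:person.01064126474.18"),
    ("N9d1076cb47a3421f83a28d974efd0ec3", "rdf:rest", "rdf:nil"),
    ("Nb2c1484d707343abb474f374727f772a", "rdf:first", "sg:person.0747700074.22"),
    ("Nb2c1484d707343abb474f374727f772a", "rdf:rest", "N79d209fe77f74c5ab6b22582c1cbc83f"),
]

# Family names from rows 76, 81, 86, keyed by person identifier
FAMILY_NAMES = {
    "sg:person.0747700074.22": "You",
    "sg:person.011522042751.32": "Liu",
    "sg:person.01064126474.18": "Chen",
}


def rdf_list_items(triples: List[Triple], head: str) -> List[str]:
    """Follow rdf:first/rdf:rest links from `head` until rdf:nil; return items in order."""
    # Index (subject, predicate) -> object for constant-time lookups
    index: Dict[Tuple[str, str], str] = {(s, p): o for s, p, o in triples}
    items: List[str] = []
    seen = set()
    node = head
    while node != "rdf:nil":
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(index[(node, "rdf:first")])
        node = index[(node, "rdf:rest")]
    return items


head = next(o for s, p, o in TRIPLES if p == "schema:author")
authors = rdf_list_items(TRIPLES, head)
print([FAMILY_NAMES[a] for a in authors])  # ['You', 'Liu', 'Chen']
```

The recovered order matches the author order in the JSON-LD record and the AUTHORS field.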
     






    ...