Comparative study of singing voice detection based on deep neural networks and ensemble learning


Ontology type: schema:ScholarlyArticle
Open Access: True


Article Info

DATE

2018-12

AUTHORS

Shingchern D. You, Chien-Hung Liu, Woei-Kae Chen

ABSTRACT

This paper investigates various structures of neural network models and various types of stacked ensembles for singing voice detection. The studied models include convolutional neural networks (CNN), long short term memory (LSTM) model, convolutional LSTM model, and capsule net. The input features to the network models are MFCC (mel-frequency cepstrum coefficients), spectrogram from short-time Fourier transformation, or raw PCM samples. The simulation results show that CNN model with spectrogram inputs yields higher detection accuracy, up to 91.8% for Jamendo dataset. Among the studied stacked ensemble methods, performing voting strategy yields comparable performance as the other methods, but with much lower computational cost. By voting with five models, the accuracy reaches 94.2% for Jamendo dataset.
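
The abstract does not say how the five models' outputs are combined, so the following is only a minimal sketch of one common reading: simple majority (hard) voting over per-frame binary labels. The function name `majority_vote` and the 0/1 label convention are assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Sequence


def majority_vote(model_predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-frame binary labels from several models by majority vote.

    Each inner sequence holds one model's labels (1 = singing, 0 = no singing).
    This is an illustrative assumption about the voting rule, not the paper's code.
    """
    if not model_predictions:
        raise ValueError("at least one model is required")
    n_frames = len(model_predictions[0])
    if any(len(p) != n_frames for p in model_predictions):
        raise ValueError("all models must label the same number of frames")

    combined = []
    # zip(*...) regroups the labels so that each tuple holds all votes for one frame.
    for votes in zip(*model_predictions):
        # A frame is labelled "singing" only if strictly more than half the models agree.
        combined.append(1 if 2 * sum(votes) > len(votes) else 0)
    return combined


if __name__ == "__main__":
    five_models = [
        [1, 0, 1],
        [1, 0, 0],
        [0, 0, 1],
        [1, 1, 1],
        [0, 0, 0],
    ]
    print(majority_vote(five_models))
```

Using an odd number of models, as in the five-model case the abstract mentions, avoids ties under this rule.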

PAGES

34

References to SciGraph publications

  • 2017-12. Image recognition performance enhancements using image normalization in HUMAN-CENTRIC COMPUTING AND INFORMATION SCIENCES
  • 2016-12. Comparative study of singing voice detection methods in MULTIMEDIA TOOLS AND APPLICATIONS

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/s13673-018-0158-1

    DOI

    http://dx.doi.org/10.1186/s13673-018-0158-1

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1109976832



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Artificial Intelligence and Image Processing", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "National Taipei University of Technology", 
              "id": "https://www.grid.ac/institutes/grid.412087.8", 
              "name": [
                "Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan"
              ], 
              "type": "Organization"
            }, 
            "familyName": "You", 
            "givenName": "Shingchern D.", 
            "id": "sg:person.0747700074.22", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0747700074.22"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "National Taipei University of Technology", 
              "id": "https://www.grid.ac/institutes/grid.412087.8", 
              "name": [
                "Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Liu", 
            "givenName": "Chien-Hung", 
            "id": "sg:person.011522042751.32", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011522042751.32"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "National Taipei University of Technology", 
              "id": "https://www.grid.ac/institutes/grid.412087.8", 
              "name": [
                "Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Chen", 
            "givenName": "Woei-Kae", 
            "id": "sg:person.01064126474.18", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01064126474.18"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/s11042-015-2894-9", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1008727015", 
              "https://doi.org/10.1007/s11042-015-2894-9"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/1027527.1027602", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1047045082"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/msp.2013.2271648", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1061424004"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/tasl.2011.2182510", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1061516885"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/tassp.1980.1163420", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1061518701"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3745/jips.04.0029", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1085350041"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s13673-017-0114-5", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1091668071", 
              "https://doi.org/10.1186/s13673-017-0114-5"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icassp.2014.6854950", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1093627962"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icassp.2015.7177944", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1093812481"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icassp.2008.4518002", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094176642"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icassp.2004.1327263", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094282312"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icassp.2000.859068", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094311318"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/iscslp.2016.7918369", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094642060"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/aspaa.2001.969557", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1095596766"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3390/app8010150", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1100584413"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/isce.2017.8355533", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1103855356"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/isne.2018.8394727", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1105153192"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3837/tiis.2018.06.017", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1105529237"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2018-12", 
        "datePublishedReg": "2018-12-01", 
        "description": "This paper investigates various structures of neural network models and various types of stacked ensembles for singing voice detection. The studied models include convolutional neural networks (CNN), long short term memory (LSTM) model, convolutional LSTM model, and capsule net. The input features to the network models are MFCC (mel-frequency cepstrum coefficients), spectrogram from short-time Fourier transformation, or raw PCM samples. The simulation results show that CNN model with spectrogram inputs yields higher detection accuracy, up to 91.8% for Jamendo dataset. Among the studied stacked ensemble methods, performing voting strategy yields comparable performance as the other methods, but with much lower computational cost. By voting with five models, the accuracy reaches 94.2% for Jamendo dataset.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1186/s13673-018-0158-1", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": [
          {
            "id": "sg:journal.1136381", 
            "issn": [
              "2192-1962", 
              "2192-1962"
            ], 
            "name": "Human-centric Computing and Information Sciences", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "8"
          }
        ], 
        "name": "Comparative study of singing voice detection based on deep neural networks and ensemble learning", 
        "pagination": "34", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "4b148490c8a5dc9358cc02ff8ca290b8cb439ac266e88d7c51dd30067e3d055a"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/s13673-018-0158-1"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1109976832"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/s13673-018-0158-1", 
          "https://app.dimensions.ai/details/publication/pub.1109976832"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-11T08:13", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000273_0000000273/records_93980_00000000.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://link.springer.com/10.1186%2Fs13673-018-0158-1"
      }
    ]
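
Because the record above is ordinary JSON, it can be read with a standard JSON parser. The sketch below works on a shortened excerpt of the record (three authors, two citations) and pulls out author names and cited DOIs. The helper names `author_names` and `citation_dois` are hypothetical. The sketch also assumes that a cited work's DOI appears either as its `id` or among its `sameAs` URLs, as it does in the entries shown here.

```python
import json

# Shortened excerpt of the JSON-LD record shown above.
RECORD_EXCERPT = """
[
  {
    "id": "sg:pub.10.1186/s13673-018-0158-1",
    "datePublished": "2018-12",
    "author": [
      {"familyName": "You", "givenName": "Shingchern D."},
      {"familyName": "Liu", "givenName": "Chien-Hung"},
      {"familyName": "Chen", "givenName": "Woei-Kae"}
    ],
    "citation": [
      {
        "id": "sg:pub.10.1007/s11042-015-2894-9",
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008727015",
          "https://doi.org/10.1007/s11042-015-2894-9"
        ]
      },
      {
        "id": "https://doi.org/10.1145/1027527.1027602",
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1047045082"
        ]
      }
    ]
  }
]
"""

DOI_PREFIX = "https://doi.org/"


def author_names(record):
    """Return authors as 'Given Family' strings, in record order."""
    return [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]


def citation_dois(record):
    """Return the bare DOI of each cited work that exposes a doi.org URL."""
    dois = []
    for cited in record.get("citation", []):
        # The DOI URL may be the node id itself or one of its sameAs links.
        candidates = [cited.get("id", "")] + cited.get("sameAs", [])
        doi_url = next((u for u in candidates if u.startswith(DOI_PREFIX)), None)
        if doi_url is not None:
            dois.append(doi_url[len(DOI_PREFIX):])
    return dois


if __name__ == "__main__":
    record = json.loads(RECORD_EXCERPT)[0]  # the document is a one-element array
    print(author_names(record))
    print(citation_dois(record))
```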
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s13673-018-0158-1'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s13673-018-0158-1'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s13673-018-0158-1'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s13673-018-0158-1'
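
The four curl commands differ only in their `Accept` header. A minimal Python sketch of the same content negotiation, using only the standard library, might look like the following. The short format keys and the helper names `build_request` and `fetch` are illustrative choices, and `fetch` assumes the endpoint is still reachable and returns UTF-8 text.

```python
from urllib.request import Request, urlopen

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s13673-018-0158-1"

# Media types taken from the curl examples above; the short keys are arbitrary labels.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> Request:
    """Build a GET request that asks for one RDF serialization via the Accept header."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(ACCEPT_TYPES)}")
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 30.0) -> str:
    """Download the record in the requested serialization (performs a network call)."""
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only inspects the request; call fetch("turtle") to actually download the record.
    request = build_request("turtle")
    print(request.full_url, request.get_header("Accept"))
```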


     

    This table displays all metadata directly associated to this object as RDF triples.

    130 TRIPLES      21 PREDICATES      45 URIs      19 LITERALS      7 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1186/s13673-018-0158-1 schema:about anzsrc-for:08
    2 anzsrc-for:0801
    3 schema:author N8bac4809d95c4476b08e3e99da8944e6
    4 schema:citation sg:pub.10.1007/s11042-015-2894-9
    5 sg:pub.10.1186/s13673-017-0114-5
    6 https://doi.org/10.1109/aspaa.2001.969557
    7 https://doi.org/10.1109/icassp.2000.859068
    8 https://doi.org/10.1109/icassp.2004.1327263
    9 https://doi.org/10.1109/icassp.2008.4518002
    10 https://doi.org/10.1109/icassp.2014.6854950
    11 https://doi.org/10.1109/icassp.2015.7177944
    12 https://doi.org/10.1109/isce.2017.8355533
    13 https://doi.org/10.1109/iscslp.2016.7918369
    14 https://doi.org/10.1109/isne.2018.8394727
    15 https://doi.org/10.1109/msp.2013.2271648
    16 https://doi.org/10.1109/tasl.2011.2182510
    17 https://doi.org/10.1109/tassp.1980.1163420
    18 https://doi.org/10.1145/1027527.1027602
    19 https://doi.org/10.3390/app8010150
    20 https://doi.org/10.3745/jips.04.0029
    21 https://doi.org/10.3837/tiis.2018.06.017
    22 schema:datePublished 2018-12
    23 schema:datePublishedReg 2018-12-01
    24 schema:description This paper investigates various structures of neural network models and various types of stacked ensembles for singing voice detection. The studied models include convolutional neural networks (CNN), long short term memory (LSTM) model, convolutional LSTM model, and capsule net. The input features to the network models are MFCC (mel-frequency cepstrum coefficients), spectrogram from short-time Fourier transformation, or raw PCM samples. The simulation results show that CNN model with spectrogram inputs yields higher detection accuracy, up to 91.8% for Jamendo dataset. Among the studied stacked ensemble methods, performing voting strategy yields comparable performance as the other methods, but with much lower computational cost. By voting with five models, the accuracy reaches 94.2% for Jamendo dataset.
    25 schema:genre research_article
    26 schema:inLanguage en
    27 schema:isAccessibleForFree true
    28 schema:isPartOf N59127ebfc76c4abcae1223f6186c683a
    29 N7077fc45564347799815a09ee0417dc6
    30 sg:journal.1136381
    31 schema:name Comparative study of singing voice detection based on deep neural networks and ensemble learning
    32 schema:pagination 34
    33 schema:productId N0e3f208ce8aa4dcfa8d7e60afcac0842
    34 N24e5199c57cd468a9505b6d70516ac15
    35 Na8e840b708de46ce9dcc848b01d0a7fc
    36 schema:sameAs https://app.dimensions.ai/details/publication/pub.1109976832
    37 https://doi.org/10.1186/s13673-018-0158-1
    38 schema:sdDatePublished 2019-04-11T08:13
    39 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    40 schema:sdPublisher N099b0d6381f442cb9b0a815550e9bafa
    41 schema:url https://link.springer.com/10.1186%2Fs13673-018-0158-1
    42 sgo:license sg:explorer/license/
    43 sgo:sdDataset articles
    44 rdf:type schema:ScholarlyArticle
    45 N099b0d6381f442cb9b0a815550e9bafa schema:name Springer Nature - SN SciGraph project
    46 rdf:type schema:Organization
    47 N0e3f208ce8aa4dcfa8d7e60afcac0842 schema:name doi
    48 schema:value 10.1186/s13673-018-0158-1
    49 rdf:type schema:PropertyValue
    50 N24e5199c57cd468a9505b6d70516ac15 schema:name readcube_id
    51 schema:value 4b148490c8a5dc9358cc02ff8ca290b8cb439ac266e88d7c51dd30067e3d055a
    52 rdf:type schema:PropertyValue
    53 N52573309a16c4033b99813377f189d7b rdf:first sg:person.01064126474.18
    54 rdf:rest rdf:nil
    55 N557e11cf8bf443f3a9bf8da4843cce58 rdf:first sg:person.011522042751.32
    56 rdf:rest N52573309a16c4033b99813377f189d7b
    57 N59127ebfc76c4abcae1223f6186c683a schema:volumeNumber 8
    58 rdf:type schema:PublicationVolume
    59 N7077fc45564347799815a09ee0417dc6 schema:issueNumber 1
    60 rdf:type schema:PublicationIssue
    61 N8bac4809d95c4476b08e3e99da8944e6 rdf:first sg:person.0747700074.22
    62 rdf:rest N557e11cf8bf443f3a9bf8da4843cce58
    63 Na8e840b708de46ce9dcc848b01d0a7fc schema:name dimensions_id
    64 schema:value pub.1109976832
    65 rdf:type schema:PropertyValue
    66 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    67 schema:name Information and Computing Sciences
    68 rdf:type schema:DefinedTerm
    69 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
    70 schema:name Artificial Intelligence and Image Processing
    71 rdf:type schema:DefinedTerm
    72 sg:journal.1136381 schema:issn 2192-1962
    73 schema:name Human-centric Computing and Information Sciences
    74 rdf:type schema:Periodical
    75 sg:person.01064126474.18 schema:affiliation https://www.grid.ac/institutes/grid.412087.8
    76 schema:familyName Chen
    77 schema:givenName Woei-Kae
    78 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01064126474.18
    79 rdf:type schema:Person
    80 sg:person.011522042751.32 schema:affiliation https://www.grid.ac/institutes/grid.412087.8
    81 schema:familyName Liu
    82 schema:givenName Chien-Hung
    83 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011522042751.32
    84 rdf:type schema:Person
    85 sg:person.0747700074.22 schema:affiliation https://www.grid.ac/institutes/grid.412087.8
    86 schema:familyName You
    87 schema:givenName Shingchern D.
    88 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0747700074.22
    89 rdf:type schema:Person
    90 sg:pub.10.1007/s11042-015-2894-9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008727015
    91 https://doi.org/10.1007/s11042-015-2894-9
    92 rdf:type schema:CreativeWork
    93 sg:pub.10.1186/s13673-017-0114-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091668071
    94 https://doi.org/10.1186/s13673-017-0114-5
    95 rdf:type schema:CreativeWork
    96 https://doi.org/10.1109/aspaa.2001.969557 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095596766
    97 rdf:type schema:CreativeWork
    98 https://doi.org/10.1109/icassp.2000.859068 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094311318
    99 rdf:type schema:CreativeWork
    100 https://doi.org/10.1109/icassp.2004.1327263 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094282312
    101 rdf:type schema:CreativeWork
    102 https://doi.org/10.1109/icassp.2008.4518002 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094176642
    103 rdf:type schema:CreativeWork
    104 https://doi.org/10.1109/icassp.2014.6854950 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093627962
    105 rdf:type schema:CreativeWork
    106 https://doi.org/10.1109/icassp.2015.7177944 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093812481
    107 rdf:type schema:CreativeWork
    108 https://doi.org/10.1109/isce.2017.8355533 schema:sameAs https://app.dimensions.ai/details/publication/pub.1103855356
    109 rdf:type schema:CreativeWork
    110 https://doi.org/10.1109/iscslp.2016.7918369 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094642060
    111 rdf:type schema:CreativeWork
    112 https://doi.org/10.1109/isne.2018.8394727 schema:sameAs https://app.dimensions.ai/details/publication/pub.1105153192
    113 rdf:type schema:CreativeWork
    114 https://doi.org/10.1109/msp.2013.2271648 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061424004
    115 rdf:type schema:CreativeWork
    116 https://doi.org/10.1109/tasl.2011.2182510 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061516885
    117 rdf:type schema:CreativeWork
    118 https://doi.org/10.1109/tassp.1980.1163420 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061518701
    119 rdf:type schema:CreativeWork
    120 https://doi.org/10.1145/1027527.1027602 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047045082
    121 rdf:type schema:CreativeWork
    122 https://doi.org/10.3390/app8010150 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100584413
    123 rdf:type schema:CreativeWork
    124 https://doi.org/10.3745/jips.04.0029 schema:sameAs https://app.dimensions.ai/details/publication/pub.1085350041
    125 rdf:type schema:CreativeWork
    126 https://doi.org/10.3837/tiis.2018.06.017 schema:sameAs https://app.dimensions.ai/details/publication/pub.1105529237
    127 rdf:type schema:CreativeWork
    128 https://www.grid.ac/institutes/grid.412087.8 schema:alternateName National Taipei University of Technology
    129 schema:name Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan
    130 rdf:type schema:Organization
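
In the table above, `schema:author` points to a blank node rather than directly to the three people: author order is stored as an RDF list, a chain of `rdf:first` / `rdf:rest` nodes that ends in `rdf:nil`. The sketch below copies those three blank nodes into a plain dictionary and walks the chain to recover the ordered authors. The dictionary layout and the name `walk_rdf_list` are assumptions made for illustration; a real application would more likely use an RDF library.

```python
RDF_NIL = "rdf:nil"

# Blank nodes from the triples table: node -> (rdf:first, rdf:rest).
AUTHOR_LIST_NODES = {
    "N8bac4809d95c4476b08e3e99da8944e6": (
        "sg:person.0747700074.22",
        "N557e11cf8bf443f3a9bf8da4843cce58",
    ),
    "N557e11cf8bf443f3a9bf8da4843cce58": (
        "sg:person.011522042751.32",
        "N52573309a16c4033b99813377f189d7b",
    ),
    "N52573309a16c4033b99813377f189d7b": (
        "sg:person.01064126474.18",
        RDF_NIL,
    ),
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest links from `head` and collect each rdf:first value in order."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


if __name__ == "__main__":
    # The head node is the object of the schema:author triple (row 3 of the table).
    print(walk_rdf_list("N8bac4809d95c4476b08e3e99da8944e6", AUTHOR_LIST_NODES))
```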
     



