YEARS

2013-2015

AUTHORS

Richard Eric Turner

TITLE

Unifying audio signal processing and machine learning: a fundamental framework for machine hearing

ABSTRACT

Modern technology is leading to a flood of audio data. For example, over seventy-two hours of unstructured and unlabelled sound-tracks are uploaded to internet sites every minute. Automatic systems are urgently needed for recognising audio content so that these sound-tracks can be tagged for categorisation and search. Moreover, an increasing proportion of recordings are made on hand-held devices in challenging environments that contain multiple sound sources and noise. Such uncurated and noisy data necessitate automatic systems for cleaning the audio content and separating sources from mixtures. On a related note, devices for the hearing impaired currently perform poorly in noise. In fact, this is a major reason why six million people in the UK who would benefit from a hearing aid do not use one (a market worth £18 billion p.a.). Patients fitted with cochlear implants suffer from similar limitations, and as the population ages more people are affected. It is clear that audio recognition and enhancement methods are required to stop us drowning in audio data, for processing in hearing devices, and to support new technological innovations.

Current approaches to these problems use a combination of audio signal processing (which places the audio data into a convenient format and reduces the data-rate) and machine learning (which removes noise, separates sources, or classifies the content). It is widely believed that these two fields must become increasingly integrated in the future. However, this union is currently a troubled one, suffering from four problems. Inefficiency: the methods are too inefficient when we have vast amounts of data (as is the case for audio-tracks on the web) or for real-time applications (such as is necessary in hearing aids). Impoverished models: the machine learning modules tend to be statistically limited. Unadapted: the signal processing modules are unadapted, despite evidence from other fields, like computer vision, which suggests that automatic tuning leads to significant performance gains. Distorted mixtures: the signal processing modules introduce non-linear distortions which are not captured by the machine learning modules.

In this project we address these four limitations by introducing a new theoretical framework which unifies signal processing and machine learning. The key step is to view the signal processing module as solving an inference problem. Since the machine-learning modules are often framed in this way, the two modules can be combined into a single coherent approach, allowing technologies from the two fields to be fully integrated. In the project we will then use the new approach to develop efficient, rich, adaptive, and distortion-free approaches to audio denoising, source separation, and recognition. We will evaluate the noise reduction and source separation algorithms on the hearing impaired, and the audio recognition algorithms on audio sound-track data. We believe this new framework will form a foundation of the emerging field of machine hearing. In the future, machine hearing will be deployed in a vast range of applications, from music processing tasks to augmented reality systems (in conjunction with technologies from computer vision). We believe that this project will kick-start this proliferation.

FUNDED PUBLICATIONS

  • articles:0e2a2e0ceb2a8348b8e06e2be4b0ae87


    21 TRIPLES      17 PREDICATES      22 URIs      10 LITERALS

    Subject: grants:c15e93e8bbf00c9b91ed3f24e4d2e4ea (all 21 triples share this subject)

        Predicate                     Object
     1  sg:abstract                   (full grant abstract; reproduced verbatim in the ABSTRACT section above)
     2  sg:endYear                    2015
     3  sg:fundingAmount              97101.0
     4  sg:fundingCurrency            GBP
     5  sg:hasContribution            contributions:dcba771da5416425afe8b5632839ec31
     6  sg:hasFieldOfResearchCode     anzsrc-for:08
     7  sg:hasFieldOfResearchCode     anzsrc-for:0801
     8  sg:hasFieldOfResearchCode     anzsrc-for:09
     9  sg:hasFieldOfResearchCode     anzsrc-for:0906
    10  sg:hasFundedPublication       articles:0e2a2e0ceb2a8348b8e06e2be4b0ae87
    11  sg:hasFundingOrganization     grid-institutes:grid.421091.f
    12  sg:hasRecipientOrganization   grid-institutes:grid.5335.0
    13  sg:language                   English
    14  sg:license                    http://scigraph.springernature.com/explorer/license/
    15  sg:license                    Contains UK public sector information licensed under the Open Government Licence v2.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/).
    16  sg:scigraphId                 c15e93e8bbf00c9b91ed3f24e4d2e4ea
    17  sg:startYear                  2013
    18  sg:title                      Unifying audio signal processing and machine learning: a fundamental framework for machine hearing
    19  sg:webpage                    http://gtr.rcuk.ac.uk/project/C0A363B8-4B54-4A92-AAB4-1974BD58C1B1
    20  rdf:type                      sg:Grant
    21  rdfs:label                    Grant: Unifying audio signal processing and machine learning: a fundamental framework for machine hearing
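
    To work with these triples in code rather than on the page, the minimal Python sketch below rebuilds a few of them (rows 2, 3, 17, 18) with the rdflib library and serialises them to Turtle, one of the formats offered below. Note that the expansion of the sg: prefix is an assumption here, since this page only shows the abbreviated form.

    from rdflib import Graph, Literal, Namespace, URIRef

    # Assumed expansion of the sg: prefix; the page shows only the short form.
    SG = Namespace("http://scigraph.springernature.com/ontologies/core/")
    GRANT = URIRef(
        "http://scigraph.springernature.com/things/grants/"
        "c15e93e8bbf00c9b91ed3f24e4d2e4ea"
    )

    g = Graph()
    # A few of the triples from the table above (rows 2, 3, 17, 18).
    g.add((GRANT, SG.endYear, Literal(2015)))
    g.add((GRANT, SG.fundingAmount, Literal(97101.0)))
    g.add((GRANT, SG.startYear, Literal(2013)))
    g.add((GRANT, SG.title, Literal(
        "Unifying audio signal processing and machine learning: "
        "a fundamental framework for machine hearing"
    )))

    # Serialise back to Turtle, one of the download formats listed below.
    print(g.serialize(format="turtle"))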
    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular JSON format for linked data.

    curl -H 'Accept: application/ld+json' 'http://scigraph.springernature.com/things/grants/c15e93e8bbf00c9b91ed3f24e4d2e4ea'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'http://scigraph.springernature.com/things/grants/c15e93e8bbf00c9b91ed3f24e4d2e4ea'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'http://scigraph.springernature.com/things/grants/c15e93e8bbf00c9b91ed3f24e4d2e4ea'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'http://scigraph.springernature.com/things/grants/c15e93e8bbf00c9b91ed3f24e4d2e4ea'
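
    The same content negotiation works from a script. The minimal Python sketch below mirrors the JSON-LD curl example, assuming the endpoint still answers with the media types listed above; the exact JSON-LD layout is not documented on this page, so the sketch only inspects the top-level keys.

    import requests

    URL = ("http://scigraph.springernature.com/things/grants/"
           "c15e93e8bbf00c9b91ed3f24e4d2e4ea")

    # Content negotiation via the Accept header, exactly as in the curl example.
    resp = requests.get(URL, headers={"Accept": "application/ld+json"}, timeout=30)
    resp.raise_for_status()
    doc = resp.json()

    # Layout varies between JSON-LD documents, so just list what came back.
    print(sorted(doc) if isinstance(doc, dict) else type(doc))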