Improved Audio-Visual Speaker Recognition via the Use of a Hybrid Combination Strategy


Ontology type: schema:Chapter     


Chapter Info

DATE

2003-06-24

AUTHORS

Simon Lucey , Tsuhan Chen

ABSTRACT

In this paper an in depth analysis is undertaken into effective strategies for integrating the audio-visual modalities for the purposes of text-dependent speaker recognition. Our work is based around the well known hidden Markov model (HMM) classifier framework for modelling speech. A framework is proposed to handle the mismatch between train and test observation sets, so as to provide effective classifier combination performance between the acoustic and visual HMM classifiers. From this framework, it can be shown that strategies for combining independent classifiers, such as the weighted product or sum rules, naturally emerge depending on the influence of the mismatch. Based on the assumption that poor performance in most audio-visual speaker recognition applications can be attributed to train/test mismatches we propose that the main impetus of practical audio-visual integration is to dampen the independent errors, resulting from the mismatch, rather than trying to model any bimodal speech dependencies. To this end a strategy is recommended, based on theory and empirical evidence, using a hybrid between the weighted product and weighted sum rules in the presence of varying acoustic noise. Results are presented on the M2VTS database.

PAGES

929-936

Book

TITLE

Audio- and Video-Based Biometric Person Authentication

ISBN

978-3-540-40302-9
978-3-540-44887-7

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/3-540-44887-x_108

DOI

http://dx.doi.org/10.1007/3-540-44887-x_108

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1045209544


Indexing Status: check whether this publication has been indexed by Scopus and Web of Science using the SN Indexing Status Tool.
Incoming Citations: browse incoming citations for this publication using opencitations.net.

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record in an external JSON-LD service such as the JSON-LD Playground or Google's Structured Data Testing Tool.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Carnegie Mellon University", 
          "id": "https://www.grid.ac/institutes/grid.147455.6", 
          "name": [
            "Advanced Multimedia Processing Laboratory, Department of Electrical and Computer Engineering, Carnegie Mellon University, 15213, Pittsburgh, PA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lucey", 
        "givenName": "Simon", 
        "id": "sg:person.0754071362.25", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0754071362.25"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Carnegie Mellon University", 
          "id": "https://www.grid.ac/institutes/grid.147455.6", 
          "name": [
            "Advanced Multimedia Processing Laboratory, Department of Electrical and Computer Engineering, Carnegie Mellon University, 15213, Pittsburgh, PA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Chen", 
        "givenName": "Tsuhan", 
        "id": "sg:person.012245072625.31", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012245072625.31"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1007/bf01238023", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1000725148", 
          "https://doi.org/10.1007/bf01238023"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0167-8655(97)00070-6", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1025388281"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1155/s1110865703209045", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1063207734", 
          "https://doi.org/10.1155/s1110865703209045"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2003-06-24", 
    "datePublishedReg": "2003-06-24", 
    "description": "In this paper an in depth analysis is undertaken into effective strategies for integrating the audio-visual modalities for the purposes of text-dependent speaker recognition. Our work is based around the well known hidden Markov model (HMM) classifier framework for modelling speech. A framework is proposed to handle the mismatch between train and test observation sets, so as to provide effective classifier combination performance between the acoustic and visual HMM classifiers. From this framework, it can be shown that strategies for combining independent classifiers, such as the weighted product or sum rules, naturally emerge depending on the influence of the mismatch. Based on the assumption that poor performance in most audio-visual speaker recognition applications can be attributed to train/test mismatches we propose that the main impetus of practical audio-visual integration is to dampen the independent errors, resulting from the mismatch, rather than trying to model any bimodal speech dependencies. To this end a strategy is recommended, based on theory and empirical evidence, using a hybrid between the weighted product and weighted sum rules in the presence of varying acoustic noise. Results are presented on the M2VTS database.", 
    "editor": [
      {
        "familyName": "Kittler", 
        "givenName": "Josef", 
        "type": "Person"
      }, 
      {
        "familyName": "Nixon", 
        "givenName": "Mark S.", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/3-540-44887-x_108", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-540-40302-9", 
        "978-3-540-44887-7"
      ], 
      "name": "Audio- and Video-Based Biometric Person Authentication", 
      "type": "Book"
    }, 
    "name": "Improved Audio-Visual Speaker Recognition via the Use of a Hybrid Combination Strategy", 
    "pagination": "929-936", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/3-540-44887-x_108"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "d7591d1bf1b07307cb77e5f045f072e28a6aee8edfa3180fc8dfb31fc93372a7"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1045209544"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/3-540-44887-x_108", 
      "https://app.dimensions.ai/details/publication/pub.1045209544"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T05:31", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000346_0000000346/records_99806_00000003.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F3-540-44887-X_108"
  }
]
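Because the record above is plain JSON-LD, its bibliographic fields can be read with an ordinary JSON parser. The sketch below works on a minimal excerpt of the record (only a few of the keys shown above are reproduced; this is an illustration, not a client for the SciGraph API):

```python
import json

# A minimal excerpt of the SciGraph JSON-LD record shown above.
record_jsonld = '''
[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
    "author": [
      {"familyName": "Lucey", "givenName": "Simon", "type": "Person"},
      {"familyName": "Chen", "givenName": "Tsuhan", "type": "Person"}
    ],
    "datePublished": "2003-06-24",
    "name": "Improved Audio-Visual Speaker Recognition via the Use of a Hybrid Combination Strategy",
    "pagination": "929-936",
    "type": "Chapter"
  }
]
'''

# The record is a one-element JSON array; the chapter is the first entry.
records = json.loads(record_jsonld)
chapter = records[0]

# Pull a few bibliographic fields out of the schema.org-style keys.
title = chapter["name"]
authors = [f'{a["givenName"]} {a["familyName"]}' for a in chapter["author"]]
print(title)
print(", ".join(authors))  # Simon Lucey, Tsuhan Chen
```

The same pattern extends to the other keys (`citation`, `isPartOf`, `productId`) once the full record is fetched.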
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/3-540-44887-x_108'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/3-540-44887-x_108'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/3-540-44887-x_108'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/3-540-44887-x_108'
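All four curl commands above hit the same URL and differ only in the Accept header, i.e. the service selects the serialization by HTTP content negotiation. The same requests can be built with Python's standard library; the sketch below only constructs the request objects without sending them (the `FORMATS` mapping and `build_request` helper are illustrative names, not part of any SciGraph client):

```python
import urllib.request

SCIGRAPH_URL = "https://scigraph.springernature.com/pub.10.1007/3-540-44887-x_108"

# Map each serialization to the MIME type the service negotiates on,
# mirroring the curl commands above.
FORMATS = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}

def build_request(fmt: str) -> urllib.request.Request:
    """Build (but do not send) a content-negotiated GET request."""
    return urllib.request.Request(SCIGRAPH_URL, headers={"Accept": FORMATS[fmt]})

req = build_request("turtle")
print(req.get_header("Accept"))  # text/turtle
```

Passing such a request to `urllib.request.urlopen` would perform the actual fetch, equivalent to the corresponding curl command.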


 

This table displays all metadata directly associated with this object as RDF triples.

88 TRIPLES      23 PREDICATES      29 URIs      19 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/3-540-44887-x_108 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Ndb97f7269cb3498597b9fe4703ec4125
4 schema:citation sg:pub.10.1007/bf01238023
5 sg:pub.10.1155/s1110865703209045
6 https://doi.org/10.1016/s0167-8655(97)00070-6
7 schema:datePublished 2003-06-24
8 schema:datePublishedReg 2003-06-24
9 schema:description In this paper an in depth analysis is undertaken into effective strategies for integrating the audio-visual modalities for the purposes of text-dependent speaker recognition. Our work is based around the well known hidden Markov model (HMM) classifier framework for modelling speech. A framework is proposed to handle the mismatch between train and test observation sets, so as to provide effective classifier combination performance between the acoustic and visual HMM classifiers. From this framework, it can be shown that strategies for combining independent classifiers, such as the weighted product or sum rules, naturally emerge depending on the influence of the mismatch. Based on the assumption that poor performance in most audio-visual speaker recognition applications can be attributed to train/test mismatches we propose that the main impetus of practical audio-visual integration is to dampen the independent errors, resulting from the mismatch, rather than trying to model any bimodal speech dependencies. To this end a strategy is recommended, based on theory and empirical evidence, using a hybrid between the weighted product and weighted sum rules in the presence of varying acoustic noise. Results are presented on the M2VTS database.
10 schema:editor Nf35c01b6ddb8486fa929b1c5e98ec858
11 schema:genre chapter
12 schema:inLanguage en
13 schema:isAccessibleForFree false
14 schema:isPartOf Nb86225e46f7d4ce78a1d3412a78dcf65
15 schema:name Improved Audio-Visual Speaker Recognition via the Use of a Hybrid Combination Strategy
16 schema:pagination 929-936
17 schema:productId N3f0ceed9f0694a3d8bd5b6245c0265da
18 N48f3eac2f95d4b52ba3be299a628c3db
19 N8c8ee507e74944a884d4eba1aa5744c4
20 schema:publisher N6322053d2efb4cc480128528c59079c4
21 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045209544
22 https://doi.org/10.1007/3-540-44887-x_108
23 schema:sdDatePublished 2019-04-16T05:31
24 schema:sdLicense https://scigraph.springernature.com/explorer/license/
25 schema:sdPublisher Nf06c3836d7764f079d9f6b64cb8e7b0d
26 schema:url https://link.springer.com/10.1007%2F3-540-44887-X_108
27 sgo:license sg:explorer/license/
28 sgo:sdDataset chapters
29 rdf:type schema:Chapter
30 N090d7962c78141adbb5b56a879b173e0 schema:familyName Kittler
31 schema:givenName Josef
32 rdf:type schema:Person
33 N3f0ceed9f0694a3d8bd5b6245c0265da schema:name dimensions_id
34 schema:value pub.1045209544
35 rdf:type schema:PropertyValue
36 N48f3eac2f95d4b52ba3be299a628c3db schema:name readcube_id
37 schema:value d7591d1bf1b07307cb77e5f045f072e28a6aee8edfa3180fc8dfb31fc93372a7
38 rdf:type schema:PropertyValue
39 N6322053d2efb4cc480128528c59079c4 schema:location Berlin, Heidelberg
40 schema:name Springer Berlin Heidelberg
41 rdf:type schema:Organisation
42 N8c8ee507e74944a884d4eba1aa5744c4 schema:name doi
43 schema:value 10.1007/3-540-44887-x_108
44 rdf:type schema:PropertyValue
45 Nb76c1172beb04f9782ec61d2a664b003 schema:familyName Nixon
46 schema:givenName Mark S.
47 rdf:type schema:Person
48 Nb86225e46f7d4ce78a1d3412a78dcf65 schema:isbn 978-3-540-40302-9
49 978-3-540-44887-7
50 schema:name Audio- and Video-Based Biometric Person Authentication
51 rdf:type schema:Book
52 Ndb97f7269cb3498597b9fe4703ec4125 rdf:first sg:person.0754071362.25
53 rdf:rest Ndbb0640969fe428589190f4358014297
54 Ndbb0640969fe428589190f4358014297 rdf:first sg:person.012245072625.31
55 rdf:rest rdf:nil
56 Nf06c3836d7764f079d9f6b64cb8e7b0d schema:name Springer Nature - SN SciGraph project
57 rdf:type schema:Organization
58 Nf35c01b6ddb8486fa929b1c5e98ec858 rdf:first N090d7962c78141adbb5b56a879b173e0
59 rdf:rest Nff2e43293c964931b1548140c12236dc
60 Nff2e43293c964931b1548140c12236dc rdf:first Nb76c1172beb04f9782ec61d2a664b003
61 rdf:rest rdf:nil
62 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
63 schema:name Information and Computing Sciences
64 rdf:type schema:DefinedTerm
65 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
66 schema:name Artificial Intelligence and Image Processing
67 rdf:type schema:DefinedTerm
68 sg:person.012245072625.31 schema:affiliation https://www.grid.ac/institutes/grid.147455.6
69 schema:familyName Chen
70 schema:givenName Tsuhan
71 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012245072625.31
72 rdf:type schema:Person
73 sg:person.0754071362.25 schema:affiliation https://www.grid.ac/institutes/grid.147455.6
74 schema:familyName Lucey
75 schema:givenName Simon
76 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0754071362.25
77 rdf:type schema:Person
78 sg:pub.10.1007/bf01238023 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000725148
79 https://doi.org/10.1007/bf01238023
80 rdf:type schema:CreativeWork
81 sg:pub.10.1155/s1110865703209045 schema:sameAs https://app.dimensions.ai/details/publication/pub.1063207734
82 https://doi.org/10.1155/s1110865703209045
83 rdf:type schema:CreativeWork
84 https://doi.org/10.1016/s0167-8655(97)00070-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025388281
85 rdf:type schema:CreativeWork
86 https://www.grid.ac/institutes/grid.147455.6 schema:alternateName Carnegie Mellon University
87 schema:name Advanced Multimedia Processing Laboratory, Department of Electrical and Computer Engineering, Carnegie Mellon University, 15213, Pittsburgh, PA, USA
88 rdf:type schema:Organization
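The N-Triples serialization offered above encodes each of these triples as one self-contained line, which makes it convenient for batch processing. A minimal regex-based sketch of parsing one such line follows; the example triple is constructed from this record's own data (pagination "929-936"), with the predicate written as a full schema.org IRI rather than the abbreviated `schema:` form used in the table:

```python
import re

# One line of an N-Triples serialization (illustrative; built from the
# pagination value of the record above).
line = ('<https://scigraph.springernature.com/pub.10.1007/3-540-44887-x_108> '
        '<http://schema.org/pagination> "929-936" .')

# N-Triples: subject and predicate are IRIs in <>, the object may be an
# IRI or a quoted literal; every statement ends with " .".
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(<[^>]*>|"[^"]*")\s*\.$')

m = TRIPLE_RE.match(line)
subject, predicate, obj = m.groups()
print(predicate)       # http://schema.org/pagination
print(obj.strip('"'))  # 929-936
```

This regex covers only plain IRI/literal triples; a full N-Triples parser (or an RDF library) is needed for language tags, datatypes, and blank nodes such as the eight listed in the table above.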
 



