Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2002-12

AUTHORS

Eric K. Patterson, Sabri Gurbuz, Zekeriya Tufekci, John N. Gowdy

ABSTRACT

Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties, due to background noise and multiple speakers in an application environment, are significantly reduced by the additional information provided by visual features. This paper presents information on a new audio-visual database, a feature study on moving speakers, and on baseline results for the whole speaker group. Although a few databases have been collected in this area, none has emerged as a standard for comparison. Also, efforts to date have often been limited, focusing on cropped video or stationary speakers. This paper seeks to introduce a challenging audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The Clemson University Audio-Visual Experiments (CUAVE) database is a speaker-independent corpus of both connected and continuous digit strings totaling over 7000 utterances. It contains a wide variety of speakers and is designed to meet several goals discussed in this paper. One of these goals is to allow testing of adverse conditions such as moving talkers and speaker pairs. A feature study of connected digit strings is also discussed. It compares stationary and moving talkers in a speaker-independent grouping. An image-processing-based contour technique, an image transform method, and a deformable template scheme are used in this comparison to obtain visual features. This paper also presents methods and results in an attempt to make these techniques more robust to speaker movement. Finally, initial baseline speaker-independent results are included using all speakers, and conclusions as well as suggested areas of research are given.

PAGES

208541

Identifiers

URI

http://scigraph.springernature.com/pub.10.1155/s1110865702206101

DOI

http://dx.doi.org/10.1155/s1110865702206101

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1063207692



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Clemson University", 
          "id": "https://www.grid.ac/institutes/grid.26090.3d", 
          "name": [
            "Department of Electrical and Computer Engineering, Clemson University, 29634, Clemson, SC, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Patterson", 
        "givenName": "Eric K.", 
        "id": "sg:person.016030767057.18", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016030767057.18"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Clemson University", 
          "id": "https://www.grid.ac/institutes/grid.26090.3d", 
          "name": [
            "Department of Electrical and Computer Engineering, Clemson University, 29634, Clemson, SC, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Gurbuz", 
        "givenName": "Sabri", 
        "id": "sg:person.07522637313.39", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07522637313.39"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Clemson University", 
          "id": "https://www.grid.ac/institutes/grid.26090.3d", 
          "name": [
            "Department of Electrical and Computer Engineering, Clemson University, 29634, Clemson, SC, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Tufekci", 
        "givenName": "Zekeriya", 
        "id": "sg:person.016661051436.44", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016661051436.44"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Clemson University", 
          "id": "https://www.grid.ac/institutes/grid.26090.3d", 
          "name": [
            "Department of Electrical and Computer Engineering, Clemson University, 29634, Clemson, SC, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Gowdy", 
        "givenName": "John N.", 
        "id": "sg:person.016274023713.84", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016274023713.84"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2002-12", 
    "datePublishedReg": "2002-12-01", 
    "description": "Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties, due to background noise and multiple speakers in an application environment, are significantly reduced by the additional information provided by visual features. This paper presents information on a new audio-visual database, a feature study on moving speakers, and on baseline results for the whole speaker group. Although a few databases have been collected in this area, none has emerged as a standard for comparison. Also, efforts to date have often been limited, focusing on cropped video or stationary speakers. This paper seeks to introduce a challenging audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The Clemson University Audio-Visual Experiments (CUAVE) database is a speaker-independent corpus of both connected and continuous digit strings totaling over 7000 utterances. It contains a wide variety of speakers and is designed to meet several goals discussed in this paper. One of these goals is to allow testing of adverse conditions such as moving talkers and speaker pairs. A feature study of connected digit strings is also discussed. It compares stationary and moving talkers in a speaker-independent grouping. An image-processing-based contour technique, an image transform method, and a deformable template scheme are used in this comparison to obtain visual features. This paper also presents methods and results in an attempt to make these techniques more robust to speaker movement. Finally, initial baseline speaker-independent results are included using all speakers, and conclusions as well as suggested areas of research are given.", 
    "genre": "non_research_article", 
    "id": "sg:pub.10.1155/s1110865702206101", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1357355", 
        "issn": [
          "1687-6172", 
          "1687-0433"
        ], 
        "name": "Applied Signal Processing", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "11", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "2002"
      }
    ], 
    "name": "Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus", 
    "pagination": "208541", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "8b4ead1119f1d0569de6489cccdf6487b806ff303af9a2471dd7f43b1ea4f6d4"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1155/s1110865702206101"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1063207692"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1155/s1110865702206101", 
      "https://app.dimensions.ai/details/publication/pub.1063207692"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T00:15", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8695_00000508.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1155%2FS1110865702206101"
  }
]
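
As a rough illustration of how the JSON-LD record above can be consumed, the following Python sketch pulls out the title, author order, year, and DOI. It embeds a trimmed copy of the record (only the fields it reads) so that it runs on its own; the function name `summarize` and the shape of its return value are assumptions made for this example and are not part of SciGraph.

```python
import json

# Trimmed copy of the record above: only the fields read below are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Patterson", "givenName": "Eric K."},
      {"familyName": "Gurbuz", "givenName": "Sabri"},
      {"familyName": "Tufekci", "givenName": "Zekeriya"},
      {"familyName": "Gowdy", "givenName": "John N."}
    ],
    "datePublished": "2002-12",
    "name": "Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus",
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1155/s1110865702206101"]},
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1063207692"]}
    ]
  }
]
"""


def summarize(record_json):
    """Return title, authors, year, and DOI from a SciGraph-style record.

    Hypothetical helper: it assumes the top level is a one-element list,
    as in the record shown above.
    """
    record = json.loads(record_json)[0]
    # The "author" array is ordered, so list order is author order.
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]]
    # Each productId entry is a name/value pair; pick the one named "doi".
    doi = next(p["value"][0] for p in record["productId"] if p["name"] == "doi")
    return {
        "title": record["name"],
        "authors": authors,
        "year": record["datePublished"][:4],
        "doi": doi,
    }


summary = summarize(RECORD_JSON)
print(", ".join(summary["authors"]))
print(summary["year"], summary["doi"])
```

Identifiers live under `productId` as name/value pairs rather than as top-level keys, which is why the DOI is looked up by name instead of by position.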
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1155/s1110865702206101'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1155/s1110865702206101'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1155/s1110865702206101'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1155/s1110865702206101'
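
The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The URL and media types are taken from the commands above; the names `ACCEPT_TYPES`, `build_request`, and `fetch` are assumptions made for this example, and whether the endpoint still responds is not confirmed here.

```python
from urllib.request import Request, urlopen

RECORD_URL = "https://scigraph.springernature.com/pub.10.1155/s1110865702206101"

# Short format labels (hypothetical) mapped to the media types used in the
# curl commands above.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request that asks for the given RDF serialization."""
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text.

    Performs a network call; it is defined here but not invoked.
    """
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building a request does not touch the network, so the header can be inspected.
request = build_request("turtle")
print(request.full_url)
print(request.get_header("Accept"))
```

Keeping request construction separate from the network call makes it possible to check the headers without contacting the server.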


 

This table displays all metadata directly associated with this object as RDF triples.

82 TRIPLES      20 PREDICATES      27 URIs      19 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1155/s1110865702206101 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Ncd03c01cf9ac4b7c814dbf89c199f9c5
4 schema:datePublished 2002-12
5 schema:datePublishedReg 2002-12-01
6 schema:description Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties, due to background noise and multiple speakers in an application environment, are significantly reduced by the additional information provided by visual features. This paper presents information on a new audio-visual database, a feature study on moving speakers, and on baseline results for the whole speaker group. Although a few databases have been collected in this area, none has emerged as a standard for comparison. Also, efforts to date have often been limited, focusing on cropped video or stationary speakers. This paper seeks to introduce a challenging audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The Clemson University Audio-Visual Experiments (CUAVE) database is a speaker-independent corpus of both connected and continuous digit strings totaling over 7000 utterances. It contains a wide variety of speakers and is designed to meet several goals discussed in this paper. One of these goals is to allow testing of adverse conditions such as moving talkers and speaker pairs. A feature study of connected digit strings is also discussed. It compares stationary and moving talkers in a speaker-independent grouping. An image-processing-based contour technique, an image transform method, and a deformable template scheme are used in this comparison to obtain visual features. This paper also presents methods and results in an attempt to make these techniques more robust to speaker movement. Finally, initial baseline speaker-independent results are included using all speakers, and conclusions as well as suggested areas of research are given.
7 schema:genre non_research_article
8 schema:inLanguage en
9 schema:isAccessibleForFree true
10 schema:isPartOf N36be562422d64b5faf4b069df9a03a63
11 Ndd6d1a8be39544da8a0de50704f8eb66
12 sg:journal.1357355
13 schema:name Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus
14 schema:pagination 208541
15 schema:productId N688e69c4818f4b4db8cebc45e25b9621
16 Nda61c2793f644bf08b26eec641554ad2
17 Nefd287be817f45a3b9f2070b3c8dd521
18 schema:sameAs https://app.dimensions.ai/details/publication/pub.1063207692
19 https://doi.org/10.1155/s1110865702206101
20 schema:sdDatePublished 2019-04-11T00:15
21 schema:sdLicense https://scigraph.springernature.com/explorer/license/
22 schema:sdPublisher N8b81fcf2eb97418ba201e52203c34cf8
23 schema:url http://link.springer.com/10.1155%2FS1110865702206101
24 sgo:license sg:explorer/license/
25 sgo:sdDataset articles
26 rdf:type schema:ScholarlyArticle
27 N36be562422d64b5faf4b069df9a03a63 schema:volumeNumber 2002
28 rdf:type schema:PublicationVolume
29 N688e69c4818f4b4db8cebc45e25b9621 schema:name doi
30 schema:value 10.1155/s1110865702206101
31 rdf:type schema:PropertyValue
32 N68ab258b12b345edb66fc319a6c1e467 rdf:first sg:person.016661051436.44
33 rdf:rest Na6e9b559eebf48afa3a3bf993ab6c7c4
34 N8b81fcf2eb97418ba201e52203c34cf8 schema:name Springer Nature - SN SciGraph project
35 rdf:type schema:Organization
36 N98ebb976b69349b28e8f7213cbc4534a rdf:first sg:person.07522637313.39
37 rdf:rest N68ab258b12b345edb66fc319a6c1e467
38 Na6e9b559eebf48afa3a3bf993ab6c7c4 rdf:first sg:person.016274023713.84
39 rdf:rest rdf:nil
40 Ncd03c01cf9ac4b7c814dbf89c199f9c5 rdf:first sg:person.016030767057.18
41 rdf:rest N98ebb976b69349b28e8f7213cbc4534a
42 Nda61c2793f644bf08b26eec641554ad2 schema:name dimensions_id
43 schema:value pub.1063207692
44 rdf:type schema:PropertyValue
45 Ndd6d1a8be39544da8a0de50704f8eb66 schema:issueNumber 11
46 rdf:type schema:PublicationIssue
47 Nefd287be817f45a3b9f2070b3c8dd521 schema:name readcube_id
48 schema:value 8b4ead1119f1d0569de6489cccdf6487b806ff303af9a2471dd7f43b1ea4f6d4
49 rdf:type schema:PropertyValue
50 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
51 schema:name Information and Computing Sciences
52 rdf:type schema:DefinedTerm
53 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
54 schema:name Artificial Intelligence and Image Processing
55 rdf:type schema:DefinedTerm
56 sg:journal.1357355 schema:issn 1687-0433
57 1687-6172
58 schema:name Applied Signal Processing
59 rdf:type schema:Periodical
60 sg:person.016030767057.18 schema:affiliation https://www.grid.ac/institutes/grid.26090.3d
61 schema:familyName Patterson
62 schema:givenName Eric K.
63 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016030767057.18
64 rdf:type schema:Person
65 sg:person.016274023713.84 schema:affiliation https://www.grid.ac/institutes/grid.26090.3d
66 schema:familyName Gowdy
67 schema:givenName John N.
68 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016274023713.84
69 rdf:type schema:Person
70 sg:person.016661051436.44 schema:affiliation https://www.grid.ac/institutes/grid.26090.3d
71 schema:familyName Tufekci
72 schema:givenName Zekeriya
73 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016661051436.44
74 rdf:type schema:Person
75 sg:person.07522637313.39 schema:affiliation https://www.grid.ac/institutes/grid.26090.3d
76 schema:familyName Gurbuz
77 schema:givenName Sabri
78 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07522637313.39
79 rdf:type schema:Person
80 https://www.grid.ac/institutes/grid.26090.3d schema:alternateName Clemson University
81 schema:name Department of Electrical and Computer Engineering, Clemson University, 29634, Clemson, SC, USA
82 rdf:type schema:Organization
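
In the triples above, author order is not stored as an array but as an RDF list: `schema:author` points at a blank node, and each blank node carries an `rdf:first` (the item) and an `rdf:rest` (the next node, or `rdf:nil` at the end). A minimal Python sketch of walking that chain follows. The node identifiers and family names are copied from the table; the dictionary layout and the function name `walk_rdf_list` are assumptions made for illustration.

```python
RDF_NIL = "rdf:nil"

# Blank node -> (rdf:first, rdf:rest), copied from rows 32-33 and 36-41.
LIST_NODES = {
    "Ncd03c01cf9ac4b7c814dbf89c199f9c5": (
        "sg:person.016030767057.18", "N98ebb976b69349b28e8f7213cbc4534a"),
    "N98ebb976b69349b28e8f7213cbc4534a": (
        "sg:person.07522637313.39", "N68ab258b12b345edb66fc319a6c1e467"),
    "N68ab258b12b345edb66fc319a6c1e467": (
        "sg:person.016661051436.44", "Na6e9b559eebf48afa3a3bf993ab6c7c4"),
    "Na6e9b559eebf48afa3a3bf993ab6c7c4": (
        "sg:person.016274023713.84", RDF_NIL),
}

# schema:familyName values from rows 61, 66, 71, and 76.
FAMILY_NAMES = {
    "sg:person.016030767057.18": "Patterson",
    "sg:person.016274023713.84": "Gowdy",
    "sg:person.016661051436.44": "Tufekci",
    "sg:person.07522637313.39": "Gurbuz",
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest links from head and collect each rdf:first in order."""
    items = []
    seen = set()
    while head != RDF_NIL:
        if head in seen:
            # Guard against a malformed, cyclic list.
            raise ValueError(f"cycle detected at {head}")
        seen.add(head)
        first, rest = nodes[head]
        items.append(first)
        head = rest
    return items


# Row 3: schema:author points at the head of the list.
author_ids = walk_rdf_list("Ncd03c01cf9ac4b7c814dbf89c199f9c5", LIST_NODES)
author_names = [FAMILY_NAMES[person] for person in author_ids]
print(author_names)
```

The table sorts subjects alphabetically, so the order of the `sg:person` rows does not reflect authorship; only the list traversal recovers the order given in the JSON-LD `author` array.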
 



