The Dynamics of Audiovisual Behavior in Speech


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

1996

AUTHORS

Eric Vatikiotis-Bateson , Kevin G. Munhall , Makoto Hirayama , Y. Victor Lee , Demetri Terzopoulos

ABSTRACT

While it is well-known that faces provide linguistically relevant information during communication, most efforts to identify the visual correlates of the acoustic signal have focused on the shape, position and luminance of the oral aperture. In this work, we extend the analysis to full facial motion under the assumption that the process of producing speech acoustics generates linguistically salient visual information, which is distributed over large portions of the face. Support for this is drawn from our recent studies of the eye movements of perceivers during a variety of audiovisual speech perception tasks. These studies suggest that perceivers detect visual information at low spatial frequencies and that such information may not be restricted to the region of the oral aperture. Since the biomechanical linkage between the facial and vocal tract systems is one of close proximity and shared physiology, we propose that physiological models of speech and facial motion be integrated into one audiovisual model of speech production. In addition to providing a coherent account of audiovisual motor control, the proposed model could become a useful experimental tool, providing synthetic audiovisual stimuli with realistic control parameters.

PAGES

221-232

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-662-13015-5_16

DOI

http://dx.doi.org/10.1007/978-3-662-13015-5_16

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1028380650



JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record with an external JSON-LD service such as the JSON-LD Playground or the Google Structured Data Testing Tool (SDTT).

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/17", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Psychology and Cognitive Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1701", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Psychology", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "ATR HIP Research Labs, Japan", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "ATR HIP Research Labs, Japan"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Vatikiotis-Bateson", 
        "givenName": "Eric", 
        "id": "sg:person.01337362117.12", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01337362117.12"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Queen\u2019s University, Canada", 
          "id": "http://www.grid.ac/institutes/grid.410356.5", 
          "name": [
            "Queen\u2019s University, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Munhall", 
        "givenName": "Kevin G.", 
        "id": "sg:person.01131616354.85", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01131616354.85"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Hewlett-Packard Labs, Japan", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Hewlett-Packard Labs, Japan"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Hirayama", 
        "givenName": "Makoto", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Toronto, Canada", 
          "id": "http://www.grid.ac/institutes/grid.17063.33", 
          "name": [
            "University of Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lee", 
        "givenName": "Y. Victor", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Toronto, Canada", 
          "id": "http://www.grid.ac/institutes/grid.17063.33", 
          "name": [
            "University of Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Terzopoulos", 
        "givenName": "Demetri", 
        "id": "sg:person.016347323445.35", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016347323445.35"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "1996", 
    "datePublishedReg": "1996-01-01", 
    "description": "While it is well-known that faces provide linguistically relevant information during communication, most efforts to identify the visual correlates of the acoustic signal have focused on the shape, position and luminance of the oral aperture. In this work, we extend the analysis to full facial motion under the assumption that the process of producing speech acoustics generates linguistically salient visual information, which is distributed over large portions of the face. Support for this is drawn from our recent studies of the eye movements of perceivers during a variety of audiovisual speech perception tasks. These studies suggest that perceivers detect visual information at low spatial frequencies and that such information may not be restricted to the region of the oral aperture. Since the biomechanical linkage between the facial and vocal tract systems is one of close proximity and shared physiology, we propose that physiological models of speech and facial motion be integrated into one audiovisual model of speech production. In addition to providing a coherent account of audiovisual motor control, the proposed model could become a useful experimental tool, providing synthetic audiovisual stimuli with realistic control parameters.", 
    "editor": [
      {
        "familyName": "Stork", 
        "givenName": "David G.", 
        "type": "Person"
      }, 
      {
        "familyName": "Hennecke", 
        "givenName": "Marcus E.", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-662-13015-5_16", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-642-08252-8", 
        "978-3-662-13015-5"
      ], 
      "name": "Speechreading by Humans and Machines", 
      "type": "Book"
    }, 
    "keywords": [
      "facial motion", 
      "audiovisual speech perception tasks", 
      "visual information", 
      "speech perception tasks", 
      "salient visual information", 
      "perception task", 
      "audiovisual stimuli", 
      "visual correlates", 
      "speech production", 
      "audiovisual model", 
      "speech acoustics", 
      "low spatial frequencies", 
      "eye movements", 
      "vocal tract system", 
      "coherent account", 
      "perceivers", 
      "motor control", 
      "speech", 
      "spatial frequency", 
      "biomechanical linkage", 
      "acoustic signals", 
      "relevant information", 
      "face", 
      "stimuli", 
      "task", 
      "correlates", 
      "physiological model", 
      "information", 
      "such information", 
      "most efforts", 
      "luminance", 
      "support", 
      "oral aperture", 
      "Recent studies", 
      "behavior", 
      "tract system", 
      "communication", 
      "study", 
      "model", 
      "acoustics", 
      "experimental tool", 
      "movement", 
      "account", 
      "efforts", 
      "assumption", 
      "useful experimental tool", 
      "process", 
      "large portion", 
      "motion", 
      "control", 
      "work", 
      "variety", 
      "analysis", 
      "addition", 
      "tool", 
      "proximity", 
      "linkage", 
      "signals", 
      "position", 
      "dynamics", 
      "shape", 
      "frequency", 
      "physiology", 
      "system", 
      "region", 
      "portion", 
      "close proximity", 
      "control parameters", 
      "production", 
      "aperture", 
      "parameters", 
      "full facial motion", 
      "audiovisual motor control", 
      "synthetic audiovisual stimuli", 
      "realistic control parameters", 
      "Audiovisual Behavior"
    ], 
    "name": "The Dynamics of Audiovisual Behavior in Speech", 
    "pagination": "221-232", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1028380650"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-662-13015-5_16"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-662-13015-5_16", 
      "https://app.dimensions.ai/details/publication/pub.1028380650"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-01-01T19:08", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/chapter/chapter_137.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-662-13015-5_16"
  }
]
 

Download the RDF metadata as JSON-LD, N-Triples, Turtle, or RDF/XML.

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular linked-data format that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-662-13015-5_16'
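
Because the response is plain JSON (an array holding one record, as shown above), it can also be consumed directly from a script. The following Python sketch is only an illustration, not part of the SciGraph documentation; it assumes the endpoint honours the same Accept header as the curl call above and that the fields match the JSON-LD record printed earlier.

import json
import urllib.request

# Same content negotiation as the curl example above.
URL = "https://scigraph.springernature.com/pub.10.1007/978-3-662-13015-5_16"
req = urllib.request.Request(URL, headers={"Accept": "application/ld+json"})

with urllib.request.urlopen(req) as resp:
    records = json.load(resp)              # the payload is a JSON array

record = records[0]
print(record["name"])                      # chapter title
print(record["datePublished"])             # 1996
print([a["familyName"] for a in record["author"]])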

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-662-13015-5_16'
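
Since every N-Triples line is a complete "<subject> <predicate> <object> ." statement, ordinary line-oriented processing is enough for batch work. A minimal Python sketch (the endpoint and Accept header are taken from the curl call above; the predicate count is purely illustrative):

import urllib.request
from collections import Counter

URL = "https://scigraph.springernature.com/pub.10.1007/978-3-662-13015-5_16"
req = urllib.request.Request(URL, headers={"Accept": "application/n-triples"})

# Count how often each predicate occurs; subjects and predicates are
# whitespace-free IRIs or blank-node labels, so splitting twice is safe.
predicates = Counter()
with urllib.request.urlopen(req) as resp:
    for line in resp.read().decode("utf-8").splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            predicates[parts[1]] += 1

for predicate, count in predicates.most_common(5):
    print(count, predicate)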

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-662-13015-5_16'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-662-13015-5_16'
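
If you prefer to work with the data as a graph rather than as text, any RDF library can load one of the serializations above. The sketch below uses rdflib, which is not mentioned on this page and is therefore an assumption; the http://schema.org/ namespace is inferred from the schema: prefix used in the record.

import urllib.request
from rdflib import Graph, Namespace

URL = "https://scigraph.springernature.com/pub.10.1007/978-3-662-13015-5_16"
SCHEMA = Namespace("http://schema.org/")   # assumed expansion of the schema: prefix

# Fetch Turtle via the same content negotiation as the curl example,
# then parse it into an in-memory graph.
req = urllib.request.Request(URL, headers={"Accept": "text/turtle"})
with urllib.request.urlopen(req) as resp:
    g = Graph().parse(data=resp.read().decode("utf-8"), format="turtle")

print(len(g), "triples")                   # compare with the triple count reported below
for _, _, name in g.triples((None, SCHEMA["name"], None)):
    print(name)                            # every schema:name in the record (chapter, book, publisher, affiliations)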


 

This table displays all metadata directly associated with this object as RDF triples.

183 TRIPLES      23 PREDICATES      104 URIs      95 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-662-13015-5_16 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 anzsrc-for:17
4 anzsrc-for:1701
5 schema:author N837383e3d1c4434ca85e54a8ed64d428
6 schema:datePublished 1996
7 schema:datePublishedReg 1996-01-01
8 schema:description While it is well-known that faces provide linguistically relevant information during communication, most efforts to identify the visual correlates of the acoustic signal have focused on the shape, position and luminance of the oral aperture. In this work, we extend the analysis to full facial motion under the assumption that the process of producing speech acoustics generates linguistically salient visual information, which is distributed over large portions of the face. Support for this is drawn from our recent studies of the eye movements of perceivers during a variety of audiovisual speech perception tasks. These studies suggest that perceivers detect visual information at low spatial frequencies and that such information may not be restricted to the region of the oral aperture. Since the biomechanical linkage between the facial and vocal tract systems is one of close proximity and shared physiology, we propose that physiological models of speech and facial motion be integrated into one audiovisual model of speech production. In addition to providing a coherent account of audiovisual motor control, the proposed model could become a useful experimental tool, providing synthetic audiovisual stimuli with realistic control parameters.
9 schema:editor N5cb420f2110b4893aec7e5d6caf4fdb3
10 schema:genre chapter
11 schema:inLanguage en
12 schema:isAccessibleForFree true
13 schema:isPartOf Na3f6be1a160c4a5db1c39e909483657b
14 schema:keywords Audiovisual Behavior
15 Recent studies
16 account
17 acoustic signals
18 acoustics
19 addition
20 analysis
21 aperture
22 assumption
23 audiovisual model
24 audiovisual motor control
25 audiovisual speech perception tasks
26 audiovisual stimuli
27 behavior
28 biomechanical linkage
29 close proximity
30 coherent account
31 communication
32 control
33 control parameters
34 correlates
35 dynamics
36 efforts
37 experimental tool
38 eye movements
39 face
40 facial motion
41 frequency
42 full facial motion
43 information
44 large portion
45 linkage
46 low spatial frequencies
47 luminance
48 model
49 most efforts
50 motion
51 motor control
52 movement
53 oral aperture
54 parameters
55 perceivers
56 perception task
57 physiological model
58 physiology
59 portion
60 position
61 process
62 production
63 proximity
64 realistic control parameters
65 region
66 relevant information
67 salient visual information
68 shape
69 signals
70 spatial frequency
71 speech
72 speech acoustics
73 speech perception tasks
74 speech production
75 stimuli
76 study
77 such information
78 support
79 synthetic audiovisual stimuli
80 system
81 task
82 tool
83 tract system
84 useful experimental tool
85 variety
86 visual correlates
87 visual information
88 vocal tract system
89 work
90 schema:name The Dynamics of Audiovisual Behavior in Speech
91 schema:pagination 221-232
92 schema:productId N4f4036f6c5be40409d9d4c5c89602447
93 Ncb4de81aa75b48f985c431627da27d95
94 schema:publisher Nbadd2327d9e044a7852035a66fd33e02
95 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028380650
96 https://doi.org/10.1007/978-3-662-13015-5_16
97 schema:sdDatePublished 2022-01-01T19:08
98 schema:sdLicense https://scigraph.springernature.com/explorer/license/
99 schema:sdPublisher Nb87cb207c2104f449b61aec87ab3aa66
100 schema:url https://doi.org/10.1007/978-3-662-13015-5_16
101 sgo:license sg:explorer/license/
102 sgo:sdDataset chapters
103 rdf:type schema:Chapter
104 N09c97b89bdf8406d864bc6c9b7c97904 rdf:first Nf4158f79f36c425a905a46d287059a76
105 rdf:rest rdf:nil
106 N149ee4e13d934275a1ecfa7ee4c5ea41 schema:affiliation grid-institutes:grid.17063.33
107 schema:familyName Lee
108 schema:givenName Y. Victor
109 rdf:type schema:Person
110 N437270ef71b64deaa7c28d8813acc233 rdf:first sg:person.01131616354.85
111 rdf:rest N7392d402fb964c659818a3334f0d58b1
112 N4f4036f6c5be40409d9d4c5c89602447 schema:name dimensions_id
113 schema:value pub.1028380650
114 rdf:type schema:PropertyValue
115 N5cb420f2110b4893aec7e5d6caf4fdb3 rdf:first Nd6df0d81433b4bd2ae066bcd6fc91033
116 rdf:rest N09c97b89bdf8406d864bc6c9b7c97904
117 N66c740bd07614d3d82f24cbc7e5474e3 schema:affiliation grid-institutes:None
118 schema:familyName Hirayama
119 schema:givenName Makoto
120 rdf:type schema:Person
121 N7392d402fb964c659818a3334f0d58b1 rdf:first N66c740bd07614d3d82f24cbc7e5474e3
122 rdf:rest Nf594c616bcfc47159cfa60808cde468b
123 N837383e3d1c4434ca85e54a8ed64d428 rdf:first sg:person.01337362117.12
124 rdf:rest N437270ef71b64deaa7c28d8813acc233
125 Na3f6be1a160c4a5db1c39e909483657b schema:isbn 978-3-642-08252-8
126 978-3-662-13015-5
127 schema:name Speechreading by Humans and Machines
128 rdf:type schema:Book
129 Nb87cb207c2104f449b61aec87ab3aa66 schema:name Springer Nature - SN SciGraph project
130 rdf:type schema:Organization
131 Nbadd2327d9e044a7852035a66fd33e02 schema:name Springer Nature
132 rdf:type schema:Organisation
133 Ncb4de81aa75b48f985c431627da27d95 schema:name doi
134 schema:value 10.1007/978-3-662-13015-5_16
135 rdf:type schema:PropertyValue
136 Nd6df0d81433b4bd2ae066bcd6fc91033 schema:familyName Stork
137 schema:givenName David G.
138 rdf:type schema:Person
139 Ne784581f14374d35b3d9e7abd656792f rdf:first sg:person.016347323445.35
140 rdf:rest rdf:nil
141 Nf4158f79f36c425a905a46d287059a76 schema:familyName Hennecke
142 schema:givenName Marcus E.
143 rdf:type schema:Person
144 Nf594c616bcfc47159cfa60808cde468b rdf:first N149ee4e13d934275a1ecfa7ee4c5ea41
145 rdf:rest Ne784581f14374d35b3d9e7abd656792f
146 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
147 schema:name Information and Computing Sciences
148 rdf:type schema:DefinedTerm
149 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
150 schema:name Artificial Intelligence and Image Processing
151 rdf:type schema:DefinedTerm
152 anzsrc-for:17 schema:inDefinedTermSet anzsrc-for:
153 schema:name Psychology and Cognitive Sciences
154 rdf:type schema:DefinedTerm
155 anzsrc-for:1701 schema:inDefinedTermSet anzsrc-for:
156 schema:name Psychology
157 rdf:type schema:DefinedTerm
158 sg:person.01131616354.85 schema:affiliation grid-institutes:grid.410356.5
159 schema:familyName Munhall
160 schema:givenName Kevin G.
161 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01131616354.85
162 rdf:type schema:Person
163 sg:person.01337362117.12 schema:affiliation grid-institutes:None
164 schema:familyName Vatikiotis-Bateson
165 schema:givenName Eric
166 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01337362117.12
167 rdf:type schema:Person
168 sg:person.016347323445.35 schema:affiliation grid-institutes:grid.17063.33
169 schema:familyName Terzopoulos
170 schema:givenName Demetri
171 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016347323445.35
172 rdf:type schema:Person
173 grid-institutes:None schema:alternateName ATR HIP Research Labs, Japan
174 Hewlett-Packard Labs, Japan
175 schema:name ATR HIP Research Labs, Japan
176 Hewlett-Packard Labs, Japan
177 rdf:type schema:Organization
178 grid-institutes:grid.17063.33 schema:alternateName University of Toronto, Canada
179 schema:name University of Toronto, Canada
180 rdf:type schema:Organization
181 grid-institutes:grid.410356.5 schema:alternateName Queen’s University, Canada
182 schema:name Queen’s University, Canada
183 rdf:type schema:Organization
 



