Cross-Modal Predictive Coding for Talking Head Sequences


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

1996

AUTHORS

Ram R. Rao, Tsuhan Chen

ABSTRACT

Predictive coding of video has traditionally used information from previous video frames to help construct an estimate of the current frame. The difference between the original and estimated signal can then be transmitted to allow the receiver to fully reconstruct the original video frame. In this paper, we explore a new algorithm for use in coding the shape of a person’s lips in a head-and-shoulder video sequence. This algorithm uses the same predictive coding loop, but instead of forming an estimate of the lip image using motion compensation and previous video frames, it forms an estimate from the associated acoustic data. Since the acoustic data is also transmitted, the receiver is able to reconstruct the video with very little side information. In this paper, we will describe our predictive coding system and analyze methods for converting from the acoustic data to visual estimates.
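The abstract describes the core idea: because the audio is transmitted anyway, the encoder can predict the lip shape from acoustic features and send only the prediction residual, which the receiver adds back to its own audio-derived prediction. The following is a minimal sketch of such a loop; the names (`audio_to_lip`, `lip_params`, `audio_features`) and the linear toy mapping are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def encode_lip_frame(lip_params, audio_features, audio_to_lip):
    """Encoder side of a cross-modal predictive coding loop (sketch).

    lip_params     : measured lip-shape parameters for the current frame
    audio_features : acoustic features for the same time interval
    audio_to_lip   : some audio-to-visual mapping; the paper analyzes
                     several methods for this conversion
    """
    prediction = audio_to_lip(audio_features)    # estimate lip shape from audio
    residual = lip_params - prediction           # only this side information is sent
    return residual

def decode_lip_frame(residual, audio_features, audio_to_lip):
    """Receiver side: the audio is available at the decoder, so it can form
    the same prediction and add back the transmitted residual."""
    return audio_to_lip(audio_features) + residual

# Toy usage with a placeholder linear audio-to-lip mapping (purely illustrative).
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 12))                     # 12 audio features -> 4 lip parameters
audio_to_lip = lambda f: A @ f
audio = rng.normal(size=12)
lips = A @ audio + 0.01 * rng.normal(size=4)     # true lips close to the prediction
res = encode_lip_frame(lips, audio, audio_to_lip)
assert np.allclose(decode_lip_frame(res, audio, audio_to_lip), lips)
```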

PAGES

301-308

Book

TITLE

Multimedia Communications and Video Coding

ISBN

978-1-4613-8036-8
978-1-4613-0403-6

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-1-4613-0403-6_37

DOI

http://dx.doi.org/10.1007/978-1-4613-0403-6_37

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1013092834


Indexing Status: Check whether this publication has been indexed by Scopus and Web of Science using the SN Indexing Status Tool.
Incoming Citations: Browse incoming citations for this publication using opencitations.net.

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service such as the JSON-LD Playground or the Google Structured Data Testing Tool (SDTT).

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Georgia Institute of Technology", 
          "id": "https://www.grid.ac/institutes/grid.213917.f", 
          "name": [
            "Georgia Institute of Technology, 30332, Atlanta, GA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Rao", 
        "givenName": "Ram R.", 
        "id": "sg:person.015506157501.37", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015506157501.37"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Nokia (United States)", 
          "id": "https://www.grid.ac/institutes/grid.469490.6", 
          "name": [
            "AT&T Bell Laboratories, 07733, Holmdel, NJ, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Chen", 
        "givenName": "Tsuhan", 
        "id": "sg:person.012245072625.31", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012245072625.31"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1016/0923-5965(89)90006-4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051880321"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/86.372898", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061241161"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "1996", 
    "datePublishedReg": "1996-01-01", 
    "description": "Predictive coding of video has traditionally used information from previous video frames to help construct an estimate of the current frame. The difference between the original and estimated signal can then be transmitted to allow the receiver to fully reconstruct the original video frame. In this paper, we explore a new algorithm for use in coding the shape of a person\u2019s lips in a head-and-shoulder video sequence. This algorithm uses the same predictive coding loop, but instead of forming an estimate of the lip image using motion compensation and previous video frames, it forms an estimate from the associated acoustic data. Since the acoustic data is also transmitted, the receiver is able to reconstruct the video with very little side information. In this paper, we will describe our predictive coding system and analyze methods for converting from the acoustic data to visual estimates.", 
    "editor": [
      {
        "familyName": "Wang", 
        "givenName": "Yao", 
        "type": "Person"
      }, 
      {
        "familyName": "Panwar", 
        "givenName": "Shivendra", 
        "type": "Person"
      }, 
      {
        "familyName": "Kim", 
        "givenName": "Seung-Pil", 
        "type": "Person"
      }, 
      {
        "familyName": "Bertoni", 
        "givenName": "Henry L.", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-1-4613-0403-6_37", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-1-4613-8036-8", 
        "978-1-4613-0403-6"
      ], 
      "name": "Multimedia Communications and Video Coding", 
      "type": "Book"
    }, 
    "name": "Cross-Modal Predictive Coding for Talking Head Sequences", 
    "pagination": "301-308", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1013092834"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-1-4613-0403-6_37"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "3e5f0864d9200fcb5f8b212f5972f77d642673e0785c00b4dc0c4f133d9c3628"
        ]
      }
    ], 
    "publisher": {
      "location": "Boston, MA", 
      "name": "Springer US", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-1-4613-0403-6_37", 
      "https://app.dimensions.ai/details/publication/pub.1013092834"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T09:16", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000371_0000000371/records_130814_00000001.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F978-1-4613-0403-6_37"
  }
]
 

Download the RDF metadata as JSON-LD, N-Triples, Turtle, or RDF/XML.

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-1-4613-0403-6_37'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-1-4613-0403-6_37'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-1-4613-0403-6_37'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-1-4613-0403-6_37'
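
For comparison, here is a minimal Python sketch of the same request using only the standard library. It fetches the JSON-LD representation and reads a few fields; the field names are those visible in the JSON-LD record above, and the sketch assumes the service returns the single-element list shown there.

```python
import json
import urllib.request

URL = "https://scigraph.springernature.com/pub.10.1007/978-1-4613-0403-6_37"

# Same content negotiation as the JSON-LD curl example above.
req = urllib.request.Request(URL, headers={"Accept": "application/ld+json"})
with urllib.request.urlopen(req) as resp:
    record = json.loads(resp.read().decode("utf-8"))[0]

print(record["name"])                                # chapter title
print(record["datePublished"])                       # 1996
print([a["familyName"] for a in record["author"]])   # ['Rao', 'Chen']
```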


 

This table displays all metadata directly associated with this object as RDF triples.

96 TRIPLES      23 PREDICATES      29 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-1-4613-0403-6_37 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Ne714738df8a546ca9d0ed07153803f5f
4 schema:citation https://doi.org/10.1016/0923-5965(89)90006-4
5 https://doi.org/10.1109/86.372898
6 schema:datePublished 1996
7 schema:datePublishedReg 1996-01-01
8 schema:description Predictive coding of video has traditionally used information from previous video frames to help construct an estimate of the current frame. The difference between the original and estimated signal can then be transmitted to allow the receiver to fully reconstruct the original video frame. In this paper, we explore a new algorithm for use in coding the shape of a person’s lips in a head-and-shoulder video sequence. This algorithm uses the same predictive coding loop, but instead of forming an estimate of the lip image using motion compensation and previous video frames, it forms an estimate from the associated acoustic data. Since the acoustic data is also transmitted, the receiver is able to reconstruct the video with very little side information. In this paper, we will describe our predictive coding system and analyze methods for converting from the acoustic data to visual estimates.
9 schema:editor N865f536833fe451eb355ea71a0fa7c63
10 schema:genre chapter
11 schema:inLanguage en
12 schema:isAccessibleForFree true
13 schema:isPartOf N9b1d00b4450344dcaa6bb4e2bc417f81
14 schema:name Cross-Modal Predictive Coding for Talking Head Sequences
15 schema:pagination 301-308
16 schema:productId N0393d7f1a8de4636809073fee059d47f
17 N8fce35658b474f2292d46967c7c9661b
18 Ne69b8f93985647f981f60629bc0d256d
19 schema:publisher N0f3e8c701df74beca40b7be726376237
20 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013092834
21 https://doi.org/10.1007/978-1-4613-0403-6_37
22 schema:sdDatePublished 2019-04-16T09:16
23 schema:sdLicense https://scigraph.springernature.com/explorer/license/
24 schema:sdPublisher N5637e626a67141d79b541ff9e742bac9
25 schema:url https://link.springer.com/10.1007%2F978-1-4613-0403-6_37
26 sgo:license sg:explorer/license/
27 sgo:sdDataset chapters
28 rdf:type schema:Chapter
29 N030fc2beed0f423a9eccca7541775455 schema:familyName Bertoni
30 schema:givenName Henry L.
31 rdf:type schema:Person
32 N0393d7f1a8de4636809073fee059d47f schema:name doi
33 schema:value 10.1007/978-1-4613-0403-6_37
34 rdf:type schema:PropertyValue
35 N0f3e8c701df74beca40b7be726376237 schema:location Boston, MA
36 schema:name Springer US
37 rdf:type schema:Organisation
38 N1e53f96eacfa47f2912a5a091d6ec2ef rdf:first N030fc2beed0f423a9eccca7541775455
39 rdf:rest rdf:nil
40 N5637e626a67141d79b541ff9e742bac9 schema:name Springer Nature - SN SciGraph project
41 rdf:type schema:Organization
42 N586e5bb1305e4deb9ba138f9d6dc9af9 rdf:first sg:person.012245072625.31
43 rdf:rest rdf:nil
44 N607634be5f294971b4f743da94ee0c30 schema:familyName Wang
45 schema:givenName Yao
46 rdf:type schema:Person
47 N865f536833fe451eb355ea71a0fa7c63 rdf:first N607634be5f294971b4f743da94ee0c30
48 rdf:rest N8eae66800fec4c528d6e3ba94c425298
49 N8eae66800fec4c528d6e3ba94c425298 rdf:first Nb53d7f84ed3e434b9faa572aa2a01f64
50 rdf:rest Na146d5346cc54111986aba12cecac9f8
51 N8fce35658b474f2292d46967c7c9661b schema:name dimensions_id
52 schema:value pub.1013092834
53 rdf:type schema:PropertyValue
54 N948cedf3a8b1451a8189819e80eaa2e2 schema:familyName Kim
55 schema:givenName Seung-Pil
56 rdf:type schema:Person
57 N9b1d00b4450344dcaa6bb4e2bc417f81 schema:isbn 978-1-4613-0403-6
58 978-1-4613-8036-8
59 schema:name Multimedia Communications and Video Coding
60 rdf:type schema:Book
61 Na146d5346cc54111986aba12cecac9f8 rdf:first N948cedf3a8b1451a8189819e80eaa2e2
62 rdf:rest N1e53f96eacfa47f2912a5a091d6ec2ef
63 Nb53d7f84ed3e434b9faa572aa2a01f64 schema:familyName Panwar
64 schema:givenName Shivendra
65 rdf:type schema:Person
66 Ne69b8f93985647f981f60629bc0d256d schema:name readcube_id
67 schema:value 3e5f0864d9200fcb5f8b212f5972f77d642673e0785c00b4dc0c4f133d9c3628
68 rdf:type schema:PropertyValue
69 Ne714738df8a546ca9d0ed07153803f5f rdf:first sg:person.015506157501.37
70 rdf:rest N586e5bb1305e4deb9ba138f9d6dc9af9
71 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
72 schema:name Information and Computing Sciences
73 rdf:type schema:DefinedTerm
74 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
75 schema:name Artificial Intelligence and Image Processing
76 rdf:type schema:DefinedTerm
77 sg:person.012245072625.31 schema:affiliation https://www.grid.ac/institutes/grid.469490.6
78 schema:familyName Chen
79 schema:givenName Tsuhan
80 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012245072625.31
81 rdf:type schema:Person
82 sg:person.015506157501.37 schema:affiliation https://www.grid.ac/institutes/grid.213917.f
83 schema:familyName Rao
84 schema:givenName Ram R.
85 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015506157501.37
86 rdf:type schema:Person
87 https://doi.org/10.1016/0923-5965(89)90006-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051880321
88 rdf:type schema:CreativeWork
89 https://doi.org/10.1109/86.372898 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061241161
90 rdf:type schema:CreativeWork
91 https://www.grid.ac/institutes/grid.213917.f schema:alternateName Georgia Institute of Technology
92 schema:name Georgia Institute of Technology, 30332, Atlanta, GA, USA
93 rdf:type schema:Organization
94 https://www.grid.ac/institutes/grid.469490.6 schema:alternateName Nokia (United States)
95 schema:name AT&T Bell Laboratories, 07733, Holmdel, NJ, USA
96 rdf:type schema:Organization
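
As a small illustration (assuming the third-party rdflib package is installed), the N-Triples served by the endpoint above can be loaded into a graph and iterated programmatically. This is only a sketch of one way to consume the data, not an official SciGraph client.

```python
import urllib.request
from rdflib import Graph

URL = "https://scigraph.springernature.com/pub.10.1007/978-1-4613-0403-6_37"

# Fetch the record as N-Triples, as in the curl example above.
req = urllib.request.Request(URL, headers={"Accept": "application/n-triples"})
with urllib.request.urlopen(req) as resp:
    nt_data = resp.read().decode("utf-8")

g = Graph()
g.parse(data=nt_data, format="nt")

print(len(g), "triples")      # should match the triple count listed above
for s, p, o in g:             # print each (subject, predicate, object)
    print(s, p, o)
```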
 



