ECO: Efficient Convolutional Network for Online Video Understanding


Ontology type: schema:Chapter
Open Access: True


Chapter Info

DATE

2018-10-09

AUTHORS

Mohammadreza Zolfaghari, Kamaljeet Singh, Thomas Brox

ABSTRACT

The state of the art in video understanding suffers from two problems: (1) The major part of reasoning is performed locally in the video, therefore, it misses important relationships within actions that span several seconds. (2) While there are local methods with fast per-frame processing, the processing of the whole video is not efficient and hampers fast video retrieval or online classification of long-term activities. In this paper, we introduce a network architecture (https://github.com/mzolfaghari/ECO-efficient-video-understanding) that takes long-term content into account and enables fast per-video processing at the same time. The architecture is based on merging long-term content already in the network rather than in a post-hoc fusion. Together with a sampling strategy, which exploits that neighboring frames are largely redundant, this yields high-quality action classification and video captioning at up to 230 videos per second, where each video can consist of a few hundred frames. The approach achieves competitive performance across all datasets while being 10× to 80× faster than state-of-the-art methods.

PAGES

713-730

Book

TITLE

Computer Vision – ECCV 2018

ISBN

978-3-030-01215-1
978-3-030-01216-8

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-030-01216-8_43

DOI

http://dx.doi.org/10.1007/978-3-030-01216-8_43

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1107502594


Indexing Status: Check whether this publication has been indexed by Scopus and Web of Science using the SN Indexing Status Tool.
Incoming Citations: Browse incoming citations for this publication using opencitations.net.

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service such as the JSON-LD Playground or Google SDTT.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Freiburg, Freiburg im Breisgau, Germany", 
          "id": "http://www.grid.ac/institutes/grid.5963.9", 
          "name": [
            "University of Freiburg, Freiburg im Breisgau, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Zolfaghari", 
        "givenName": "Mohammadreza", 
        "id": "sg:person.013460010610.63", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013460010610.63"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Freiburg, Freiburg im Breisgau, Germany", 
          "id": "http://www.grid.ac/institutes/grid.5963.9", 
          "name": [
            "University of Freiburg, Freiburg im Breisgau, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Singh", 
        "givenName": "Kamaljeet", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Freiburg, Freiburg im Breisgau, Germany", 
          "id": "http://www.grid.ac/institutes/grid.5963.9", 
          "name": [
            "University of Freiburg, Freiburg im Breisgau, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brox", 
        "givenName": "Thomas", 
        "id": "sg:person.012443225372.65", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012443225372.65"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2018-10-09", 
    "datePublishedReg": "2018-10-09", 
    "description": "The state of the art in video understanding suffers from two problems: (1) The major part of reasoning is performed locally in the video, therefore, it misses important relationships within actions that span several seconds. (2) While there are local methods with fast per-frame processing, the processing of the whole video is not efficient and hampers fast video retrieval or online classification of long-term activities. In this paper, we introduce a network architecture (https://github.com/mzolfaghari/ECO-efficient-video-understanding) that takes long-term content into account and enables fast per-video processing at the same time. The architecture is based on merging long-term content already in the network rather than in a post-hoc fusion. Together with a sampling strategy, which exploits that neighboring frames are largely redundant, this yields high-quality action classification and video captioning at up\u00a0to 230 videos per second, where each video can consist of a few hundred frames. The approach achieves competitive performance across all datasets while being 10\\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$$\\times $$\\end{document} to 80\\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$$\\times $$\\end{document} faster than state-of-the-art methods.", 
    "editor": [
      {
        "familyName": "Ferrari", 
        "givenName": "Vittorio", 
        "type": "Person"
      }, 
      {
        "familyName": "Hebert", 
        "givenName": "Martial", 
        "type": "Person"
      }, 
      {
        "familyName": "Sminchisescu", 
        "givenName": "Cristian", 
        "type": "Person"
      }, 
      {
        "familyName": "Weiss", 
        "givenName": "Yair", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-030-01216-8_43", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-030-01215-1", 
        "978-3-030-01216-8"
      ], 
      "name": "Computer Vision \u2013 ECCV 2018", 
      "type": "Book"
    }, 
    "keywords": [
      "long-term content", 
      "video understanding", 
      "Efficient Convolutional Networks", 
      "video retrieval", 
      "whole video", 
      "video processing", 
      "video captioning", 
      "convolutional network", 
      "online classification", 
      "network architecture", 
      "action classification", 
      "neighboring frames", 
      "art methods", 
      "frame processing", 
      "competitive performance", 
      "video", 
      "architecture", 
      "local methods", 
      "network", 
      "processing", 
      "classification", 
      "captioning", 
      "retrieval", 
      "same time", 
      "dataset", 
      "sampling strategy", 
      "frame", 
      "reasoning", 
      "art", 
      "fusion", 
      "method", 
      "performance", 
      "long-term activity", 
      "major part", 
      "seconds", 
      "important relationship", 
      "state", 
      "time", 
      "strategies", 
      "content", 
      "part", 
      "hampers", 
      "account", 
      "understanding", 
      "action", 
      "relationship", 
      "activity", 
      "paper", 
      "problem", 
      "approach"
    ], 
    "name": "ECO: Efficient Convolutional Network for Online Video Understanding", 
    "pagination": "713-730", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1107502594"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-030-01216-8_43"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-030-01216-8_43", 
      "https://app.dimensions.ai/details/publication/pub.1107502594"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-10-01T06:56", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221001/entities/gbq_results/chapter/chapter_325.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-030-01216-8_43"
  }
]
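
The record above is plain JSON, so its fields can be pulled out with any JSON library. A minimal Python sketch, assuming the record has been saved locally under the hypothetical name record.jsonld:

import json

# The SciGraph payload is a one-element JSON array wrapping the record object.
with open("record.jsonld") as f:
    record = json.load(f)[0]

print(record["name"])           # ECO: Efficient Convolutional Network for Online Video Understanding
print(record["datePublished"])  # 2018-10-09

# Authors are a list of Person objects with givenName/familyName fields.
for person in record["author"]:
    print(person["givenName"], person["familyName"])

# The DOI sits inside the productId list, under the entry named "doi".
doi = next(p["value"][0] for p in record["productId"] if p["name"] == "doi")
print(doi)                      # 10.1007/978-3-030-01216-8_43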
 

Download the RDF metadata as: JSON-LD, N-Triples, Turtle, or RDF/XML.

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-01216-8_43'
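
The same request can be issued from Python; a minimal sketch using the requests library (an assumed dependency, mirroring the Accept header in the curl call above):

import requests

url = "https://scigraph.springernature.com/pub.10.1007/978-3-030-01216-8_43"

# Content negotiation: explicitly ask the endpoint for JSON-LD.
resp = requests.get(url, headers={"Accept": "application/ld+json"})
resp.raise_for_status()

record = resp.json()[0]  # the payload is a one-element JSON array
print(record["name"])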

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-01216-8_43'
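
Because each triple sits on its own line, N-Triples output is easy to post-process; a sketch that loads it into a graph with rdflib (an assumed dependency, not something the page itself requires):

import requests
from rdflib import Graph

url = "https://scigraph.springernature.com/pub.10.1007/978-3-030-01216-8_43"
nt = requests.get(url, headers={"Accept": "application/n-triples"}).text

g = Graph()
g.parse(data=nt, format="nt")    # one triple per line
print(len(g), "triples loaded")  # the summary table below reports 137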

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-01216-8_43'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-01216-8_43'
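
All four serializations describe the same graph, which can be checked by parsing each response and comparing triple counts. A sketch, again with rdflib (JSON-LD parsing assumes rdflib 6+, where the json-ld plugin is built in):

import requests
from rdflib import Graph

url = "https://scigraph.springernature.com/pub.10.1007/978-3-030-01216-8_43"

# Map each Accept header from the curl examples to rdflib's parser name.
formats = {
    "application/ld+json":   "json-ld",
    "application/n-triples": "nt",
    "text/turtle":           "turtle",
    "application/rdf+xml":   "xml",
}

for accept, fmt in formats.items():
    data = requests.get(url, headers={"Accept": accept}).text
    g = Graph()
    g.parse(data=data, format=fmt)
    print(f"{accept}: {len(g)} triples")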


 

This table displays all metadata directly associated with this object as RDF triples.

137 TRIPLES      22 PREDICATES      74 URIs      67 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-030-01216-8_43 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Nd137fb99eaca4ab297d97cd66ff55596
4 schema:datePublished 2018-10-09
5 schema:datePublishedReg 2018-10-09
6 schema:description The state of the art in video understanding suffers from two problems: (1) The major part of reasoning is performed locally in the video, therefore, it misses important relationships within actions that span several seconds. (2) While there are local methods with fast per-frame processing, the processing of the whole video is not efficient and hampers fast video retrieval or online classification of long-term activities. In this paper, we introduce a network architecture (https://github.com/mzolfaghari/ECO-efficient-video-understanding) that takes long-term content into account and enables fast per-video processing at the same time. The architecture is based on merging long-term content already in the network rather than in a post-hoc fusion. Together with a sampling strategy, which exploits that neighboring frames are largely redundant, this yields high-quality action classification and video captioning at up to 230 videos per second, where each video can consist of a few hundred frames. The approach achieves competitive performance across all datasets while being 10× to 80× faster than state-of-the-art methods.
7 schema:editor Nf1e78228e42345a0becde5df9f40931d
8 schema:genre chapter
9 schema:isAccessibleForFree true
10 schema:isPartOf Nf80a5a933a704c77a3ab10efc602bb16
11 schema:keywords Efficient Convolutional Networks
12 account
13 action
14 action classification
15 activity
16 approach
17 architecture
18 art
19 art methods
20 captioning
21 classification
22 competitive performance
23 content
24 convolutional network
25 dataset
26 frame
27 frame processing
28 fusion
29 hampers
30 important relationship
31 local methods
32 long-term activity
33 long-term content
34 major part
35 method
36 neighboring frames
37 network
38 network architecture
39 online classification
40 paper
41 part
42 performance
43 problem
44 processing
45 reasoning
46 relationship
47 retrieval
48 same time
49 sampling strategy
50 seconds
51 state
52 strategies
53 time
54 understanding
55 video
56 video captioning
57 video processing
58 video retrieval
59 video understanding
60 whole video
61 schema:name ECO: Efficient Convolutional Network for Online Video Understanding
62 schema:pagination 713-730
63 schema:productId Nbdfdecc3be114591a9ac48b1a662076c
64 Ne7177a86f8d04e46bf9d7c5cd72be55b
65 schema:publisher Nde1ef8fdbe6c4b2193ba4db2b7cf420d
66 schema:sameAs https://app.dimensions.ai/details/publication/pub.1107502594
67 https://doi.org/10.1007/978-3-030-01216-8_43
68 schema:sdDatePublished 2022-10-01T06:56
69 schema:sdLicense https://scigraph.springernature.com/explorer/license/
70 schema:sdPublisher Ne58d9e04ee04429f9252a1c59268d5c8
71 schema:url https://doi.org/10.1007/978-3-030-01216-8_43
72 sgo:license sg:explorer/license/
73 sgo:sdDataset chapters
74 rdf:type schema:Chapter
75 N1c6f60aec6694062b12a4020a8b48e6c schema:affiliation grid-institutes:grid.5963.9
76 schema:familyName Singh
77 schema:givenName Kamaljeet
78 rdf:type schema:Person
79 N1d50b43d7a4a4a19a908f1bdd166bde4 schema:familyName Hebert
80 schema:givenName Martial
81 rdf:type schema:Person
82 N5bdfbe5b76744d009edc6eacd61f15d1 schema:familyName Weiss
83 schema:givenName Yair
84 rdf:type schema:Person
85 N7abde377d42a48be8b192dfa88270f0c rdf:first N5bdfbe5b76744d009edc6eacd61f15d1
86 rdf:rest rdf:nil
87 N8284f86694494121bf259fb6318e8b17 schema:familyName Ferrari
88 schema:givenName Vittorio
89 rdf:type schema:Person
90 Naa145f8f53924742b79c9f78d7b92578 rdf:first Nf29139ab201b405db8378432c7dbd2c2
91 rdf:rest N7abde377d42a48be8b192dfa88270f0c
92 Nbdfdecc3be114591a9ac48b1a662076c schema:name dimensions_id
93 schema:value pub.1107502594
94 rdf:type schema:PropertyValue
95 Ncadc0bda1f92497ca305e6964efd0ad7 rdf:first N1d50b43d7a4a4a19a908f1bdd166bde4
96 rdf:rest Naa145f8f53924742b79c9f78d7b92578
97 Nd137fb99eaca4ab297d97cd66ff55596 rdf:first sg:person.013460010610.63
98 rdf:rest Nf86ee75cb7ed47e995b8803d5dc0178e
99 Nde1ef8fdbe6c4b2193ba4db2b7cf420d schema:name Springer Nature
100 rdf:type schema:Organisation
101 Ne58d9e04ee04429f9252a1c59268d5c8 schema:name Springer Nature - SN SciGraph project
102 rdf:type schema:Organization
103 Ne7177a86f8d04e46bf9d7c5cd72be55b schema:name doi
104 schema:value 10.1007/978-3-030-01216-8_43
105 rdf:type schema:PropertyValue
106 Nf1e78228e42345a0becde5df9f40931d rdf:first N8284f86694494121bf259fb6318e8b17
107 rdf:rest Ncadc0bda1f92497ca305e6964efd0ad7
108 Nf29139ab201b405db8378432c7dbd2c2 schema:familyName Sminchisescu
109 schema:givenName Cristian
110 rdf:type schema:Person
111 Nf5a5d88c8cb643cb8a9540ed1c0d2824 rdf:first sg:person.012443225372.65
112 rdf:rest rdf:nil
113 Nf80a5a933a704c77a3ab10efc602bb16 schema:isbn 978-3-030-01215-1
114 978-3-030-01216-8
115 schema:name Computer Vision – ECCV 2018
116 rdf:type schema:Book
117 Nf86ee75cb7ed47e995b8803d5dc0178e rdf:first N1c6f60aec6694062b12a4020a8b48e6c
118 rdf:rest Nf5a5d88c8cb643cb8a9540ed1c0d2824
119 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
120 schema:name Information and Computing Sciences
121 rdf:type schema:DefinedTerm
122 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
123 schema:name Artificial Intelligence and Image Processing
124 rdf:type schema:DefinedTerm
125 sg:person.012443225372.65 schema:affiliation grid-institutes:grid.5963.9
126 schema:familyName Brox
127 schema:givenName Thomas
128 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012443225372.65
129 rdf:type schema:Person
130 sg:person.013460010610.63 schema:affiliation grid-institutes:grid.5963.9
131 schema:familyName Zolfaghari
132 schema:givenName Mohammadreza
133 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013460010610.63
134 rdf:type schema:Person
135 grid-institutes:grid.5963.9 schema:alternateName University of Freiburg, Freiburg im Breisgau, Germany
136 schema:name University of Freiburg, Freiburg im Breisgau, Germany
137 rdf:type schema:Organization
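
The header counts above (137 triples, 22 predicates, 74 URIs, 67 literals, 7 blank nodes) can be sanity-checked against the N-Triples export; a sketch, with the caveat that the URI/literal/blank-node tallies depend on how terms are deduplicated:

from rdflib import Graph, URIRef, Literal, BNode

g = Graph()
g.parse("record.nt", format="nt")  # hypothetical local copy of the N-Triples export

# Collect every distinct term appearing in any subject/predicate/object slot.
terms = set()
for s, p, o in g:
    terms.update((s, p, o))

print("triples:    ", len(g))
print("predicates: ", len({p for _, p, _ in g}))
print("URIs:       ", sum(isinstance(t, URIRef) for t in terms))
print("literals:   ", sum(isinstance(t, Literal) for t in terms))
print("blank nodes:", sum(isinstance(t, BNode) for t in terms))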
 



