Early, intermediate and late fusion strategies for robust deep learning-based multimodal action recognition


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2021-09-30

AUTHORS

Said Yacine Boulahia, Abdenour Amamra, Mohamed Ridha Madi, Said Daikh

ABSTRACT

Multimodal action recognition techniques combine several image modalities (RGB, depth, skeleton, and infrared) for more robust recognition. According to the fusion level in the action recognition pipeline, we can distinguish three families of approaches: early fusion, where the raw modalities are combined ahead of feature extraction; intermediate fusion, where the features of each modality are concatenated before classification; and late fusion, where the modality-wise classification results are combined. After reviewing the literature, we identified the principal defects of each category, which we address as follows. First, we investigate more deeply the early-stage fusion, which has been poorly explored in the literature. Second, since intermediate fusion protocols operate on the feature map irrespective of the particularities of human actions, we propose a new scheme that optimally combines modality-wise features. Third, as most late fusion solutions rely on handcrafted rules, which are prone to human bias and far from real-world peculiarities, we adopt a neural learning strategy that extracts significant features from the data rather than assuming that artificial rules are correct. We validated our findings on two challenging datasets, where the results we obtained were as good as or better than their literature counterparts.
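
For illustration only (this is not the authors' implementation): a minimal NumPy sketch of where each fusion level described in the abstract sits in the pipeline. extract_features and class_scores are toy stand-ins for a per-modality network and a classifier head.

import numpy as np

rng = np.random.default_rng(0)

def extract_features(x):
    # Toy stand-in for a per-modality feature extractor.
    return x.mean(axis=0)

def class_scores(features):
    # Toy stand-in for a classifier head: random linear map + softmax over 3 classes.
    logits = features @ rng.standard_normal((features.shape[0], 3))
    e = np.exp(logits - logits.max())
    return e / e.sum()

rgb, depth = rng.random((10, 4)), rng.random((10, 4))  # two toy modality streams

# Early fusion: combine the raw modalities ahead of feature extraction.
early = class_scores(extract_features(np.concatenate([rgb, depth], axis=1)))

# Intermediate fusion: per-modality features are concatenated before classification.
inter = class_scores(np.concatenate([extract_features(rgb), extract_features(depth)]))

# Late fusion: classify each modality separately, then combine the results
# (a fixed average here; the paper instead learns the combination).
late = (class_scores(extract_features(rgb)) + class_scores(extract_features(depth))) / 2

print(early.argmax(), inter.argmax(), late.argmax())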

PAGES

121

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s00138-021-01249-8

DOI

http://dx.doi.org/10.1007/s00138-021-01249-8

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1141519291



JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service such as the JSON-LD Playground or the Google Structured Data Testing Tool (SDTT).

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/17", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Psychology and Cognitive Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1701", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Psychology", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Ecole Militaire Polytechnique, BP 17, Bordj el Bahri 16111, Algiers, Algeria", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Ecole Militaire Polytechnique, BP 17, Bordj el Bahri 16111, Algiers, Algeria"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Boulahia", 
        "givenName": "Said Yacine", 
        "id": "sg:person.011341300517.89", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011341300517.89"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Ecole Militaire Polytechnique, BP 17, Bordj el Bahri 16111, Algiers, Algeria", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Ecole Militaire Polytechnique, BP 17, Bordj el Bahri 16111, Algiers, Algeria"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Amamra", 
        "givenName": "Abdenour", 
        "id": "sg:person.010106534777.71", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010106534777.71"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Ecole Militaire Polytechnique, BP 17, Bordj el Bahri 16111, Algiers, Algeria", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Ecole Militaire Polytechnique, BP 17, Bordj el Bahri 16111, Algiers, Algeria"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Madi", 
        "givenName": "Mohamed Ridha", 
        "id": "sg:person.010025676433.79", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010025676433.79"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Ecole Militaire Polytechnique, BP 17, Bordj el Bahri 16111, Algiers, Algeria", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Ecole Militaire Polytechnique, BP 17, Bordj el Bahri 16111, Algiers, Algeria"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Daikh", 
        "givenName": "Said", 
        "id": "sg:person.016530747633.01", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016530747633.01"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1007/978-3-030-58545-7_5", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1132320915", 
          "https://doi.org/10.1007/978-3-030-58545-7_5"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s12652-019-01239-9", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1112088455", 
          "https://doi.org/10.1007/s12652-019-01239-9"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s10489-020-01823-z", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1130328966", 
          "https://doi.org/10.1007/s10489-020-01823-z"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-030-01234-2_21", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1107454584", 
          "https://doi.org/10.1007/978-3-030-01234-2_21"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-030-69418-0_3", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1135478128", 
          "https://doi.org/10.1007/978-3-030-69418-0_3"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-030-64556-4_23", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1133436420", 
          "https://doi.org/10.1007/978-3-030-64556-4_23"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s11042-021-11058-w", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1138865569", 
          "https://doi.org/10.1007/s11042-021-11058-w"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2021-09-30", 
    "datePublishedReg": "2021-09-30", 
    "description": "Multimodal action recognition techniques combine several image modalities (RGB, Depth, Skeleton, and InfraRed) for a more robust recognition. According to the fusion level in the action recognition pipeline, we can distinguish three families of approaches: early fusion, where the raw modalities are combined ahead of feature extraction; intermediate fusion, the features, respective to each modality, are concatenated before classification; and late fusion, where the modality-wise classification results are combined. After reviewing the literature, we identified the principal defects of each category, which we try to address by first investigating more deeply the early-stage fusion that has been poorly explored in the literature. Second, intermediate fusion protocols operate on the feature map, irrespective of the particularity of human action, we propose a new scheme where we optimally combine modality-wise features. Third, as most of the late fusion solutions use handcrafted rules, prone to human bias, and far from real-world peculiarities, we adopt a neural learning strategy to extract significant features from data rather than assuming that artificial rules are correct. We validated our findings on two challenging datasets. Our obtained results were as good or better than their literature counterparts.", 
    "genre": "article", 
    "id": "sg:pub.10.1007/s00138-021-01249-8", 
    "inLanguage": "en", 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1045266", 
        "issn": [
          "0932-8092", 
          "1432-1769"
        ], 
        "name": "Machine Vision and Applications", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "6", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "32"
      }
    ], 
    "keywords": [
      "late fusion strategy", 
      "action recognition pipeline", 
      "action recognition techniques", 
      "recognition pipeline", 
      "action recognition", 
      "robust recognition", 
      "late fusion", 
      "feature extraction", 
      "feature maps", 
      "family of approaches", 
      "recognition techniques", 
      "fusion strategy", 
      "fusion solution", 
      "intermediate fusion", 
      "image modalities", 
      "classification results", 
      "human bias", 
      "early fusion", 
      "artificial rules", 
      "human actions", 
      "new scheme", 
      "learning strategies", 
      "significant features", 
      "fusion protocol", 
      "recognition", 
      "fusion", 
      "rules", 
      "dataset", 
      "features", 
      "fusion levels", 
      "classification", 
      "scheme", 
      "pipeline", 
      "protocol", 
      "extraction", 
      "maps", 
      "technique", 
      "strategies", 
      "solution", 
      "modalities", 
      "particularities", 
      "data", 
      "results", 
      "literature", 
      "categories", 
      "bias", 
      "literature counterparts", 
      "findings", 
      "family", 
      "action", 
      "peculiarities", 
      "counterparts", 
      "levels", 
      "approach", 
      "principal defect", 
      "defects", 
      "Multimodal action recognition techniques", 
      "raw modalities", 
      "modality-wise classification results", 
      "early-stage fusion", 
      "intermediate fusion protocols", 
      "modality-wise features", 
      "late fusion solutions", 
      "real-world peculiarities", 
      "neural learning strategy", 
      "robust deep learning-based multimodal action recognition", 
      "deep learning-based multimodal action recognition", 
      "learning-based multimodal action recognition", 
      "multimodal action recognition"
    ], 
    "name": "Early, intermediate and late fusion strategies for robust deep learning-based multimodal action recognition", 
    "pagination": "121", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1141519291"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s00138-021-01249-8"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s00138-021-01249-8", 
      "https://app.dimensions.ai/details/publication/pub.1141519291"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-01-01T19:00", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/article/article_915.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1007/s00138-021-01249-8"
  }
]
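
Because JSON-LD is plain JSON, the record above can be consumed with the standard library alone. A minimal Python sketch, assuming the JSON-LD has been saved locally as record.json (a hypothetical filename):

import json

# Load the one-element JSON array shown above (record.json is a hypothetical local copy).
with open("record.json") as f:
    record = json.load(f)[0]

title = record["name"]
doi = next(p["value"][0] for p in record["productId"] if p["name"] == "doi")
authors = [a["givenName"] + " " + a["familyName"] for a in record["author"]]

print(title)
print("DOI:", doi)
print("Authors:", ", ".join(authors))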
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular linked data format that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s00138-021-01249-8'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s00138-021-01249-8'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s00138-021-01249-8'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s00138-021-01249-8'
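
For scripted access, the same content negotiation can be done from Python. A minimal sketch using the third-party requests library (assumed to be installed), mirroring the curl commands above:

import requests

URL = "https://scigraph.springernature.com/pub.10.1007/s00138-021-01249-8"

# Ask for JSON-LD via the Accept header, exactly as the curl examples above do.
resp = requests.get(URL, headers={"Accept": "application/ld+json"}, timeout=30)
resp.raise_for_status()
record = resp.json()          # JSON-LD parses as ordinary JSON
print(record[0]["name"])      # the record is a one-element array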


 

This table displays all metadata directly associated to this object as RDF triples.

184 TRIPLES      22 PREDICATES      103 URIs      86 LITERALS      6 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/s00138-021-01249-8 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 anzsrc-for:17
4 anzsrc-for:1701
5 schema:author N9db138cbd6344a9c9a6be040ed6c36c1
6 schema:citation sg:pub.10.1007/978-3-030-01234-2_21
7 sg:pub.10.1007/978-3-030-58545-7_5
8 sg:pub.10.1007/978-3-030-64556-4_23
9 sg:pub.10.1007/978-3-030-69418-0_3
10 sg:pub.10.1007/s10489-020-01823-z
11 sg:pub.10.1007/s11042-021-11058-w
12 sg:pub.10.1007/s12652-019-01239-9
13 schema:datePublished 2021-09-30
14 schema:datePublishedReg 2021-09-30
15 schema:description Multimodal action recognition techniques combine several image modalities (RGB, depth, skeleton, and infrared) for more robust recognition. According to the fusion level in the action recognition pipeline, we can distinguish three families of approaches: early fusion, where the raw modalities are combined ahead of feature extraction; intermediate fusion, where the features of each modality are concatenated before classification; and late fusion, where the modality-wise classification results are combined. After reviewing the literature, we identified the principal defects of each category, which we address as follows. First, we investigate more deeply the early-stage fusion, which has been poorly explored in the literature. Second, since intermediate fusion protocols operate on the feature map irrespective of the particularities of human actions, we propose a new scheme that optimally combines modality-wise features. Third, as most late fusion solutions rely on handcrafted rules, which are prone to human bias and far from real-world peculiarities, we adopt a neural learning strategy that extracts significant features from the data rather than assuming that artificial rules are correct. We validated our findings on two challenging datasets, where the results we obtained were as good as or better than their literature counterparts.
16 schema:genre article
17 schema:inLanguage en
18 schema:isAccessibleForFree false
19 schema:isPartOf N0656c9aba9214658944503dfdb8e9c21
20 Na98dc81f32d54084a392253faea8a1f1
21 sg:journal.1045266
22 schema:keywords Multimodal action recognition techniques
23 action
24 action recognition
25 action recognition pipeline
26 action recognition techniques
27 approach
28 artificial rules
29 bias
30 categories
31 classification
32 classification results
33 counterparts
34 data
35 dataset
36 deep learning-based multimodal action recognition
37 defects
38 early fusion
39 early-stage fusion
40 extraction
41 family
42 family of approaches
43 feature extraction
44 feature maps
45 features
46 findings
47 fusion
48 fusion levels
49 fusion protocol
50 fusion solution
51 fusion strategy
52 human actions
53 human bias
54 image modalities
55 intermediate fusion
56 intermediate fusion protocols
57 late fusion
58 late fusion solutions
59 late fusion strategy
60 learning strategies
61 learning-based multimodal action recognition
62 levels
63 literature
64 literature counterparts
65 maps
66 modalities
67 modality-wise classification results
68 modality-wise features
69 multimodal action recognition
70 neural learning strategy
71 new scheme
72 particularities
73 peculiarities
74 pipeline
75 principal defect
76 protocol
77 raw modalities
78 real-world peculiarities
79 recognition
80 recognition pipeline
81 recognition techniques
82 results
83 robust deep learning-based multimodal action recognition
84 robust recognition
85 rules
86 scheme
87 significant features
88 solution
89 strategies
90 technique
91 schema:name Early, intermediate and late fusion strategies for robust deep learning-based multimodal action recognition
92 schema:pagination 121
93 schema:productId N0c90473ffdf34e8892ddd589d1d43cb3
94 N5aacb41fba5940f9b68ef8c4cc82a2d6
95 schema:sameAs https://app.dimensions.ai/details/publication/pub.1141519291
96 https://doi.org/10.1007/s00138-021-01249-8
97 schema:sdDatePublished 2022-01-01T19:00
98 schema:sdLicense https://scigraph.springernature.com/explorer/license/
99 schema:sdPublisher N948e4cb5d3e44c3c94cfbe949667c725
100 schema:url https://doi.org/10.1007/s00138-021-01249-8
101 sgo:license sg:explorer/license/
102 sgo:sdDataset articles
103 rdf:type schema:ScholarlyArticle
104 N0656c9aba9214658944503dfdb8e9c21 schema:volumeNumber 32
105 rdf:type schema:PublicationVolume
106 N0c90473ffdf34e8892ddd589d1d43cb3 schema:name dimensions_id
107 schema:value pub.1141519291
108 rdf:type schema:PropertyValue
109 N5aacb41fba5940f9b68ef8c4cc82a2d6 schema:name doi
110 schema:value 10.1007/s00138-021-01249-8
111 rdf:type schema:PropertyValue
112 N74f0fd9bdb084be7a48499482a96c3b4 rdf:first sg:person.010106534777.71
113 rdf:rest Neec356bd4be64ea6bc81e802e969b743
114 N948e4cb5d3e44c3c94cfbe949667c725 schema:name Springer Nature - SN SciGraph project
115 rdf:type schema:Organization
116 N9db138cbd6344a9c9a6be040ed6c36c1 rdf:first sg:person.011341300517.89
117 rdf:rest N74f0fd9bdb084be7a48499482a96c3b4
118 Na7b03b7c0a68470e87f79f9d68a74b07 rdf:first sg:person.016530747633.01
119 rdf:rest rdf:nil
120 Na98dc81f32d54084a392253faea8a1f1 schema:issueNumber 6
121 rdf:type schema:PublicationIssue
122 Neec356bd4be64ea6bc81e802e969b743 rdf:first sg:person.010025676433.79
123 rdf:rest Na7b03b7c0a68470e87f79f9d68a74b07
124 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
125 schema:name Information and Computing Sciences
126 rdf:type schema:DefinedTerm
127 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
128 schema:name Artificial Intelligence and Image Processing
129 rdf:type schema:DefinedTerm
130 anzsrc-for:17 schema:inDefinedTermSet anzsrc-for:
131 schema:name Psychology and Cognitive Sciences
132 rdf:type schema:DefinedTerm
133 anzsrc-for:1701 schema:inDefinedTermSet anzsrc-for:
134 schema:name Psychology
135 rdf:type schema:DefinedTerm
136 sg:journal.1045266 schema:issn 0932-8092
137 1432-1769
138 schema:name Machine Vision and Applications
139 schema:publisher Springer Nature
140 rdf:type schema:Periodical
141 sg:person.010025676433.79 schema:affiliation grid-institutes:None
142 schema:familyName Madi
143 schema:givenName Mohamed Ridha
144 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010025676433.79
145 rdf:type schema:Person
146 sg:person.010106534777.71 schema:affiliation grid-institutes:None
147 schema:familyName Amamra
148 schema:givenName Abdenour
149 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010106534777.71
150 rdf:type schema:Person
151 sg:person.011341300517.89 schema:affiliation grid-institutes:None
152 schema:familyName Boulahia
153 schema:givenName Said Yacine
154 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011341300517.89
155 rdf:type schema:Person
156 sg:person.016530747633.01 schema:affiliation grid-institutes:None
157 schema:familyName Daikh
158 schema:givenName Said
159 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016530747633.01
160 rdf:type schema:Person
161 sg:pub.10.1007/978-3-030-01234-2_21 schema:sameAs https://app.dimensions.ai/details/publication/pub.1107454584
162 https://doi.org/10.1007/978-3-030-01234-2_21
163 rdf:type schema:CreativeWork
164 sg:pub.10.1007/978-3-030-58545-7_5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1132320915
165 https://doi.org/10.1007/978-3-030-58545-7_5
166 rdf:type schema:CreativeWork
167 sg:pub.10.1007/978-3-030-64556-4_23 schema:sameAs https://app.dimensions.ai/details/publication/pub.1133436420
168 https://doi.org/10.1007/978-3-030-64556-4_23
169 rdf:type schema:CreativeWork
170 sg:pub.10.1007/978-3-030-69418-0_3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1135478128
171 https://doi.org/10.1007/978-3-030-69418-0_3
172 rdf:type schema:CreativeWork
173 sg:pub.10.1007/s10489-020-01823-z schema:sameAs https://app.dimensions.ai/details/publication/pub.1130328966
174 https://doi.org/10.1007/s10489-020-01823-z
175 rdf:type schema:CreativeWork
176 sg:pub.10.1007/s11042-021-11058-w schema:sameAs https://app.dimensions.ai/details/publication/pub.1138865569
177 https://doi.org/10.1007/s11042-021-11058-w
178 rdf:type schema:CreativeWork
179 sg:pub.10.1007/s12652-019-01239-9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1112088455
180 https://doi.org/10.1007/s12652-019-01239-9
181 rdf:type schema:CreativeWork
182 grid-institutes:None schema:alternateName Ecole Militaire Polytechnique, BP 17, Bordj el Bahri 16111, Algiers, Algeria
183 schema:name Ecole Militaire Polytechnique, BP 17, Bordj el Bahri 16111, Algiers, Algeria
184 rdf:type schema:Organization
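
These triples can also be inspected locally. A minimal sketch using the third-party rdflib library (assumed to be installed), parsing the Turtle serialization retrieved as shown in the curl commands above:

from rdflib import Graph

g = Graph()
# rdflib fetches the URL itself; format="turtle" requests the Turtle serialization.
g.parse("https://scigraph.springernature.com/pub.10.1007/s00138-021-01249-8",
        format="turtle")

print(len(g), "triples")      # should match the count reported above
for s, p, o in list(g)[:5]:   # print a few triples
    print(s, p, o)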
 



