Modular Reinforcement Learning: An Application to a Real Robot Task


Ontology type: schema:Chapter     


Chapter Info

DATE

2000-06-09

AUTHORS

Zsolt Kalmár , Csaba Szepesvári , András Lorincz

ABSTRACT

The behaviour of reinforcement learning (RL) algorithms is best understood in completely observable, finite state- and action-space, discrete-time controlled Markov chains. Robot-learning domains, on the other hand, are inherently infinite both in time and space, and moreover they are only partially observable. In this article we suggest a systematic design method whose motivation comes from the desire to transform the task to be solved into a finite-state, discrete-time, “approximately” Markovian task which is also completely observable. The key idea is to break the problem up into subtasks and design controllers for each of the subtasks. Operating conditions are then attached to the controllers (a controller together with its operating conditions is called a module), and additional features may be designed to facilitate observability. A new discrete time-counter is introduced at the “module level” that clicks only when a change in the value of one of the features is observed. The approach was tried out on a real-life robot. Several RL algorithms were compared, and it was found that a model-based approach worked best. The learnt switching strategy performed as well as a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which could not have been foreseen, suggesting the promising possibility that a learnt controller might outperform a handcrafted switching strategy in the future.
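The module-based design described in the abstract can be sketched in a few lines: a module pairs a low-level controller with its operating condition, and the module-level clock "clicks" only when an observed feature changes. The sketch below is illustrative only; all names and the switching-policy interface are assumptions, not taken from the chapter.

```python
from dataclasses import dataclass
from typing import Callable

# A "module" pairs a low-level controller with its operating condition
# (names here are illustrative, not from the chapter).
@dataclass
class Module:
    name: str
    operating_condition: Callable[[dict], bool]  # may this module run in this state?
    controller: Callable[[dict], str]            # low-level action for this state

def eligible_modules(modules: list, features: dict) -> list:
    """Only modules whose operating condition holds may be switched to."""
    return [m for m in modules if m.operating_condition(features)]

def run_episode(modules: list, policy: Callable, feature_stream) -> list:
    """Consult the switching policy only on feature-change events.

    The module-level clock "clicks" when a feature value changes, so the
    (learnt or handcrafted) switching policy is not queried on every raw
    sensor tick -- the event-driven discretisation from the abstract.
    """
    prev = None
    decisions = []
    for features in feature_stream:
        if features != prev:  # a feature changed -> one module-level time step
            choices = eligible_modules(modules, features)
            decisions.append(policy(choices, features).name)
            prev = features
    return decisions
```

An RL algorithm would then learn the `policy` over this small, event-driven, approximately Markovian decision problem instead of over the raw continuous robot state.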

PAGES

29-45


Book

TITLE

Learning Robots

ISBN

978-3-540-65480-3
978-3-540-49240-5

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/3-540-49240-2_3

DOI

http://dx.doi.org/10.1007/3-540-49240-2_3

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1029325683



JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service such as the JSON-LD Playground or Google's Structured Data Testing Tool (SDTT).

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "name": [
            "Dept. of Informatics JATE, Aradi vrt. tere 1, H-6720, Szeged, Hungary"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Kalm\u00e1r", 
        "givenName": "Zsolt", 
        "id": "sg:person.010745541621.05", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010745541621.05"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "Research Group on Art. Int., JATE, Aradi vrt. tere 1, H-6720, Szeged, Hungary"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Szepesv\u00e1ri", 
        "givenName": "Csaba", 
        "id": "sg:person.016202177221.23", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016202177221.23"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "Dept. of Chemical Physics, Inst. of Isotopes, HAS, P.O. Box 77, H-1525, Budapest, Hungary"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lorincz", 
        "givenName": "Andr\u00e1s", 
        "id": "sg:person.0651500301.38", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0651500301.38"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1016/0004-3702(92)90058-6", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009445883"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0004-3702(92)90058-6", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009445883"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/a:1007440607681", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010556066", 
          "https://doi.org/10.1023/a:1007440607681"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1177/105971239300200202", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1011632542"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1177/105971239300200202", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1011632542"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf00117447", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1016428696", 
          "https://doi.org/10.1007/bf00117447"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf00117447", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1016428696", 
          "https://doi.org/10.1007/bf00117447"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf00992698", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033088958", 
          "https://doi.org/10.1007/bf00992698"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1162/neco.1994.6.6.1185", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037933600"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/b978-1-55860-377-6.50052-9", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1040838523"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0004-3702(94)00011-o", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1043559588"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf00114724", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051613972", 
          "https://doi.org/10.1007/bf00114724"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf00114724", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051613972", 
          "https://doi.org/10.1007/bf00114724"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icnn.1994.374432", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094331265"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/iros.1996.568989", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1095624088"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2000-06-09", 
    "datePublishedReg": "2000-06-09", 
    "description": "The behaviour of reinforcement learning (RL) algorithms is best understood in completely observable, finite state- and action-space, discrete-time controlled Markov-chains. Robot-learning domains, on the other hand, are inherently infinite both in time and space, and moreover they are only partially observable. In this article we suggest a systematic design method whose motivation comes from the desire to transform the task-to-be-solved into a finite-state, discrete-time, \u201capproximately\u201d Markovian task, which is completely observable too. The key idea is to break up the problem into subtasks and design controllers for each of the subtasks. Then operating conditions are attached to the controllers (together the controllers and their operating conditions which are called modules) and possible additional features are designed to facilitate observability. A new discrete time-counter is introduced at the \u201cmodule-level\u201d that clicks only when a change in the value of one of the features is observed. The approach was tried out on a real-life robot. Several RL algorithms were compared and it was found that a model-based approach worked best. The learnt switching strategy performed equally well as a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which could not have been seen in advance, which predicted the promising possibility that a learnt controller might overperform a handcrafted switching strategy in the future.", 
    "editor": [
      {
        "familyName": "Birk", 
        "givenName": "Andreas", 
        "type": "Person"
      }, 
      {
        "familyName": "Demiris", 
        "givenName": "John", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/3-540-49240-2_3", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-540-65480-3", 
        "978-3-540-49240-5"
      ], 
      "name": "Learning Robots", 
      "type": "Book"
    }, 
    "name": "Modular Reinforcement Learning: An Application to a Real Robot Task", 
    "pagination": "29-45", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/3-540-49240-2_3"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "8bb3b2f9a98968be42c5c45a8e88b378d3401106f660111608a06603641cd076"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1029325683"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/3-540-49240-2_3", 
      "https://app.dimensions.ai/details/publication/pub.1029325683"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T05:39", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000346_0000000346/records_99843_00000002.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F3-540-49240-2_3"
  }
]
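Because JSON-LD is plain JSON, the record above can be consumed without an RDF toolchain. As a minimal sketch, the snippet below pulls the author names out of an inlined excerpt of the record (only the `author` array is reproduced here):

```python
import json

# A minimal excerpt of the JSON-LD record above, inlined for illustration.
record_json = """
[{"author": [
    {"familyName": "Kalmár", "givenName": "Zsolt", "type": "Person"},
    {"familyName": "Szepesvári", "givenName": "Csaba", "type": "Person"},
    {"familyName": "Lorincz", "givenName": "András", "type": "Person"}
]}]
"""

def author_names(record: list) -> list:
    """Return 'Given Family' strings for every author entry in the record."""
    return [f"{a['givenName']} {a['familyName']}" for a in record[0].get("author", [])]

print(author_names(json.loads(record_json)))
# -> ['Zsolt Kalmár', 'Csaba Szepesvári', 'András Lorincz']
```

The same pattern works on a full record fetched from the endpoint, since the `@context` keys are ordinary JSON object keys.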
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/3-540-49240-2_3'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/3-540-49240-2_3'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/3-540-49240-2_3'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/3-540-49240-2_3'
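The same content negotiation can be done from Python. The sketch below builds the request with the appropriate Accept header; the actual network call is left commented out, since it assumes the SciGraph endpoint is still live.

```python
import urllib.request

# Accept headers for the four serializations listed above.
FORMATS = {
    "json-ld":   "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle":    "text/turtle",
    "rdf-xml":   "application/rdf+xml",
}

def build_request(pub_id: str, fmt: str) -> urllib.request.Request:
    """Build a content-negotiated request for a SciGraph publication record."""
    url = f"https://scigraph.springernature.com/{pub_id}"
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})

req = build_request("pub.10.1007/3-540-49240-2_3", "json-ld")
# body = urllib.request.urlopen(req).read()  # uncomment to fetch over the network
```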


 

This table displays all metadata directly associated with this object as RDF triples.

124 TRIPLES      23 PREDICATES      37 URIs      19 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/3-540-49240-2_3 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author N7bb3d07644b548dbab9f53282987fc1b
4 schema:citation sg:pub.10.1007/bf00114724
5 sg:pub.10.1007/bf00117447
6 sg:pub.10.1007/bf00992698
7 sg:pub.10.1023/a:1007440607681
8 https://doi.org/10.1016/0004-3702(92)90058-6
9 https://doi.org/10.1016/0004-3702(94)00011-o
10 https://doi.org/10.1016/b978-1-55860-377-6.50052-9
11 https://doi.org/10.1109/icnn.1994.374432
12 https://doi.org/10.1109/iros.1996.568989
13 https://doi.org/10.1162/neco.1994.6.6.1185
14 https://doi.org/10.1177/105971239300200202
15 schema:datePublished 2000-06-09
16 schema:datePublishedReg 2000-06-09
17 schema:description The behaviour of reinforcement learning (RL) algorithms is best understood in completely observable, finite state- and action-space, discrete-time controlled Markov-chains. Robot-learning domains, on the other hand, are inherently infinite both in time and space, and moreover they are only partially observable. In this article we suggest a systematic design method whose motivation comes from the desire to transform the task-to-be-solved into a finite-state, discrete-time, “approximately” Markovian task, which is completely observable too. The key idea is to break up the problem into subtasks and design controllers for each of the subtasks. Then operating conditions are attached to the controllers (together the controllers and their operating conditions which are called modules) and possible additional features are designed to facilitate observability. A new discrete time-counter is introduced at the “module-level” that clicks only when a change in the value of one of the features is observed. The approach was tried out on a real-life robot. Several RL algorithms were compared and it was found that a model-based approach worked best. The learnt switching strategy performed equally well as a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which could not have been seen in advance, which predicted the promising possibility that a learnt controller might overperform a handcrafted switching strategy in the future.
18 schema:editor N0c846c175ee84847a88a87a91151aaf2
19 schema:genre chapter
20 schema:inLanguage en
21 schema:isAccessibleForFree false
22 schema:isPartOf N49682b6888464b898053a1ec75e9880b
23 schema:name Modular Reinforcement Learning: An Application to a Real Robot Task
24 schema:pagination 29-45
25 schema:productId N2748f165f3c94ec28e875655258e1358
26 N616ad2ac9fda4441836f5be074f7034d
27 Nbec98f8f69704fd688e9b170a7dc6df8
28 schema:publisher Ndefa51e4d9e145e6be5df39fc984ccac
29 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029325683
30 https://doi.org/10.1007/3-540-49240-2_3
31 schema:sdDatePublished 2019-04-16T05:39
32 schema:sdLicense https://scigraph.springernature.com/explorer/license/
33 schema:sdPublisher N837fd42dfc774867b9c8334d2e2799ce
34 schema:url https://link.springer.com/10.1007%2F3-540-49240-2_3
35 sgo:license sg:explorer/license/
36 sgo:sdDataset chapters
37 rdf:type schema:Chapter
38 N0c846c175ee84847a88a87a91151aaf2 rdf:first N4640891354f544af8adf57d1c0a5b837
39 rdf:rest Nf1b943428cec449c8c9bae3c25126759
40 N2748f165f3c94ec28e875655258e1358 schema:name doi
41 schema:value 10.1007/3-540-49240-2_3
42 rdf:type schema:PropertyValue
43 N439b9cff6cd84acd80f60b2686d8346a schema:name Research Group on Art. Int., JATE, Aradi vrt. tere 1, H-6720, Szeged, Hungary
44 rdf:type schema:Organization
45 N4640891354f544af8adf57d1c0a5b837 schema:familyName Birk
46 schema:givenName Andreas
47 rdf:type schema:Person
48 N49682b6888464b898053a1ec75e9880b schema:isbn 978-3-540-49240-5
49 978-3-540-65480-3
50 schema:name Learning Robots
51 rdf:type schema:Book
52 N616ad2ac9fda4441836f5be074f7034d schema:name readcube_id
53 schema:value 8bb3b2f9a98968be42c5c45a8e88b378d3401106f660111608a06603641cd076
54 rdf:type schema:PropertyValue
55 N62a46acfd28048909ca81c613481ea38 schema:name Dept. of Informatics JATE, Aradi vrt. tere 1, H-6720, Szeged, Hungary
56 rdf:type schema:Organization
57 N7adfac3749da4e019f8d4e25def614dd schema:familyName Demiris
58 schema:givenName John
59 rdf:type schema:Person
60 N7bb3d07644b548dbab9f53282987fc1b rdf:first sg:person.010745541621.05
61 rdf:rest N8570528d374844d0acb35b784c3417b5
62 N837fd42dfc774867b9c8334d2e2799ce schema:name Springer Nature - SN SciGraph project
63 rdf:type schema:Organization
64 N8570528d374844d0acb35b784c3417b5 rdf:first sg:person.016202177221.23
65 rdf:rest Nce72862444a64c7c947e32ace8874a76
66 N93102fc7833c4fd19ae1603762c01085 schema:name Dept. of Chemical Physics, Inst. of Isotopes, HAS, P.O. Box 77, H-1525, Budapest, Hungary
67 rdf:type schema:Organization
68 Nbec98f8f69704fd688e9b170a7dc6df8 schema:name dimensions_id
69 schema:value pub.1029325683
70 rdf:type schema:PropertyValue
71 Nce72862444a64c7c947e32ace8874a76 rdf:first sg:person.0651500301.38
72 rdf:rest rdf:nil
73 Ndefa51e4d9e145e6be5df39fc984ccac schema:location Berlin, Heidelberg
74 schema:name Springer Berlin Heidelberg
75 rdf:type schema:Organisation
76 Nf1b943428cec449c8c9bae3c25126759 rdf:first N7adfac3749da4e019f8d4e25def614dd
77 rdf:rest rdf:nil
78 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
79 schema:name Information and Computing Sciences
80 rdf:type schema:DefinedTerm
81 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
82 schema:name Artificial Intelligence and Image Processing
83 rdf:type schema:DefinedTerm
84 sg:person.010745541621.05 schema:affiliation N62a46acfd28048909ca81c613481ea38
85 schema:familyName Kalmár
86 schema:givenName Zsolt
87 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010745541621.05
88 rdf:type schema:Person
89 sg:person.016202177221.23 schema:affiliation N439b9cff6cd84acd80f60b2686d8346a
90 schema:familyName Szepesvári
91 schema:givenName Csaba
92 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016202177221.23
93 rdf:type schema:Person
94 sg:person.0651500301.38 schema:affiliation N93102fc7833c4fd19ae1603762c01085
95 schema:familyName Lorincz
96 schema:givenName András
97 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0651500301.38
98 rdf:type schema:Person
99 sg:pub.10.1007/bf00114724 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051613972
100 https://doi.org/10.1007/bf00114724
101 rdf:type schema:CreativeWork
102 sg:pub.10.1007/bf00117447 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016428696
103 https://doi.org/10.1007/bf00117447
104 rdf:type schema:CreativeWork
105 sg:pub.10.1007/bf00992698 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033088958
106 https://doi.org/10.1007/bf00992698
107 rdf:type schema:CreativeWork
108 sg:pub.10.1023/a:1007440607681 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010556066
109 https://doi.org/10.1023/a:1007440607681
110 rdf:type schema:CreativeWork
111 https://doi.org/10.1016/0004-3702(92)90058-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009445883
112 rdf:type schema:CreativeWork
113 https://doi.org/10.1016/0004-3702(94)00011-o schema:sameAs https://app.dimensions.ai/details/publication/pub.1043559588
114 rdf:type schema:CreativeWork
115 https://doi.org/10.1016/b978-1-55860-377-6.50052-9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040838523
116 rdf:type schema:CreativeWork
117 https://doi.org/10.1109/icnn.1994.374432 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094331265
118 rdf:type schema:CreativeWork
119 https://doi.org/10.1109/iros.1996.568989 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095624088
120 rdf:type schema:CreativeWork
121 https://doi.org/10.1162/neco.1994.6.6.1185 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037933600
122 rdf:type schema:CreativeWork
123 https://doi.org/10.1177/105971239300200202 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011632542
124 rdf:type schema:CreativeWork
 



