Acceleration of Game Learning with Prediction-Based Reinforcement Learning — Toward the Emergence of Planning Behavior —


Ontology type: schema:Chapter     


Chapter Info

DATE

2003-06-18

AUTHORS

Yu Ohigashi , Takashi Omori , Koji Morikawa , Natsuki Oka

ABSTRACT

When humans solve a problem, it is unlikely that they use only the current state of the problem to decide upon an action. It is difficult to explain the human action decision strategy by means of the state-to-action model, which is the major method used in conventional reinforcement learning (RL). On the contrary, humans appear to predict a future state through the use of past experience and decide upon an action based on that predicted state. In this paper, we propose a prediction-based RL model (PRL model). In the PRL model, a state prediction module and an action memory module are added to an actor-critic type RL, and the system predicts and evaluates a future state from a current one based on an expected value table. Then, the system chooses a point of action decision in order to perform the appropriate action. To evaluate the proposed model, we perform a computer simulation using a simple ping pong game. We also discuss the possibility that the PRL model may represent an evolutional change in conventional RL as well as a step toward modeling of human planning behavior, because state prediction and its evaluation are the basic elements of planning in symbolic AI.
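As an illustration of the idea in the abstract, the following is a minimal, hypothetical Python sketch of an actor-critic learner augmented with a state predictor: candidate actions are scored by the critic's value of the state they are predicted to lead to. The class name, the tabular representation, and the stubbed transition rule are assumptions for illustration only, not the authors' implementation.

import numpy as np

class PRLSketch:
    """Hypothetical prediction-based RL agent: actor-critic plus a state predictor."""

    def __init__(self, n_states, n_actions, gamma=0.95, horizon=5):
        self.V = np.zeros(n_states)                   # critic: expected value table
        self.prefs = np.zeros((n_states, n_actions))  # actor: action preferences
        self.gamma, self.horizon = gamma, horizon
        self.n_actions = n_actions

    def predict_next(self, state, action):
        # Stub transition model; the paper instead learns a state prediction
        # module from experience rather than using a fixed rule like this one.
        return state

    def plan_action(self, state):
        # Score each candidate action by the critic's value of the state it is
        # predicted to reach after `horizon` steps, then act greedily.
        scores = []
        for a in range(self.n_actions):
            s_pred = state
            for _ in range(self.horizon):
                s_pred = self.predict_next(s_pred, a)
            scores.append(self.V[s_pred])
        return int(np.argmax(scores))

    def td_update(self, s, a, r, s_next, alpha=0.1):
        # Conventional actor-critic style TD update of the value table and preferences.
        delta = r + self.gamma * self.V[s_next] - self.V[s]
        self.V[s] += alpha * delta
        self.prefs[s, a] += alpha * delta
        return delta

For example, PRLSketch(n_states=10, n_actions=3).plan_action(0) returns the action whose predicted future state the critic currently values most.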

PAGES

786-793

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/3-540-44989-2_94

DOI

http://dx.doi.org/10.1007/3-540-44989-2_94

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1032984622


Indexing Status: Check whether this publication has been indexed by Scopus and Web of Science using the SN Indexing Status Tool.
Incoming Citations: Browse incoming citations for this publication using opencitations.net.

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service such as the JSON-LD Playground or Google SDTT.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/17", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Psychology and Cognitive Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1701", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Psychology", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Graduate School of Engineering, Hokkaido University, Kita 13 jyou Nishi 8 chome, Kita, Sapporo, 060-8628, Hokkaido, Japan", 
          "id": "http://www.grid.ac/institutes/grid.39158.36", 
          "name": [
            "Graduate School of Engineering, Hokkaido University, Kita 13 jyou Nishi 8 chome, Kita, Sapporo, 060-8628, Hokkaido, Japan"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Ohigashi", 
        "givenName": "Yu", 
        "id": "sg:person.01046424175.43", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01046424175.43"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Graduate School of Engineering, Hokkaido University, Kita 13 jyou Nishi 8 chome, Kita, Sapporo, 060-8628, Hokkaido, Japan", 
          "id": "http://www.grid.ac/institutes/grid.39158.36", 
          "name": [
            "Graduate School of Engineering, Hokkaido University, Kita 13 jyou Nishi 8 chome, Kita, Sapporo, 060-8628, Hokkaido, Japan"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Omori", 
        "givenName": "Takashi", 
        "id": "sg:person.01263557346.07", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01263557346.07"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Humanware Technology Research Laboratory, Matsushita Electric Industrial Co., Ltd., 3-4, Hikaridai, Seika, Soraku, 619-0237, Kyoto, Japan", 
          "id": "http://www.grid.ac/institutes/grid.410834.a", 
          "name": [
            "Humanware Technology Research Laboratory, Matsushita Electric Industrial Co., Ltd., 3-4, Hikaridai, Seika, Soraku, 619-0237, Kyoto, Japan"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Morikawa", 
        "givenName": "Koji", 
        "id": "sg:person.01034077773.41", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01034077773.41"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Humanware Technology Research Laboratory, Matsushita Electric Industrial Co., Ltd., 3-4, Hikaridai, Seika, Soraku, 619-0237, Kyoto, Japan", 
          "id": "http://www.grid.ac/institutes/grid.410834.a", 
          "name": [
            "Humanware Technology Research Laboratory, Matsushita Electric Industrial Co., Ltd., 3-4, Hikaridai, Seika, Soraku, 619-0237, Kyoto, Japan"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Oka", 
        "givenName": "Natsuki", 
        "id": "sg:person.010775400155.23", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010775400155.23"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2003-06-18", 
    "datePublishedReg": "2003-06-18", 
    "description": "When humans solve a problem, it is unlikely that they use only the current state of the problem to decide upon an action. It is difficult to explain the human action decision strategy by means of the state to action model, which is the major method used in conventional reinforcement learning (RL). On the contrary, humans appear to predict a future state through the use of past experience and decide upon an action based on that predicted state. In this paper, we propose a predictionbased RL model (PRLmodel). In the PRL model, a state prediction module and an action memory module are added to an actor-critic type RL, and the system predicts and evaluates a future state from a current one based on an expected value table. Then, the system chooses a point of action decision in order to perform the appropriate action. To evaluate the proposed model, we perform a computer simulation using a simple ping pong game. We also discuss the possibility that the PRL model may represent an evolutional change in conventional RL as well as a step toward modeling of hmuan planning behavior, because state prediction and its evaluation are the basic elements of planning in symbolic AI.", 
    "editor": [
      {
        "familyName": "Kaynak", 
        "givenName": "Okyay", 
        "type": "Person"
      }, 
      {
        "familyName": "Alpaydin", 
        "givenName": "Ethem", 
        "type": "Person"
      }, 
      {
        "familyName": "Oja", 
        "givenName": "Erkki", 
        "type": "Person"
      }, 
      {
        "familyName": "Xu", 
        "givenName": "Lei", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/3-540-44989-2_94", 
    "inLanguage": "en", 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-540-40408-8", 
        "978-3-540-44989-8"
      ], 
      "name": "Artificial Neural Networks and Neural Information Processing \u2014 ICANN/ICONIP 2003", 
      "type": "Book"
    }, 
    "keywords": [
      "reinforcement learning", 
      "action decision strategy", 
      "planning behavior", 
      "conventional reinforcement learning", 
      "RL model", 
      "action decisions", 
      "Pong game", 
      "game learning", 
      "decision strategies", 
      "symbolic AI", 
      "past experience", 
      "action model", 
      "learning", 
      "appropriate action", 
      "future state", 
      "memory modules", 
      "behavior", 
      "humans", 
      "basic elements", 
      "experience", 
      "game", 
      "action", 
      "evolutional changes", 
      "current state", 
      "state prediction", 
      "decisions", 
      "model", 
      "prediction", 
      "problem", 
      "strategies", 
      "AI", 
      "state", 
      "prediction module", 
      "current one", 
      "emergence", 
      "modeling", 
      "planning", 
      "value table", 
      "module", 
      "use", 
      "changes", 
      "one", 
      "evaluation", 
      "possibility", 
      "system", 
      "point", 
      "means", 
      "major methods", 
      "paper", 
      "order", 
      "elements", 
      "contrary", 
      "computer simulation", 
      "step", 
      "method", 
      "table", 
      "acceleration", 
      "simulations", 
      "human action decision strategy", 
      "predictionbased RL model", 
      "PRL model", 
      "state prediction module", 
      "action memory module", 
      "actor-critic type RL", 
      "type RL", 
      "simple ping pong game", 
      "ping pong game", 
      "hmuan planning behavior"
    ], 
    "name": "Acceleration of Game Learning with Prediction-Based Reinforcement Learning \u2014 Toward the Emergence of Planning Behavior \u2014", 
    "pagination": "786-793", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1032984622"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/3-540-44989-2_94"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/3-540-44989-2_94", 
      "https://app.dimensions.ai/details/publication/pub.1032984622"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-01-01T19:15", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/chapter/chapter_27.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/3-540-44989-2_94"
  }
]
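Because JSON-LD is plain JSON, the record above can be read with the Python standard library alone. The sketch below assumes the record has been saved locally under the hypothetical filename record.json.

import json

with open("record.json") as fh:      # hypothetical local copy of the JSON-LD above
    record = json.load(fh)[0]        # the record is a one-element JSON array

title = record["name"]
authors = ["{} {}".format(a["givenName"], a["familyName"]) for a in record["author"]]
doi_url = record["sameAs"][0]

print(title)
print(", ".join(authors))
print(doi_url)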
 

Download the RDF metadata as: JSON-LD, N-Triples, Turtle, or RDF/XML (see the license info).

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/3-540-44989-2_94'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/3-540-44989-2_94'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/3-540-44989-2_94'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/3-540-44989-2_94'
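The same content negotiation can be scripted. Below is a small Python sketch using the third-party requests package, assuming the endpoint honours the Accept headers shown in the curl commands above.

import requests

URL = "https://scigraph.springernature.com/pub.10.1007/3-540-44989-2_94"
FORMATS = {
    "JSON-LD": "application/ld+json",
    "N-Triples": "application/n-triples",
    "Turtle": "text/turtle",
    "RDF/XML": "application/rdf+xml",
}

for label, mime in FORMATS.items():
    # Request the record in each serialization via the Accept header
    resp = requests.get(URL, headers={"Accept": mime}, timeout=30)
    print(label, resp.status_code, len(resp.content), "bytes")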


 

This table displays all metadata directly associated with this object as RDF triples.

167 TRIPLES      23 PREDICATES      93 URIs      86 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/3-540-44989-2_94 schema:about anzsrc-for:17
2 anzsrc-for:1701
3 schema:author Nd474ddc791e044e88d9b60d995ca5a48
4 schema:datePublished 2003-06-18
5 schema:datePublishedReg 2003-06-18
6 schema:description When humans solve a problem, it is unlikely that they use only the current state of the problem to decide upon an action. It is difficult to explain the human action decision strategy by means of the state to action model, which is the major method used in conventional reinforcement learning (RL). On the contrary, humans appear to predict a future state through the use of past experience and decide upon an action based on that predicted state. In this paper, we propose a predictionbased RL model (PRLmodel). In the PRL model, a state prediction module and an action memory module are added to an actor-critic type RL, and the system predicts and evaluates a future state from a current one based on an expected value table. Then, the system chooses a point of action decision in order to perform the appropriate action. To evaluate the proposed model, we perform a computer simulation using a simple ping pong game. We also discuss the possibility that the PRL model may represent an evolutional change in conventional RL as well as a step toward modeling of hmuan planning behavior, because state prediction and its evaluation are the basic elements of planning in symbolic AI.
7 schema:editor N3e2be2fdbd2f42a7876cd713f2b6db5a
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree false
11 schema:isPartOf N726c8373c7b94173b0120b96ad82580e
12 schema:keywords AI
13 PRL model
14 Pong game
15 RL model
16 acceleration
17 action
18 action decision strategy
19 action decisions
20 action memory module
21 action model
22 actor-critic type RL
23 appropriate action
24 basic elements
25 behavior
26 changes
27 computer simulation
28 contrary
29 conventional reinforcement learning
30 current one
31 current state
32 decision strategies
33 decisions
34 elements
35 emergence
36 evaluation
37 evolutional changes
38 experience
39 future state
40 game
41 game learning
42 hmuan planning behavior
43 human action decision strategy
44 humans
45 learning
46 major methods
47 means
48 memory modules
49 method
50 model
51 modeling
52 module
53 one
54 order
55 paper
56 past experience
57 ping pong game
58 planning
59 planning behavior
60 point
61 possibility
62 prediction
63 prediction module
64 predictionbased RL model
65 problem
66 reinforcement learning
67 simple ping pong game
68 simulations
69 state
70 state prediction
71 state prediction module
72 step
73 strategies
74 symbolic AI
75 system
76 table
77 type RL
78 use
79 value table
80 schema:name Acceleration of Game Learning with Prediction-Based Reinforcement Learning — Toward the Emergence of Planning Behavior —
81 schema:pagination 786-793
82 schema:productId N3909400ec2a74247b2df75f7d4860949
83 Nff42dd99d67e49f68cd9d520717b326b
84 schema:publisher Nc4f9b4ec12774cdfbb990ca27d883721
85 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032984622
86 https://doi.org/10.1007/3-540-44989-2_94
87 schema:sdDatePublished 2022-01-01T19:15
88 schema:sdLicense https://scigraph.springernature.com/explorer/license/
89 schema:sdPublisher N02cd5a87a58444b0b4cf530b412f0afe
90 schema:url https://doi.org/10.1007/3-540-44989-2_94
91 sgo:license sg:explorer/license/
92 sgo:sdDataset chapters
93 rdf:type schema:Chapter
94 N02cd5a87a58444b0b4cf530b412f0afe schema:name Springer Nature - SN SciGraph project
95 rdf:type schema:Organization
96 N0d9cb7f1c6f54ea4b848a0f13846b948 rdf:first Nca2466913cd24c5cae39c3d246db3892
97 rdf:rest rdf:nil
98 N31eeb204bfb84508843f210fdd813565 schema:familyName Alpaydin
99 schema:givenName Ethem
100 rdf:type schema:Person
101 N3909400ec2a74247b2df75f7d4860949 schema:name dimensions_id
102 schema:value pub.1032984622
103 rdf:type schema:PropertyValue
104 N3af04c8899474ed08f70a488f4781066 rdf:first sg:person.01034077773.41
105 rdf:rest N3fbbe257816042c88d36d918dccfef9d
106 N3da66c3606504dcd81567a30e9675a5f rdf:first N676e1b7e31e14e65812ce527a69964e1
107 rdf:rest N0d9cb7f1c6f54ea4b848a0f13846b948
108 N3e2be2fdbd2f42a7876cd713f2b6db5a rdf:first Nb48f0d20a68c49fcbe4daf467ad62a52
109 rdf:rest Nf778a97b8ff14f6fa922c0e0467838b7
110 N3fbbe257816042c88d36d918dccfef9d rdf:first sg:person.010775400155.23
111 rdf:rest rdf:nil
112 N676e1b7e31e14e65812ce527a69964e1 schema:familyName Oja
113 schema:givenName Erkki
114 rdf:type schema:Person
115 N726c8373c7b94173b0120b96ad82580e schema:isbn 978-3-540-40408-8
116 978-3-540-44989-8
117 schema:name Artificial Neural Networks and Neural Information Processing — ICANN/ICONIP 2003
118 rdf:type schema:Book
119 N800814a197364eaaa38c5f7fc1865699 rdf:first sg:person.01263557346.07
120 rdf:rest N3af04c8899474ed08f70a488f4781066
121 Nb48f0d20a68c49fcbe4daf467ad62a52 schema:familyName Kaynak
122 schema:givenName Okyay
123 rdf:type schema:Person
124 Nc4f9b4ec12774cdfbb990ca27d883721 schema:name Springer Nature
125 rdf:type schema:Organisation
126 Nca2466913cd24c5cae39c3d246db3892 schema:familyName Xu
127 schema:givenName Lei
128 rdf:type schema:Person
129 Nd474ddc791e044e88d9b60d995ca5a48 rdf:first sg:person.01046424175.43
130 rdf:rest N800814a197364eaaa38c5f7fc1865699
131 Nf778a97b8ff14f6fa922c0e0467838b7 rdf:first N31eeb204bfb84508843f210fdd813565
132 rdf:rest N3da66c3606504dcd81567a30e9675a5f
133 Nff42dd99d67e49f68cd9d520717b326b schema:name doi
134 schema:value 10.1007/3-540-44989-2_94
135 rdf:type schema:PropertyValue
136 anzsrc-for:17 schema:inDefinedTermSet anzsrc-for:
137 schema:name Psychology and Cognitive Sciences
138 rdf:type schema:DefinedTerm
139 anzsrc-for:1701 schema:inDefinedTermSet anzsrc-for:
140 schema:name Psychology
141 rdf:type schema:DefinedTerm
142 sg:person.01034077773.41 schema:affiliation grid-institutes:grid.410834.a
143 schema:familyName Morikawa
144 schema:givenName Koji
145 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01034077773.41
146 rdf:type schema:Person
147 sg:person.01046424175.43 schema:affiliation grid-institutes:grid.39158.36
148 schema:familyName Ohigashi
149 schema:givenName Yu
150 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01046424175.43
151 rdf:type schema:Person
152 sg:person.010775400155.23 schema:affiliation grid-institutes:grid.410834.a
153 schema:familyName Oka
154 schema:givenName Natsuki
155 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010775400155.23
156 rdf:type schema:Person
157 sg:person.01263557346.07 schema:affiliation grid-institutes:grid.39158.36
158 schema:familyName Omori
159 schema:givenName Takashi
160 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01263557346.07
161 rdf:type schema:Person
162 grid-institutes:grid.39158.36 schema:alternateName Graduate School of Engineering, Hokkaido University, Kita 13 jyou Nishi 8 chome, Kita, Sapporo, 060-8628, Hokkaido, Japan
163 schema:name Graduate School of Engineering, Hokkaido University, Kita 13 jyou Nishi 8 chome, Kita, Sapporo, 060-8628, Hokkaido, Japan
164 rdf:type schema:Organization
165 grid-institutes:grid.410834.a schema:alternateName Humanware Technology Research Laboratory, Matsushita Electric Industrial Co., Ltd., 3-4, Hikaridai, Seika, Soraku, 619-0237, Kyoto, Japan
166 schema:name Humanware Technology Research Laboratory, Matsushita Electric Industrial Co., Ltd., 3-4, Hikaridai, Seika, Soraku, 619-0237, Kyoto, Japan
167 rdf:type schema:Organization
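Rather than reading the table, the record can also be parsed into an RDF graph and queried. The sketch below assumes the rdflib and requests packages are installed; if the endpoint returns the same data, the triple count should match the summary line above.

import requests
from rdflib import Graph

URL = "https://scigraph.springernature.com/pub.10.1007/3-540-44989-2_94"
turtle = requests.get(URL, headers={"Accept": "text/turtle"}, timeout=30).text

g = Graph()
g.parse(data=turtle, format="turtle")
print(len(g), "triples")   # expected to match the triple count reported above

# Example query: list every familyName literal (the chapter's authors and editors)
for s, p, o in g:
    if str(p).endswith("familyName"):
        print(o)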
 



