A Novel Experience-Based Exploration Method for Q-Learning


Ontology type: schema:Chapter     


Chapter Info

DATE

2018-09-09

AUTHORS

Bohong Yang , Hong Lu , Baogen Li , Zheng Zhang , Wenqiang Zhang

ABSTRACT

Reinforcement learning algorithms are used to solve many sequential decision problems, such as game playing and mechanical control. Q-learning is a model-free reinforcement learning method. In traditional Q-learning algorithms, the agent stops as soon as it reaches the goal. In this paper we propose a new method, Experience-based Exploration, to sample state-action pairs that make Q-learning updates more efficient. In the Experience-based Exploration method, the agent does not stop at the goal. Instead, it continues by searching in reverse for states with high Bellman error. In this setting, the agent treats the terminal state as a new starting point and generates state-action pairs that may be useful. The efficacy of the method is proved analytically, and experimental results on Gridworld verify the hypothesis.
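The abstract describes the idea only in outline. The Python sketch below is one possible reading of it, not the authors' algorithm. It uses a tabular Q-learning agent on a hypothetical 4x4 deterministic Gridworld. After the agent reaches the goal, it keeps going for `extra_steps` steps, walking backward from the goal. At each step it updates the predecessor state-action pair whose transition into the current state has the largest absolute Bellman error. Here the Bellman error of a transition (s, a, r, s') is r + γ·max_b Q(s', b) − Q(s, a), or just r − Q(s, a) when s' is terminal. The grid, rewards, hyperparameters, and the names `step`, `backward_explore` and `train` are all illustrative assumptions. The sketch also assumes the simulator can be queried from any state.

```python
import random

# Hypothetical 4x4 deterministic Gridworld: start at (0, 0), goal at (3, 3).
SIZE = 4
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def in_grid(s):
    return 0 <= s[0] < SIZE and 0 <= s[1] < SIZE


def step(state, action):
    """Deterministic transition; reward 1 on reaching the goal, else 0."""
    nxt = (state[0] + action[0], state[1] + action[1])
    if not in_grid(nxt):
        nxt = state  # bumping into a wall leaves the agent in place
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def bellman_error(q, s, a, r, s2, done, gamma):
    """TD target minus current estimate for the transition (s, a, r, s2)."""
    target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
    return target - q[(s, a)]


def backward_explore(q, steps, alpha, gamma):
    """Assumed post-goal phase: start from the goal and, at each step, update the
    predecessor pair (s, a) leading into the current state whose |Bellman error|
    is largest, then move back to s."""
    current = GOAL
    for _ in range(steps):
        candidates = []
        for a in ACTIONS:
            s = (current[0] - a[0], current[1] - a[1])
            if not in_grid(s) or s == GOAL:
                continue  # no valid predecessor, or a terminal state
            s2, r, done = step(s, a)  # s2 == current by construction
            err = bellman_error(q, s, a, r, s2, done, gamma)
            candidates.append((abs(err), s, a, err))
        if not candidates:
            break
        _, s, a, err = max(candidates)
        q[(s, a)] += alpha * err
        current = s


def train(episodes=200, alpha=0.5, gamma=0.9, eps=0.1, extra_steps=0, seed=0):
    rng = random.Random(seed)
    q = {((r, c), a): 0.0 for r in range(SIZE) for c in range(SIZE) for a in ACTIONS}
    for _ in range(episodes):
        s, done = START, False
        while not done:  # ordinary epsilon-greedy Q-learning episode
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s2, r, done = step(s, a)
            q[(s, a)] += alpha * bellman_error(q, s, a, r, s2, done, gamma)
            s = s2
        # Traditional Q-learning would stop here; the sketch keeps exploring.
        backward_explore(q, extra_steps, alpha, gamma)
    return q
```

Calling `train(extra_steps=0)` gives plain Q-learning, and `train(extra_steps=5)` adds the assumed backward phase. Comparing the two, for example by the number of episodes needed before the greedy policy reaches the goal, is the kind of experiment the abstract alludes to. This sketch makes no claim about the outcome.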

PAGES

225-240


Book

TITLE

Data Science

ISBN

978-981-13-2202-0
978-981-13-2203-7

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17

DOI

http://dx.doi.org/10.1007/978-981-13-2203-7_17

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1106916269



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Fudan University", 
          "id": "https://www.grid.ac/institutes/grid.8547.e", 
          "name": [
            "Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Yang", 
        "givenName": "Bohong", 
        "id": "sg:person.07565565401.60", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07565565401.60"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Fudan University", 
          "id": "https://www.grid.ac/institutes/grid.8547.e", 
          "name": [
            "Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lu", 
        "givenName": "Hong", 
        "id": "sg:person.013576203375.62", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013576203375.62"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Fudan University", 
          "id": "https://www.grid.ac/institutes/grid.8547.e", 
          "name": [
            "Shanghai Engineering Research Center for Video Technology and System, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Li", 
        "givenName": "Baogen", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "New York University Shanghai", 
          "id": "https://www.grid.ac/institutes/grid.449457.f", 
          "name": [
            "School of Computer Science, New York University Shanghai, Shanghai, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Zhang", 
        "givenName": "Zheng", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Fudan University", 
          "id": "https://www.grid.ac/institutes/grid.8547.e", 
          "name": [
            "Shanghai Engineering Research Center for Video Technology and System, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Zhang", 
        "givenName": "Wenqiang", 
        "id": "sg:person.010531241272.46", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010531241272.46"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/nature14236", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030517994", 
          "https://doi.org/10.1038/nature14236"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf00992698", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033088958", 
          "https://doi.org/10.1007/bf00992698"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nature16961", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039427823", 
          "https://doi.org/10.1038/nature16961"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/2647868.2654889", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052031051"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tnnls.2014.2327636", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061718600"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tnnls.2014.2371046", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061718712"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tnnls.2014.2376703", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061718718"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tnnls.2015.2403394", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061718799"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tnnls.2016.2522401", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061719118"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2018-09-09", 
    "datePublishedReg": "2018-09-09", 
    "description": "Reinforcement learning algorithms are used to deal with a lot of sequential problems, such as playing games, mechanical control, and so on. Q-Learning is a model-free reinforcement learning method. In traditional Q-learning algorithms, the agent stops immediately after it has reached the goal. We propose in this paper a new method\u2014Experience-based Exploration method\u2014in order to sample more efficient state-action pairs for Q-learning updating. In the Experience-based Exploration method, the agent does not stop and continues to search the states with high bellman-error inversely. In this setting, the agent will set the terminal state as a new start point, and generate pairs of action and state which could be useful. The efficacy of the method is proved analytically. And the experimental results verify the hypothesis on Gridworld.", 
    "editor": [
      {
        "familyName": "Zhou", 
        "givenName": "Qinglei", 
        "type": "Person"
      }, 
      {
        "familyName": "Gan", 
        "givenName": "Yong", 
        "type": "Person"
      }, 
      {
        "familyName": "Jing", 
        "givenName": "Weipeng", 
        "type": "Person"
      }, 
      {
        "familyName": "Song", 
        "givenName": "Xianhua", 
        "type": "Person"
      }, 
      {
        "familyName": "Wang", 
        "givenName": "Yan", 
        "type": "Person"
      }, 
      {
        "familyName": "Lu", 
        "givenName": "Zeguang", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-981-13-2203-7_17", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-981-13-2202-0", 
        "978-981-13-2203-7"
      ], 
      "name": "Data Science", 
      "type": "Book"
    }, 
    "name": "A Novel Experience-Based Exploration Method for Q-Learning", 
    "pagination": "225-240", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-981-13-2203-7_17"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "a63807298cce667fc3371365bc437f6a43ef175eb6f0250cb44b755ae034e032"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1106916269"
        ]
      }
    ], 
    "publisher": {
      "location": "Singapore", 
      "name": "Springer Singapore", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-981-13-2203-7_17", 
      "https://app.dimensions.ai/details/publication/pub.1106916269"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T04:41", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000322_0000000322/records_65017_00000000.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F978-981-13-2203-7_17"
  }
]
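The JSON-LD record above is ordinary JSON, so Python's standard library is enough to pull out the usual citation fields. This is a minimal sketch that assumes the record has the shape shown above: a one-element array whose object carries `name`, `author` and `productId`. The `RECORD` string is a trimmed copy of that record, and `summarize` is a hypothetical helper name.

```python
import json

# Trimmed excerpt of the JSON-LD record above (most fields omitted for brevity).
RECORD = """[{
  "name": "A Novel Experience-Based Exploration Method for Q-Learning",
  "author": [
    {"familyName": "Yang", "givenName": "Bohong"},
    {"familyName": "Lu", "givenName": "Hong"},
    {"familyName": "Li", "givenName": "Baogen"},
    {"familyName": "Zhang", "givenName": "Zheng"},
    {"familyName": "Zhang", "givenName": "Wenqiang"}
  ],
  "productId": [
    {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-981-13-2203-7_17"]},
    {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1106916269"]}
  ]
}]"""


def summarize(record_text):
    """Return (title, doi, authors) from a SciGraph JSON-LD chapter record."""
    chapter = json.loads(record_text)[0]  # the payload is a one-element array
    # productId is a list of {name, value: [..]} pairs; index them by name.
    ids = {p["name"]: p["value"][0] for p in chapter.get("productId", [])}
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in chapter.get("author", [])]
    return chapter["name"], ids.get("doi"), authors


title, doi, authors = summarize(RECORD)
print(title, doi, authors, sep="\n")
```

Author order is preserved because JSON arrays are ordered, which matches the order in the "AUTHORS" field at the top of the page.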
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'
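The same content negotiation can be done from Python's standard library. This is a minimal sketch that assumes the endpoint still honours the `Accept` header the way the curl examples describe. `MEDIA_TYPES`, `build_request` and `fetch` are hypothetical names chosen for illustration.

```python
import urllib.request

URL = "https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17"

# Accept header values taken from the curl examples above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt):
    """Build a GET request asking the server for the given RDF serialization."""
    return urllib.request.Request(URL, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, timeout=30):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch("n-triples")` would return the N-Triples serialization as a string. `build_request` is kept separate so the header can be inspected without touching the network.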


 

This table displays all metadata directly associated with this object as RDF triples.

150 TRIPLES      23 PREDICATES      35 URIs      19 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-981-13-2203-7_17 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author N0c8ef0062a034a7ca039895808538ac0
4 schema:citation sg:pub.10.1007/bf00992698
5 sg:pub.10.1038/nature14236
6 sg:pub.10.1038/nature16961
7 https://doi.org/10.1109/tnnls.2014.2327636
8 https://doi.org/10.1109/tnnls.2014.2371046
9 https://doi.org/10.1109/tnnls.2014.2376703
10 https://doi.org/10.1109/tnnls.2015.2403394
11 https://doi.org/10.1109/tnnls.2016.2522401
12 https://doi.org/10.1145/2647868.2654889
13 schema:datePublished 2018-09-09
14 schema:datePublishedReg 2018-09-09
15 schema:description Reinforcement learning algorithms are used to deal with a lot of sequential problems, such as playing games, mechanical control, and so on. Q-Learning is a model-free reinforcement learning method. In traditional Q-learning algorithms, the agent stops immediately after it has reached the goal. We propose in this paper a new method—Experience-based Exploration method—in order to sample more efficient state-action pairs for Q-learning updating. In the Experience-based Exploration method, the agent does not stop and continues to search the states with high bellman-error inversely. In this setting, the agent will set the terminal state as a new start point, and generate pairs of action and state which could be useful. The efficacy of the method is proved analytically. And the experimental results verify the hypothesis on Gridworld.
16 schema:editor N50861de0ccfe4a3a8a302e1576e09fb9
17 schema:genre chapter
18 schema:inLanguage en
19 schema:isAccessibleForFree false
20 schema:isPartOf N0450f25a6aac4f13a09c72264b79bcaa
21 schema:name A Novel Experience-Based Exploration Method for Q-Learning
22 schema:pagination 225-240
23 schema:productId N05dea3b7ecc14da7890c57183b52fadb
24 N1f71240de40b487ea7edeb104b702336
25 N7f3e1eb95f14408baf058ae9e694b89a
26 schema:publisher N2664b22ca2354f5a826c81051b4b8830
27 schema:sameAs https://app.dimensions.ai/details/publication/pub.1106916269
28 https://doi.org/10.1007/978-981-13-2203-7_17
29 schema:sdDatePublished 2019-04-16T04:41
30 schema:sdLicense https://scigraph.springernature.com/explorer/license/
31 schema:sdPublisher N601f434896d04804be767392c795936a
32 schema:url https://link.springer.com/10.1007%2F978-981-13-2203-7_17
33 sgo:license sg:explorer/license/
34 sgo:sdDataset chapters
35 rdf:type schema:Chapter
36 N04016b61fd764f1a995c21dd3666fd05 schema:familyName Song
37 schema:givenName Xianhua
38 rdf:type schema:Person
39 N0450f25a6aac4f13a09c72264b79bcaa schema:isbn 978-981-13-2202-0
40 978-981-13-2203-7
41 schema:name Data Science
42 rdf:type schema:Book
43 N05360ab45dfe416a83c2f90662b27f19 rdf:first N04016b61fd764f1a995c21dd3666fd05
44 rdf:rest N5abf0c4b69c14ab789a3b67c462494c7
45 N05dea3b7ecc14da7890c57183b52fadb schema:name readcube_id
46 schema:value a63807298cce667fc3371365bc437f6a43ef175eb6f0250cb44b755ae034e032
47 rdf:type schema:PropertyValue
48 N0c8ef0062a034a7ca039895808538ac0 rdf:first sg:person.07565565401.60
49 rdf:rest N5ba204f6ce894c4cb2cd5a2c13837795
50 N1d8c1014fdfc40ddb8e40a56f2854832 schema:familyName Wang
51 schema:givenName Yan
52 rdf:type schema:Person
53 N1f71240de40b487ea7edeb104b702336 schema:name doi
54 schema:value 10.1007/978-981-13-2203-7_17
55 rdf:type schema:PropertyValue
56 N2664b22ca2354f5a826c81051b4b8830 schema:location Singapore
57 schema:name Springer Singapore
58 rdf:type schema:Organisation
59 N2b64714bb2c7470b9e47d413313ea0cc schema:familyName Lu
60 schema:givenName Zeguang
61 rdf:type schema:Person
62 N341fdfc1b3db42a2a1c9518798a60962 rdf:first sg:person.010531241272.46
63 rdf:rest rdf:nil
64 N45f84a2536b04638a4ead29365da0f38 rdf:first Nc43cb119f7954c508aee8354eecc6456
65 rdf:rest Ncbabaffbd8904ccb8daca07395ee9190
66 N49ee22a9d67941b5bb3b18d6dc7a96f4 rdf:first N72e4afad4b954d52917449fed4bc0227
67 rdf:rest N341fdfc1b3db42a2a1c9518798a60962
68 N4d8035706d6f4a299c5226ca084374eb schema:affiliation https://www.grid.ac/institutes/grid.8547.e
69 schema:familyName Li
70 schema:givenName Baogen
71 rdf:type schema:Person
72 N50861de0ccfe4a3a8a302e1576e09fb9 rdf:first N80f08b3d9dff465baeeb8d5d15615ac6
73 rdf:rest N45f84a2536b04638a4ead29365da0f38
74 N5abf0c4b69c14ab789a3b67c462494c7 rdf:first N1d8c1014fdfc40ddb8e40a56f2854832
75 rdf:rest Nb72503aa21e544baaee7cf846a3ea987
76 N5ba204f6ce894c4cb2cd5a2c13837795 rdf:first sg:person.013576203375.62
77 rdf:rest Ne01b1340d5e8486bbaaeeee54894b297
78 N601f434896d04804be767392c795936a schema:name Springer Nature - SN SciGraph project
79 rdf:type schema:Organization
80 N72e4afad4b954d52917449fed4bc0227 schema:affiliation https://www.grid.ac/institutes/grid.449457.f
81 schema:familyName Zhang
82 schema:givenName Zheng
83 rdf:type schema:Person
84 N7f3e1eb95f14408baf058ae9e694b89a schema:name dimensions_id
85 schema:value pub.1106916269
86 rdf:type schema:PropertyValue
87 N80f08b3d9dff465baeeb8d5d15615ac6 schema:familyName Zhou
88 schema:givenName Qinglei
89 rdf:type schema:Person
90 N98239feffc664e31a6d03c8f2443ba30 schema:familyName Jing
91 schema:givenName Weipeng
92 rdf:type schema:Person
93 Nb72503aa21e544baaee7cf846a3ea987 rdf:first N2b64714bb2c7470b9e47d413313ea0cc
94 rdf:rest rdf:nil
95 Nc43cb119f7954c508aee8354eecc6456 schema:familyName Gan
96 schema:givenName Yong
97 rdf:type schema:Person
98 Ncbabaffbd8904ccb8daca07395ee9190 rdf:first N98239feffc664e31a6d03c8f2443ba30
99 rdf:rest N05360ab45dfe416a83c2f90662b27f19
100 Ne01b1340d5e8486bbaaeeee54894b297 rdf:first N4d8035706d6f4a299c5226ca084374eb
101 rdf:rest N49ee22a9d67941b5bb3b18d6dc7a96f4
102 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
103 schema:name Information and Computing Sciences
104 rdf:type schema:DefinedTerm
105 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
106 schema:name Artificial Intelligence and Image Processing
107 rdf:type schema:DefinedTerm
108 sg:person.010531241272.46 schema:affiliation https://www.grid.ac/institutes/grid.8547.e
109 schema:familyName Zhang
110 schema:givenName Wenqiang
111 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010531241272.46
112 rdf:type schema:Person
113 sg:person.013576203375.62 schema:affiliation https://www.grid.ac/institutes/grid.8547.e
114 schema:familyName Lu
115 schema:givenName Hong
116 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013576203375.62
117 rdf:type schema:Person
118 sg:person.07565565401.60 schema:affiliation https://www.grid.ac/institutes/grid.8547.e
119 schema:familyName Yang
120 schema:givenName Bohong
121 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07565565401.60
122 rdf:type schema:Person
123 sg:pub.10.1007/bf00992698 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033088958
124 https://doi.org/10.1007/bf00992698
125 rdf:type schema:CreativeWork
126 sg:pub.10.1038/nature14236 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030517994
127 https://doi.org/10.1038/nature14236
128 rdf:type schema:CreativeWork
129 sg:pub.10.1038/nature16961 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039427823
130 https://doi.org/10.1038/nature16961
131 rdf:type schema:CreativeWork
132 https://doi.org/10.1109/tnnls.2014.2327636 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061718600
133 rdf:type schema:CreativeWork
134 https://doi.org/10.1109/tnnls.2014.2371046 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061718712
135 rdf:type schema:CreativeWork
136 https://doi.org/10.1109/tnnls.2014.2376703 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061718718
137 rdf:type schema:CreativeWork
138 https://doi.org/10.1109/tnnls.2015.2403394 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061718799
139 rdf:type schema:CreativeWork
140 https://doi.org/10.1109/tnnls.2016.2522401 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061719118
141 rdf:type schema:CreativeWork
142 https://doi.org/10.1145/2647868.2654889 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052031051
143 rdf:type schema:CreativeWork
144 https://www.grid.ac/institutes/grid.449457.f schema:alternateName New York University Shanghai
145 schema:name School of Computer Science, New York University Shanghai, Shanghai, People’s Republic of China
146 rdf:type schema:Organization
147 https://www.grid.ac/institutes/grid.8547.e schema:alternateName Fudan University
148 schema:name Shanghai Engineering Research Center for Video Technology and System, School of Computer Science, Fudan University, Shanghai, People’s Republic of China
149 Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, People’s Republic of China
150 rdf:type schema:Organization
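In the triple table, `schema:author` does not point at the people directly. Its object, blank node `N0c8ef0062a034a7ca039895808538ac0`, heads an RDF collection. Each node's `rdf:first` is a list member, its `rdf:rest` is the next node, and the list ends at `rdf:nil`. This is how the author order from the JSON-LD survives in the triples. The editors are encoded the same way, starting from `N50861de0ccfe4a3a8a302e1576e09fb9`. The minimal Python sketch below walks such a list using the five author-list nodes copied from the table. The dictionaries and `walk_rdf_list` are illustrative only and are not part of any SciGraph API.

```python
# Author-list triples copied from the table above (rdf:first / rdf:rest only).
HEAD = "N0c8ef0062a034a7ca039895808538ac0"  # object of schema:author
FIRST = {
    "N0c8ef0062a034a7ca039895808538ac0": "sg:person.07565565401.60",
    "N5ba204f6ce894c4cb2cd5a2c13837795": "sg:person.013576203375.62",
    "Ne01b1340d5e8486bbaaeeee54894b297": "N4d8035706d6f4a299c5226ca084374eb",
    "N49ee22a9d67941b5bb3b18d6dc7a96f4": "N72e4afad4b954d52917449fed4bc0227",
    "N341fdfc1b3db42a2a1c9518798a60962": "sg:person.010531241272.46",
}
REST = {
    "N0c8ef0062a034a7ca039895808538ac0": "N5ba204f6ce894c4cb2cd5a2c13837795",
    "N5ba204f6ce894c4cb2cd5a2c13837795": "Ne01b1340d5e8486bbaaeeee54894b297",
    "Ne01b1340d5e8486bbaaeeee54894b297": "N49ee22a9d67941b5bb3b18d6dc7a96f4",
    "N49ee22a9d67941b5bb3b18d6dc7a96f4": "N341fdfc1b3db42a2a1c9518798a60962",
    "N341fdfc1b3db42a2a1c9518798a60962": "rdf:nil",
}
# schema:givenName / schema:familyName of each list member, also from the table.
NAME = {
    "sg:person.07565565401.60": "Bohong Yang",
    "sg:person.013576203375.62": "Hong Lu",
    "N4d8035706d6f4a299c5226ca084374eb": "Baogen Li",
    "N72e4afad4b954d52917449fed4bc0227": "Zheng Zhang",
    "sg:person.010531241272.46": "Wenqiang Zhang",
}


def walk_rdf_list(head, first, rest):
    """Follow rdf:rest links from head until rdf:nil, collecting rdf:first values."""
    items, node, seen = [], head, set()
    while node != "rdf:nil":
        if node in seen:
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


print([NAME[n] for n in walk_rdf_list(HEAD, FIRST, REST)])
```

The recovered order matches the author array in the JSON-LD record: Yang, Lu, Li, Zhang (Zheng), Zhang (Wenqiang).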
 



