A Novel Experience-Based Exploration Method for Q-Learning


Ontology type: schema:Chapter     


Chapter Info

DATE

2018-09-09

AUTHORS

Bohong Yang , Hong Lu , Baogen Li , Zheng Zhang , Wenqiang Zhang

ABSTRACT

Reinforcement learning algorithms are used to solve many sequential problems, such as game playing and mechanical control. Q-learning is a model-free reinforcement learning method. In traditional Q-learning algorithms, the agent stops immediately after it has reached the goal. In this paper we propose a new method, the Experience-based Exploration method, to sample more efficient state-action pairs for Q-learning updates. In this method, the agent does not stop at the goal but continues to search inversely for states with high Bellman error. In this setting, the agent sets the terminal state as a new start point and generates state-action pairs that could be useful. The efficacy of the method is proved analytically, and experimental results on Gridworld verify the hypothesis.
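
The abstract's terms "Q-learning updating" and "Bellman error" can be made concrete with a minimal sketch of the standard tabular Q-learning update. This is not the chapter's Experience-based Exploration procedure, which the abstract does not specify. The four-state corridor, the reward of 1.0 at the goal, and the values of `alpha` and `gamma` are all assumptions chosen for illustration.

```python
def bellman_error(Q, s, a, r, s_next, done, gamma=0.9):
    """Bellman (temporal-difference) error of one transition under Q."""
    # At a terminal transition there is no future value to bootstrap from.
    target = r if done else r + gamma * max(Q[s_next])
    return target - Q[s][a]


def q_update(Q, s, a, r, s_next, done, alpha=0.5, gamma=0.9):
    """Apply one standard Q-learning update; return the error before it."""
    delta = bellman_error(Q, s, a, r, s_next, done, gamma)
    Q[s][a] += alpha * delta
    return delta


# Hypothetical 1-D corridor: states 0..3, state 3 is the goal.
# Actions: 0 = left, 1 = right. Q starts at zero everywhere.
Q = [[0.0, 0.0] for _ in range(4)]

# The agent steps from state 2 into the goal and receives reward 1.0.
delta_goal = q_update(Q, s=2, a=1, r=1.0, s_next=3, done=True)

# The transition leading into state 2 now has a non-zero Bellman error,
# even though the agent has not revisited it yet.
delta_prev = bellman_error(Q, s=1, a=1, r=0.0, s_next=2, done=False)
```

In this toy setup, updating the transition into the goal leaves a non-zero Bellman error on the transition just before it, which illustrates why state-action pairs near the goal can be informative to sample next.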

PAGES

225-240

Book

TITLE

Data Science

ISBN

978-981-13-2202-0
978-981-13-2203-7

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17

DOI

http://dx.doi.org/10.1007/978-981-13-2203-7_17

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1106916269



JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service, such as the JSON-LD Playground or Google SDTT.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Fudan University", 
          "id": "https://www.grid.ac/institutes/grid.8547.e", 
          "name": [
            "Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Yang", 
        "givenName": "Bohong", 
        "id": "sg:person.07565565401.60", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07565565401.60"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Fudan University", 
          "id": "https://www.grid.ac/institutes/grid.8547.e", 
          "name": [
            "Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lu", 
        "givenName": "Hong", 
        "id": "sg:person.013576203375.62", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013576203375.62"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Fudan University", 
          "id": "https://www.grid.ac/institutes/grid.8547.e", 
          "name": [
            "Shanghai Engineering Research Center for Video Technology and System, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Li", 
        "givenName": "Baogen", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "New York University Shanghai", 
          "id": "https://www.grid.ac/institutes/grid.449457.f", 
          "name": [
            "School of Computer Science, New York University Shanghai, Shanghai, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Zhang", 
        "givenName": "Zheng", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Fudan University", 
          "id": "https://www.grid.ac/institutes/grid.8547.e", 
          "name": [
            "Shanghai Engineering Research Center for Video Technology and System, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Zhang", 
        "givenName": "Wenqiang", 
        "id": "sg:person.010531241272.46", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010531241272.46"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/nature14236", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030517994", 
          "https://doi.org/10.1038/nature14236"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf00992698", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033088958", 
          "https://doi.org/10.1007/bf00992698"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nature16961", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039427823", 
          "https://doi.org/10.1038/nature16961"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/2647868.2654889", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052031051"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tnnls.2014.2327636", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061718600"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tnnls.2014.2371046", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061718712"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tnnls.2014.2376703", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061718718"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tnnls.2015.2403394", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061718799"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tnnls.2016.2522401", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061719118"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2018-09-09", 
    "datePublishedReg": "2018-09-09", 
    "description": "Reinforcement learning algorithms are used to deal with a lot of sequential problems, such as playing games, mechanical control, and so on. Q-Learning is a model-free reinforcement learning method. In traditional Q-learning algorithms, the agent stops immediately after it has reached the goal. We propose in this paper a new method\u2014Experience-based Exploration method\u2014in order to sample more efficient state-action pairs for Q-learning updating. In the Experience-based Exploration method, the agent does not stop and continues to search the states with high bellman-error inversely. In this setting, the agent will set the terminal state as a new start point, and generate pairs of action and state which could be useful. The efficacy of the method is proved analytically. And the experimental results verify the hypothesis on Gridworld.", 
    "editor": [
      {
        "familyName": "Zhou", 
        "givenName": "Qinglei", 
        "type": "Person"
      }, 
      {
        "familyName": "Gan", 
        "givenName": "Yong", 
        "type": "Person"
      }, 
      {
        "familyName": "Jing", 
        "givenName": "Weipeng", 
        "type": "Person"
      }, 
      {
        "familyName": "Song", 
        "givenName": "Xianhua", 
        "type": "Person"
      }, 
      {
        "familyName": "Wang", 
        "givenName": "Yan", 
        "type": "Person"
      }, 
      {
        "familyName": "Lu", 
        "givenName": "Zeguang", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-981-13-2203-7_17", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-981-13-2202-0", 
        "978-981-13-2203-7"
      ], 
      "name": "Data Science", 
      "type": "Book"
    }, 
    "name": "A Novel Experience-Based Exploration Method for Q-Learning", 
    "pagination": "225-240", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-981-13-2203-7_17"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "a63807298cce667fc3371365bc437f6a43ef175eb6f0250cb44b755ae034e032"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1106916269"
        ]
      }
    ], 
    "publisher": {
      "location": "Singapore", 
      "name": "Springer Singapore", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-981-13-2203-7_17", 
      "https://app.dimensions.ai/details/publication/pub.1106916269"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T04:41", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000322_0000000322/records_65017_00000000.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F978-981-13-2203-7_17"
  }
]
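
Because JSON-LD is plain JSON, the record above can be read with any JSON parser. The following is a minimal sketch using only Python's standard library; it embeds a trimmed copy of the record (authors and product identifiers only), and the helper names `author_names` and `product_id` are hypothetical, introduced here for illustration.

```python
import json

# Trimmed copy of the record above: only the fields used in this sketch.
RECORD_JSON = """
[
  {
    "name": "A Novel Experience-Based Exploration Method for Q-Learning",
    "pagination": "225-240",
    "author": [
      {"familyName": "Yang", "givenName": "Bohong", "type": "Person"},
      {"familyName": "Lu", "givenName": "Hong", "type": "Person"},
      {"familyName": "Li", "givenName": "Baogen", "type": "Person"},
      {"familyName": "Zhang", "givenName": "Zheng", "type": "Person"},
      {"familyName": "Zhang", "givenName": "Wenqiang", "type": "Person"}
    ],
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/978-981-13-2203-7_17"]},
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1106916269"]}
    ]
  }
]
"""


def author_names(record):
    """Return 'Given Family' strings in the order listed in the record."""
    return [f"{a['givenName']} {a['familyName']}" for a in record["author"]]


def product_id(record, name):
    """Return the first value of the productId entry called `name`, or None."""
    for entry in record.get("productId", []):
        if entry.get("name") == name and entry.get("value"):
            return entry["value"][0]
    return None


# The top-level JSON value is a list containing a single record.
record = json.loads(RECORD_JSON)[0]
authors = author_names(record)
doi = product_id(record, "doi")
```

Note that each `productId` value is a list, so the sketch takes its first element rather than assuming a bare string.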
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'
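
The four curl commands differ only in their `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The URL and media types are taken from the commands above; the function names, the format keys, and the 10-second timeout are assumptions made for this sketch, and `fetch` only works if the endpoint is still reachable.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17"

# Media types copied from the curl commands above; the keys are hypothetical.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request that asks the server for the given RDF format."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=10):
    """Download the record in the requested format and return it as text.

    Performs a network call; it is defined here but not invoked.
    """
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch("turtle")` would correspond to the Turtle curl command, and an unknown format key raises `KeyError` before any request is sent.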


 

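This table displays all metadata directly associated with this object as RDF triples. Each row begins with its row number; a row that omits the subject or the predicate reuses the one from the row above.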

150 TRIPLES      23 PREDICATES      35 URIs      19 LITERALS      8 BLANK NODES

Row Subject Predicate Object
1 sg:pub.10.1007/978-981-13-2203-7_17 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Nb9b8cec89d2b4d59bd34f01d680c8324
4 schema:citation sg:pub.10.1007/bf00992698
5 sg:pub.10.1038/nature14236
6 sg:pub.10.1038/nature16961
7 https://doi.org/10.1109/tnnls.2014.2327636
8 https://doi.org/10.1109/tnnls.2014.2371046
9 https://doi.org/10.1109/tnnls.2014.2376703
10 https://doi.org/10.1109/tnnls.2015.2403394
11 https://doi.org/10.1109/tnnls.2016.2522401
12 https://doi.org/10.1145/2647868.2654889
13 schema:datePublished 2018-09-09
14 schema:datePublishedReg 2018-09-09
15 schema:description Reinforcement learning algorithms are used to deal with a lot of sequential problems, such as playing games, mechanical control, and so on. Q-Learning is a model-free reinforcement learning method. In traditional Q-learning algorithms, the agent stops immediately after it has reached the goal. We propose in this paper a new method—Experience-based Exploration method—in order to sample more efficient state-action pairs for Q-learning updating. In the Experience-based Exploration method, the agent does not stop and continues to search the states with high bellman-error inversely. In this setting, the agent will set the terminal state as a new start point, and generate pairs of action and state which could be useful. The efficacy of the method is proved analytically. And the experimental results verify the hypothesis on Gridworld.
16 schema:editor Nbf95be69c47d4f78b00e2af97a2f9ba2
17 schema:genre chapter
18 schema:inLanguage en
19 schema:isAccessibleForFree false
20 schema:isPartOf Ndd67ec7f0e854efd9f815a448126dad0
21 schema:name A Novel Experience-Based Exploration Method for Q-Learning
22 schema:pagination 225-240
23 schema:productId N0af569b2f66d455e860d3663899c5530
24 N86e361d2f1764301844dd0fdc5ac7133
25 Nfcb7df4d718b4864942e88ea73c328c1
26 schema:publisher Nc9f5229bdb4f4ff28296fad9f9e71c22
27 schema:sameAs https://app.dimensions.ai/details/publication/pub.1106916269
28 https://doi.org/10.1007/978-981-13-2203-7_17
29 schema:sdDatePublished 2019-04-16T04:41
30 schema:sdLicense https://scigraph.springernature.com/explorer/license/
31 schema:sdPublisher N6a5fcb2dc4154462bd30c2534a0ddc41
32 schema:url https://link.springer.com/10.1007%2F978-981-13-2203-7_17
33 sgo:license sg:explorer/license/
34 sgo:sdDataset chapters
35 rdf:type schema:Chapter
36 N0af569b2f66d455e860d3663899c5530 schema:name dimensions_id
37 schema:value pub.1106916269
38 rdf:type schema:PropertyValue
39 N10e16ff80a094bb8868f98a6196e44fe rdf:first N3ffc12a1704c48b28bf2b9fe83b8df44
40 rdf:rest N551756d42f964ef699519b4558b45356
41 N296d2c6a14fa46babeec6f71eac60761 schema:affiliation https://www.grid.ac/institutes/grid.8547.e
42 schema:familyName Li
43 schema:givenName Baogen
44 rdf:type schema:Person
45 N31b110a4098f4bd1b1de7e0ce2529007 rdf:first N7baaba830863433aa16dbaa65ad03e53
46 rdf:rest N10e16ff80a094bb8868f98a6196e44fe
47 N3ffc12a1704c48b28bf2b9fe83b8df44 schema:familyName Wang
48 schema:givenName Yan
49 rdf:type schema:Person
50 N50adbf575c1b497a90d33640d24b797f rdf:first Nf0d7a54cba7841dc910a0c092f094a59
51 rdf:rest N31b110a4098f4bd1b1de7e0ce2529007
52 N54b7b03025a94698bfb66ba2834b7412 schema:familyName Gan
53 schema:givenName Yong
54 rdf:type schema:Person
55 N551756d42f964ef699519b4558b45356 rdf:first Nc66cadae1ee142d6b1a994752d013c45
56 rdf:rest rdf:nil
57 N5ad2b4009e2f4e0bab2f778ebf703135 rdf:first sg:person.010531241272.46
58 rdf:rest rdf:nil
59 N6a5fcb2dc4154462bd30c2534a0ddc41 schema:name Springer Nature - SN SciGraph project
60 rdf:type schema:Organization
61 N74013e55c68346aeaa230664979bcfc0 rdf:first N296d2c6a14fa46babeec6f71eac60761
62 rdf:rest Naa3ff288d7fc4fcf9f28252629e45113
63 N7baaba830863433aa16dbaa65ad03e53 schema:familyName Song
64 schema:givenName Xianhua
65 rdf:type schema:Person
66 N86e361d2f1764301844dd0fdc5ac7133 schema:name doi
67 schema:value 10.1007/978-981-13-2203-7_17
68 rdf:type schema:PropertyValue
69 N8b83bfc226c2467a87878d224e09d97c schema:affiliation https://www.grid.ac/institutes/grid.449457.f
70 schema:familyName Zhang
71 schema:givenName Zheng
72 rdf:type schema:Person
73 N97592581ddfe4509a13740cc66e38de0 schema:familyName Zhou
74 schema:givenName Qinglei
75 rdf:type schema:Person
76 Naa3ff288d7fc4fcf9f28252629e45113 rdf:first N8b83bfc226c2467a87878d224e09d97c
77 rdf:rest N5ad2b4009e2f4e0bab2f778ebf703135
78 Nb9b8cec89d2b4d59bd34f01d680c8324 rdf:first sg:person.07565565401.60
79 rdf:rest Nc8e185b2c430408e920df6ad0bd5ea4f
80 Nbf95be69c47d4f78b00e2af97a2f9ba2 rdf:first N97592581ddfe4509a13740cc66e38de0
81 rdf:rest Nc9ddb31eab6d413b8ec640113b79c812
82 Nc66cadae1ee142d6b1a994752d013c45 schema:familyName Lu
83 schema:givenName Zeguang
84 rdf:type schema:Person
85 Nc8e185b2c430408e920df6ad0bd5ea4f rdf:first sg:person.013576203375.62
86 rdf:rest N74013e55c68346aeaa230664979bcfc0
87 Nc9ddb31eab6d413b8ec640113b79c812 rdf:first N54b7b03025a94698bfb66ba2834b7412
88 rdf:rest N50adbf575c1b497a90d33640d24b797f
89 Nc9f5229bdb4f4ff28296fad9f9e71c22 schema:location Singapore
90 schema:name Springer Singapore
91 rdf:type schema:Organisation
92 Ndd67ec7f0e854efd9f815a448126dad0 schema:isbn 978-981-13-2202-0
93 978-981-13-2203-7
94 schema:name Data Science
95 rdf:type schema:Book
96 Nf0d7a54cba7841dc910a0c092f094a59 schema:familyName Jing
97 schema:givenName Weipeng
98 rdf:type schema:Person
99 Nfcb7df4d718b4864942e88ea73c328c1 schema:name readcube_id
100 schema:value a63807298cce667fc3371365bc437f6a43ef175eb6f0250cb44b755ae034e032
101 rdf:type schema:PropertyValue
102 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
103 schema:name Information and Computing Sciences
104 rdf:type schema:DefinedTerm
105 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
106 schema:name Artificial Intelligence and Image Processing
107 rdf:type schema:DefinedTerm
108 sg:person.010531241272.46 schema:affiliation https://www.grid.ac/institutes/grid.8547.e
109 schema:familyName Zhang
110 schema:givenName Wenqiang
111 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010531241272.46
112 rdf:type schema:Person
113 sg:person.013576203375.62 schema:affiliation https://www.grid.ac/institutes/grid.8547.e
114 schema:familyName Lu
115 schema:givenName Hong
116 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013576203375.62
117 rdf:type schema:Person
118 sg:person.07565565401.60 schema:affiliation https://www.grid.ac/institutes/grid.8547.e
119 schema:familyName Yang
120 schema:givenName Bohong
121 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07565565401.60
122 rdf:type schema:Person
123 sg:pub.10.1007/bf00992698 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033088958
124 https://doi.org/10.1007/bf00992698
125 rdf:type schema:CreativeWork
126 sg:pub.10.1038/nature14236 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030517994
127 https://doi.org/10.1038/nature14236
128 rdf:type schema:CreativeWork
129 sg:pub.10.1038/nature16961 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039427823
130 https://doi.org/10.1038/nature16961
131 rdf:type schema:CreativeWork
132 https://doi.org/10.1109/tnnls.2014.2327636 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061718600
133 rdf:type schema:CreativeWork
134 https://doi.org/10.1109/tnnls.2014.2371046 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061718712
135 rdf:type schema:CreativeWork
136 https://doi.org/10.1109/tnnls.2014.2376703 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061718718
137 rdf:type schema:CreativeWork
138 https://doi.org/10.1109/tnnls.2015.2403394 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061718799
139 rdf:type schema:CreativeWork
140 https://doi.org/10.1109/tnnls.2016.2522401 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061719118
141 rdf:type schema:CreativeWork
142 https://doi.org/10.1145/2647868.2654889 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052031051
143 rdf:type schema:CreativeWork
144 https://www.grid.ac/institutes/grid.449457.f schema:alternateName New York University Shanghai
145 schema:name School of Computer Science, New York University Shanghai, Shanghai, People’s Republic of China
146 rdf:type schema:Organization
147 https://www.grid.ac/institutes/grid.8547.e schema:alternateName Fudan University
148 schema:name Shanghai Engineering Research Center for Video Technology and System, School of Computer Science, Fudan University, Shanghai, People’s Republic of China
149 Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, People’s Republic of China
150 rdf:type schema:Organization
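
In the table above, the ordered author list is stored as an RDF collection: `schema:author` points at a blank node, and each list node carries an `rdf:first` (the item) and an `rdf:rest` (the next node, or `rdf:nil` at the end). The sketch below walks that chain using node identifiers and names copied from rows 39 to 122; the dictionaries and the function name `walk_rdf_list` are hypothetical scaffolding for illustration, not part of any SciGraph tooling.

```python
RDF_NIL = "rdf:nil"

# list node -> (rdf:first, rdf:rest), copied from the triples table above.
LIST_NODES = {
    "Nb9b8cec89d2b4d59bd34f01d680c8324":
        ("sg:person.07565565401.60", "Nc8e185b2c430408e920df6ad0bd5ea4f"),
    "Nc8e185b2c430408e920df6ad0bd5ea4f":
        ("sg:person.013576203375.62", "N74013e55c68346aeaa230664979bcfc0"),
    "N74013e55c68346aeaa230664979bcfc0":
        ("N296d2c6a14fa46babeec6f71eac60761", "Naa3ff288d7fc4fcf9f28252629e45113"),
    "Naa3ff288d7fc4fcf9f28252629e45113":
        ("N8b83bfc226c2467a87878d224e09d97c", "N5ad2b4009e2f4e0bab2f778ebf703135"),
    "N5ad2b4009e2f4e0bab2f778ebf703135":
        ("sg:person.010531241272.46", RDF_NIL),
}

# person node -> (schema:givenName, schema:familyName), from the same table.
PERSONS = {
    "sg:person.07565565401.60": ("Bohong", "Yang"),
    "sg:person.013576203375.62": ("Hong", "Lu"),
    "N296d2c6a14fa46babeec6f71eac60761": ("Baogen", "Li"),
    "N8b83bfc226c2467a87878d224e09d97c": ("Zheng", "Zhang"),
    "sg:person.010531241272.46": ("Wenqiang", "Zhang"),
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest links from `head`, collecting each rdf:first item."""
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


# Row 3 of the table gives the head of the author list.
author_ids = walk_rdf_list("Nb9b8cec89d2b4d59bd34f01d680c8324", LIST_NODES)
author_order = [" ".join(PERSONS[pid]) for pid in author_ids]
```

The walk recovers the same author order as the AUTHORS field at the top of the record, even though the table itself lists the nodes in identifier order.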
 



