A Novel Experience-Based Exploration Method for Q-Learning


Ontology type: schema:Chapter     


Chapter Info

DATE

2018-09-09

AUTHORS

Bohong Yang, Hong Lu, Baogen Li, Zheng Zhang, Wenqiang Zhang

ABSTRACT

Reinforcement learning algorithms are applied to many sequential problems, such as game playing and mechanical control. Q-learning is a model-free reinforcement learning method. In traditional Q-learning algorithms, the agent stops immediately after it reaches the goal. This paper proposes a new method, the Experience-based Exploration method, to sample more efficient state-action pairs for Q-learning updates. In this method, the agent does not stop at the goal but continues to search, in reverse, for states with high Bellman error. The agent treats the terminal state as a new starting point and generates state-action pairs that could be useful. The efficacy of the method is proved analytically, and experimental results on Gridworld support the hypothesis.
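
For orientation, the terms used in the abstract can be illustrated with the textbook tabular Q-learning update and its Bellman (temporal-difference) error. This is a minimal sketch of the standard rule only, not the chapter's Experience-based Exploration procedure, which the abstract does not specify. The state names, the action set, and the values of `alpha` and `gamma` are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical action set for a tiny corridor; not taken from the chapter.
ACTIONS = ["left", "right"]


def bellman_error(q, state, action, reward, next_state, gamma=0.9, terminal=False):
    """Bellman (TD) error of one transition under the standard Q-learning target."""
    # A terminal next state contributes no future value.
    best_next = 0.0 if terminal else max(q[(next_state, a)] for a in ACTIONS)
    return reward + gamma * best_next - q[(state, action)]


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update and return the error that drove it."""
    delta = bellman_error(q, state, action, reward, next_state, gamma, terminal)
    q[(state, action)] += alpha * delta
    return delta


q = defaultdict(float)  # all Q-values start at zero
# Update the transition into the goal first, then its predecessor.
d1 = q_update(q, "s1", "right", 1.0, "goal", terminal=True)
d2 = q_update(q, "s0", "right", 0.0, "s1")
print(d1, d2, q[("s0", "right")])
```

Updating the transition that reaches the goal before its predecessor gives the predecessor a non-zero error on the very next update, which is one way to read the abstract's interest in working outward from the terminal state.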

PAGES

225-240

Book

TITLE

Data Science

ISBN

978-981-13-2202-0
978-981-13-2203-7

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17

DOI

http://dx.doi.org/10.1007/978-981-13-2203-7_17

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1106916269



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Fudan University", 
          "id": "https://www.grid.ac/institutes/grid.8547.e", 
          "name": [
            "Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Yang", 
        "givenName": "Bohong", 
        "id": "sg:person.07565565401.60", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07565565401.60"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Fudan University", 
          "id": "https://www.grid.ac/institutes/grid.8547.e", 
          "name": [
            "Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lu", 
        "givenName": "Hong", 
        "id": "sg:person.013576203375.62", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013576203375.62"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Fudan University", 
          "id": "https://www.grid.ac/institutes/grid.8547.e", 
          "name": [
            "Shanghai Engineering Research Center for Video Technology and System, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Li", 
        "givenName": "Baogen", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "New York University Shanghai", 
          "id": "https://www.grid.ac/institutes/grid.449457.f", 
          "name": [
            "School of Computer Science, New York University Shanghai, Shanghai, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Zhang", 
        "givenName": "Zheng", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Fudan University", 
          "id": "https://www.grid.ac/institutes/grid.8547.e", 
          "name": [
            "Shanghai Engineering Research Center for Video Technology and System, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Zhang", 
        "givenName": "Wenqiang", 
        "id": "sg:person.010531241272.46", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010531241272.46"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/nature14236", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030517994", 
          "https://doi.org/10.1038/nature14236"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf00992698", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033088958", 
          "https://doi.org/10.1007/bf00992698"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nature16961", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039427823", 
          "https://doi.org/10.1038/nature16961"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/2647868.2654889", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052031051"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tnnls.2014.2327636", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061718600"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tnnls.2014.2371046", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061718712"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tnnls.2014.2376703", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061718718"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tnnls.2015.2403394", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061718799"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tnnls.2016.2522401", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061719118"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2018-09-09", 
    "datePublishedReg": "2018-09-09", 
    "description": "Reinforcement learning algorithms are used to deal with a lot of sequential problems, such as playing games, mechanical control, and so on. Q-Learning is a model-free reinforcement learning method. In traditional Q-learning algorithms, the agent stops immediately after it has reached the goal. We propose in this paper a new method\u2014Experience-based Exploration method\u2014in order to sample more efficient state-action pairs for Q-learning updating. In the Experience-based Exploration method, the agent does not stop and continues to search the states with high bellman-error inversely. In this setting, the agent will set the terminal state as a new start point, and generate pairs of action and state which could be useful. The efficacy of the method is proved analytically. And the experimental results verify the hypothesis on Gridworld.", 
    "editor": [
      {
        "familyName": "Zhou", 
        "givenName": "Qinglei", 
        "type": "Person"
      }, 
      {
        "familyName": "Gan", 
        "givenName": "Yong", 
        "type": "Person"
      }, 
      {
        "familyName": "Jing", 
        "givenName": "Weipeng", 
        "type": "Person"
      }, 
      {
        "familyName": "Song", 
        "givenName": "Xianhua", 
        "type": "Person"
      }, 
      {
        "familyName": "Wang", 
        "givenName": "Yan", 
        "type": "Person"
      }, 
      {
        "familyName": "Lu", 
        "givenName": "Zeguang", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-981-13-2203-7_17", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-981-13-2202-0", 
        "978-981-13-2203-7"
      ], 
      "name": "Data Science", 
      "type": "Book"
    }, 
    "name": "A Novel Experience-Based Exploration Method for Q-Learning", 
    "pagination": "225-240", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-981-13-2203-7_17"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "a63807298cce667fc3371365bc437f6a43ef175eb6f0250cb44b755ae034e032"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1106916269"
        ]
      }
    ], 
    "publisher": {
      "location": "Singapore", 
      "name": "Springer Singapore", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-981-13-2203-7_17", 
      "https://app.dimensions.ai/details/publication/pub.1106916269"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T04:41", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000322_0000000322/records_65017_00000000.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F978-981-13-2203-7_17"
  }
]
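
The record above is ordinary JSON, so it can be read with any JSON parser. The following is a minimal sketch that pulls the title, authors, pages, and DOI out of such a record; `EXCERPT` is a trimmed copy of the record above (two of the five authors, one of the three product identifiers), and the function name `summarize` is hypothetical.

```python
import json

# Trimmed excerpt of the JSON-LD record; field names match the full record.
EXCERPT = """
[
  {
    "author": [
      {"familyName": "Yang", "givenName": "Bohong", "type": "Person"},
      {"familyName": "Lu", "givenName": "Hong", "type": "Person"}
    ],
    "name": "A Novel Experience-Based Exploration Method for Q-Learning",
    "pagination": "225-240",
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/978-981-13-2203-7_17"]}
    ]
  }
]
"""


def summarize(jsonld_text):
    """Return a small dict of citation fields from a SciGraph-style record."""
    record = json.loads(jsonld_text)[0]  # the document is a one-element array
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    # productId is a list of name/value pairs; each value is itself a list.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    return {
        "title": record["name"],
        "authors": authors,
        "pages": record["pagination"],
        "doi": ids.get("doi"),
    }


summary = summarize(EXCERPT)
print(summary)
```

This treats the record as plain JSON and ignores the `@context`; a JSON-LD processor would be needed to expand the keys to full IRIs.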
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'
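
The four curl commands above differ only in the `Accept` header, which is standard HTTP content negotiation. As a rough Python equivalent, the sketch below builds the same requests with the standard library; the short format keys and the helper name `build_request` are assumptions for illustration, and the request is only constructed, not sent.

```python
from urllib.request import Request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17"

# Media types copied from the curl commands; the short keys are hypothetical.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for one RDF serialization."""
    return Request(url, headers={"Accept": ACCEPT[fmt]})


req = build_request("turtle")
print(req.full_url, req.get_header("Accept"))
# To actually fetch the data, pass the request to urllib.request.urlopen(req)
# and read the response body; that step needs network access.
```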


 

This table displays all metadata directly associated with this object as RDF triples.

150 TRIPLES      23 PREDICATES      35 URIs      19 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-981-13-2203-7_17 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author N35a15a0038fd4cde938768f0ab13adec
4 schema:citation sg:pub.10.1007/bf00992698
5 sg:pub.10.1038/nature14236
6 sg:pub.10.1038/nature16961
7 https://doi.org/10.1109/tnnls.2014.2327636
8 https://doi.org/10.1109/tnnls.2014.2371046
9 https://doi.org/10.1109/tnnls.2014.2376703
10 https://doi.org/10.1109/tnnls.2015.2403394
11 https://doi.org/10.1109/tnnls.2016.2522401
12 https://doi.org/10.1145/2647868.2654889
13 schema:datePublished 2018-09-09
14 schema:datePublishedReg 2018-09-09
15 schema:description Reinforcement learning algorithms are used to deal with a lot of sequential problems, such as playing games, mechanical control, and so on. Q-Learning is a model-free reinforcement learning method. In traditional Q-learning algorithms, the agent stops immediately after it has reached the goal. We propose in this paper a new method—Experience-based Exploration method—in order to sample more efficient state-action pairs for Q-learning updating. In the Experience-based Exploration method, the agent does not stop and continues to search the states with high bellman-error inversely. In this setting, the agent will set the terminal state as a new start point, and generate pairs of action and state which could be useful. The efficacy of the method is proved analytically. And the experimental results verify the hypothesis on Gridworld.
16 schema:editor N94812e1602d04b208081be8b7b3573c9
17 schema:genre chapter
18 schema:inLanguage en
19 schema:isAccessibleForFree false
20 schema:isPartOf N0a02e8b6dc5f496f9da98a2c0dbee4e3
21 schema:name A Novel Experience-Based Exploration Method for Q-Learning
22 schema:pagination 225-240
23 schema:productId N029d1b7c06a94d8e85bb0d5f49ad68be
24 Ndab0ce60b33848579eefb01aa5c441c5
25 Ndb43fb8cd2cc4268a2fbc98b2dedc34a
26 schema:publisher N58d359c82fc74aeebfd6db6e352736c6
27 schema:sameAs https://app.dimensions.ai/details/publication/pub.1106916269
28 https://doi.org/10.1007/978-981-13-2203-7_17
29 schema:sdDatePublished 2019-04-16T04:41
30 schema:sdLicense https://scigraph.springernature.com/explorer/license/
31 schema:sdPublisher N6982dee65cb94c7b82022cbb53652f82
32 schema:url https://link.springer.com/10.1007%2F978-981-13-2203-7_17
33 sgo:license sg:explorer/license/
34 sgo:sdDataset chapters
35 rdf:type schema:Chapter
36 N029d1b7c06a94d8e85bb0d5f49ad68be schema:name readcube_id
37 schema:value a63807298cce667fc3371365bc437f6a43ef175eb6f0250cb44b755ae034e032
38 rdf:type schema:PropertyValue
39 N0a02e8b6dc5f496f9da98a2c0dbee4e3 schema:isbn 978-981-13-2202-0
40 978-981-13-2203-7
41 schema:name Data Science
42 rdf:type schema:Book
43 N0f2bceb5ff254506a44494bcaaf1961d rdf:first sg:person.010531241272.46
44 rdf:rest rdf:nil
45 N10fc87b020494a3c8c607a082ef955a9 rdf:first N5b0add4f883840909268ca4b1ce4f7a5
46 rdf:rest N70f19905fe2b49d2aa56b0df676b76c2
47 N24c715748b884fb688d58d75b77012ab rdf:first N9e2f279dda7844f4bbe80955355ee94e
48 rdf:rest Nfc913b6532214df5ab6020c8367a5e3d
49 N34cfe915ee2848dba339e3743c663315 rdf:first sg:person.013576203375.62
50 rdf:rest N10fc87b020494a3c8c607a082ef955a9
51 N35a15a0038fd4cde938768f0ab13adec rdf:first sg:person.07565565401.60
52 rdf:rest N34cfe915ee2848dba339e3743c663315
53 N3ecfd842618c426296726fe8813c8bc7 rdf:first N870e60b6d1aa479bb5cc00920aec9c5b
54 rdf:rest rdf:nil
55 N56ba084096d14ae7807e7a066a9f7f18 schema:affiliation https://www.grid.ac/institutes/grid.449457.f
56 schema:familyName Zhang
57 schema:givenName Zheng
58 rdf:type schema:Person
59 N58d359c82fc74aeebfd6db6e352736c6 schema:location Singapore
60 schema:name Springer Singapore
61 rdf:type schema:Organisation
62 N5b0add4f883840909268ca4b1ce4f7a5 schema:affiliation https://www.grid.ac/institutes/grid.8547.e
63 schema:familyName Li
64 schema:givenName Baogen
65 rdf:type schema:Person
66 N6982dee65cb94c7b82022cbb53652f82 schema:name Springer Nature - SN SciGraph project
67 rdf:type schema:Organization
68 N70f19905fe2b49d2aa56b0df676b76c2 rdf:first N56ba084096d14ae7807e7a066a9f7f18
69 rdf:rest N0f2bceb5ff254506a44494bcaaf1961d
70 N870e60b6d1aa479bb5cc00920aec9c5b schema:familyName Lu
71 schema:givenName Zeguang
72 rdf:type schema:Person
73 N90217337589a4582849a3c37321b5f14 rdf:first N90ed4ee91fc14f87b8645a0cc5965af5
74 rdf:rest Nbdd3151ab7364cef907ddf0a546c30c3
75 N90ed4ee91fc14f87b8645a0cc5965af5 schema:familyName Gan
76 schema:givenName Yong
77 rdf:type schema:Person
78 N94812e1602d04b208081be8b7b3573c9 rdf:first Nb0652e8eb3d5486197ac6926fc9b0094
79 rdf:rest N90217337589a4582849a3c37321b5f14
80 N9e2f279dda7844f4bbe80955355ee94e schema:familyName Song
81 schema:givenName Xianhua
82 rdf:type schema:Person
83 Na24e7310b0524445b0376c76b598dd6f schema:familyName Jing
84 schema:givenName Weipeng
85 rdf:type schema:Person
86 Na7f114fecbe94f4c9cba81c38e9e6d2a schema:familyName Wang
87 schema:givenName Yan
88 rdf:type schema:Person
89 Nb0652e8eb3d5486197ac6926fc9b0094 schema:familyName Zhou
90 schema:givenName Qinglei
91 rdf:type schema:Person
92 Nbdd3151ab7364cef907ddf0a546c30c3 rdf:first Na24e7310b0524445b0376c76b598dd6f
93 rdf:rest N24c715748b884fb688d58d75b77012ab
94 Ndab0ce60b33848579eefb01aa5c441c5 schema:name doi
95 schema:value 10.1007/978-981-13-2203-7_17
96 rdf:type schema:PropertyValue
97 Ndb43fb8cd2cc4268a2fbc98b2dedc34a schema:name dimensions_id
98 schema:value pub.1106916269
99 rdf:type schema:PropertyValue
100 Nfc913b6532214df5ab6020c8367a5e3d rdf:first Na7f114fecbe94f4c9cba81c38e9e6d2a
101 rdf:rest N3ecfd842618c426296726fe8813c8bc7
102 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
103 schema:name Information and Computing Sciences
104 rdf:type schema:DefinedTerm
105 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
106 schema:name Artificial Intelligence and Image Processing
107 rdf:type schema:DefinedTerm
108 sg:person.010531241272.46 schema:affiliation https://www.grid.ac/institutes/grid.8547.e
109 schema:familyName Zhang
110 schema:givenName Wenqiang
111 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010531241272.46
112 rdf:type schema:Person
113 sg:person.013576203375.62 schema:affiliation https://www.grid.ac/institutes/grid.8547.e
114 schema:familyName Lu
115 schema:givenName Hong
116 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013576203375.62
117 rdf:type schema:Person
118 sg:person.07565565401.60 schema:affiliation https://www.grid.ac/institutes/grid.8547.e
119 schema:familyName Yang
120 schema:givenName Bohong
121 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07565565401.60
122 rdf:type schema:Person
123 sg:pub.10.1007/bf00992698 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033088958
124 https://doi.org/10.1007/bf00992698
125 rdf:type schema:CreativeWork
126 sg:pub.10.1038/nature14236 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030517994
127 https://doi.org/10.1038/nature14236
128 rdf:type schema:CreativeWork
129 sg:pub.10.1038/nature16961 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039427823
130 https://doi.org/10.1038/nature16961
131 rdf:type schema:CreativeWork
132 https://doi.org/10.1109/tnnls.2014.2327636 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061718600
133 rdf:type schema:CreativeWork
134 https://doi.org/10.1109/tnnls.2014.2371046 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061718712
135 rdf:type schema:CreativeWork
136 https://doi.org/10.1109/tnnls.2014.2376703 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061718718
137 rdf:type schema:CreativeWork
138 https://doi.org/10.1109/tnnls.2015.2403394 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061718799
139 rdf:type schema:CreativeWork
140 https://doi.org/10.1109/tnnls.2016.2522401 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061719118
141 rdf:type schema:CreativeWork
142 https://doi.org/10.1145/2647868.2654889 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052031051
143 rdf:type schema:CreativeWork
144 https://www.grid.ac/institutes/grid.449457.f schema:alternateName New York University Shanghai
145 schema:name School of Computer Science, New York University Shanghai, Shanghai, People’s Republic of China
146 rdf:type schema:Organization
147 https://www.grid.ac/institutes/grid.8547.e schema:alternateName Fudan University
148 schema:name Shanghai Engineering Research Center for Video Technology and System, School of Computer Science, Fudan University, Shanghai, People’s Republic of China
149 Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, People’s Republic of China
150 rdf:type schema:Organization
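
In the table above, `schema:author` points at a blank node rather than at the authors directly, because RDF encodes an ordered list as a chain of cells: each cell has an `rdf:first` (the item) and an `rdf:rest` (the next cell), ending at `rdf:nil`. The sketch below walks that chain to recover the author order. The cell values are copied from rows 43 to 52 and 68 to 69 of the table; the dictionary layout and the function name `walk_rdf_list` are assumptions for illustration.

```python
# Each cell maps to (rdf:first, rdf:rest), as listed in the triples table.
AUTHOR_CELLS = {
    "N35a15a0038fd4cde938768f0ab13adec":
        ("sg:person.07565565401.60", "N34cfe915ee2848dba339e3743c663315"),
    "N34cfe915ee2848dba339e3743c663315":
        ("sg:person.013576203375.62", "N10fc87b020494a3c8c607a082ef955a9"),
    "N10fc87b020494a3c8c607a082ef955a9":
        ("N5b0add4f883840909268ca4b1ce4f7a5", "N70f19905fe2b49d2aa56b0df676b76c2"),
    "N70f19905fe2b49d2aa56b0df676b76c2":
        ("N56ba084096d14ae7807e7a066a9f7f18", "N0f2bceb5ff254506a44494bcaaf1961d"),
    "N0f2bceb5ff254506a44494bcaaf1961d":
        ("sg:person.010531241272.46", "rdf:nil"),
}

# Head of the list: the object of schema:author in row 3 of the table.
AUTHOR_HEAD = "N35a15a0038fd4cde938768f0ab13adec"


def walk_rdf_list(head, cells):
    """Follow rdf:rest links from head until rdf:nil, collecting rdf:first items."""
    items = []
    node = head
    while node != "rdf:nil":
        first, rest = cells[node]
        items.append(first)
        node = rest
    return items


authors_in_order = walk_rdf_list(AUTHOR_HEAD, AUTHOR_CELLS)
print(authors_in_order)
```

The third and fourth items are themselves blank nodes; rows 55 to 58 and 62 to 65 of the table give their names (Zheng Zhang and Baogen Li), so the recovered order matches the author list at the top of the record.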
 



