Ontology type: schema:Chapter
2018-09-09
AUTHORS: Bohong Yang, Hong Lu, Baogen Li, Zheng Zhang, Wenqiang Zhang
ABSTRACT: Reinforcement learning algorithms are used to solve many sequential decision problems, such as game playing and mechanical control. Q-learning is a model-free reinforcement learning method. In traditional Q-learning algorithms, the agent stops immediately after it reaches the goal. In this paper we propose a new method, the Experience-based Exploration method, to sample more useful state-action pairs for Q-learning updates. With Experience-based Exploration, the agent does not stop at the goal; it continues to search, in reverse, the states with high Bellman error. In this setting, the agent takes the terminal state as a new starting point and generates state-action pairs that could be useful. The efficacy of the method is proved analytically, and the experimental results verify the hypothesis on Gridworld.
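As context for the abstract, the standard tabular Q-learning update it builds on can be sketched as follows. This is a minimal illustration only: the corridor environment, reward values, and hyperparameters below are hypothetical and not taken from the chapter, and the sketch shows the conventional stop-at-goal behavior that the chapter's Experience-based Exploration method modifies.

```python
import random

# Minimal tabular Q-learning on a 1-D corridor "Gridworld" (hypothetical setup).
# States 0..4; reaching state 4 yields reward 1 and ends the episode.
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # move left / move right

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Apply action a in state s; return (next_state, reward, done)."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

random.seed(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # Q-learning (Bellman) update; the "Bellman error" the abstract
        # mentions is the bracketed temporal-difference term below.
        target = r + (0.0 if done else GAMMA * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
```

Per the abstract, Experience-based Exploration would differ at the episode boundary: instead of stopping when `done` is reached, the agent would continue from the terminal state and search states with high Bellman error in reverse.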
PAGES: 225-240
BOOK: Data Science
ISBN: 978-981-13-2202-0, 978-981-13-2203-7
URL: http://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17
DOI: http://dx.doi.org/10.1007/978-981-13-2203-7_17
DIMENSIONS: https://app.dimensions.ai/details/publication/pub.1106916269
JSON-LD is the canonical representation for SciGraph data.
[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
"about": [
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Artificial Intelligence and Image Processing",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information and Computing Sciences",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "Fudan University",
"id": "https://www.grid.ac/institutes/grid.8547.e",
"name": [
"Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
],
"type": "Organization"
},
"familyName": "Yang",
"givenName": "Bohong",
"id": "sg:person.07565565401.60",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07565565401.60"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Fudan University",
"id": "https://www.grid.ac/institutes/grid.8547.e",
"name": [
"Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
],
"type": "Organization"
},
"familyName": "Lu",
"givenName": "Hong",
"id": "sg:person.013576203375.62",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013576203375.62"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Fudan University",
"id": "https://www.grid.ac/institutes/grid.8547.e",
"name": [
"Shanghai Engineering Research Center for Video Technology and System, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
],
"type": "Organization"
},
"familyName": "Li",
"givenName": "Baogen",
"type": "Person"
},
{
"affiliation": {
"alternateName": "New York University Shanghai",
"id": "https://www.grid.ac/institutes/grid.449457.f",
"name": [
"School of Computer Science, New York University Shanghai, Shanghai, People\u2019s Republic of China"
],
"type": "Organization"
},
"familyName": "Zhang",
"givenName": "Zheng",
"type": "Person"
},
{
"affiliation": {
"alternateName": "Fudan University",
"id": "https://www.grid.ac/institutes/grid.8547.e",
"name": [
"Shanghai Engineering Research Center for Video Technology and System, School of Computer Science, Fudan University, Shanghai, People\u2019s Republic of China"
],
"type": "Organization"
},
"familyName": "Zhang",
"givenName": "Wenqiang",
"id": "sg:person.010531241272.46",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010531241272.46"
],
"type": "Person"
}
],
"citation": [
{
"id": "sg:pub.10.1038/nature14236",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1030517994",
"https://doi.org/10.1038/nature14236"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/bf00992698",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1033088958",
"https://doi.org/10.1007/bf00992698"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1038/nature16961",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1039427823",
"https://doi.org/10.1038/nature16961"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/2647868.2654889",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1052031051"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1109/tnnls.2014.2327636",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1061718600"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1109/tnnls.2014.2371046",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1061718712"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1109/tnnls.2014.2376703",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1061718718"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1109/tnnls.2015.2403394",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1061718799"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1109/tnnls.2016.2522401",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1061719118"
],
"type": "CreativeWork"
}
],
"datePublished": "2018-09-09",
"datePublishedReg": "2018-09-09",
"description": "Reinforcement learning algorithms are used to deal with a lot of sequential problems, such as playing games, mechanical control, and so on. Q-Learning is a model-free reinforcement learning method. In traditional Q-learning algorithms, the agent stops immediately after it has reached the goal. We propose in this paper a new method\u2014Experience-based Exploration method\u2014in order to sample more efficient state-action pairs for Q-learning updating. In the Experience-based Exploration method, the agent does not stop and continues to search the states with high bellman-error inversely. In this setting, the agent will set the terminal state as a new start point, and generate pairs of action and state which could be useful. The efficacy of the method is proved analytically. And the experimental results verify the hypothesis on Gridworld.",
"editor": [
{
"familyName": "Zhou",
"givenName": "Qinglei",
"type": "Person"
},
{
"familyName": "Gan",
"givenName": "Yong",
"type": "Person"
},
{
"familyName": "Jing",
"givenName": "Weipeng",
"type": "Person"
},
{
"familyName": "Song",
"givenName": "Xianhua",
"type": "Person"
},
{
"familyName": "Wang",
"givenName": "Yan",
"type": "Person"
},
{
"familyName": "Lu",
"givenName": "Zeguang",
"type": "Person"
}
],
"genre": "chapter",
"id": "sg:pub.10.1007/978-981-13-2203-7_17",
"inLanguage": [
"en"
],
"isAccessibleForFree": false,
"isPartOf": {
"isbn": [
"978-981-13-2202-0",
"978-981-13-2203-7"
],
"name": "Data Science",
"type": "Book"
},
"name": "A Novel Experience-Based Exploration Method for Q-Learning",
"pagination": "225-240",
"productId": [
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1007/978-981-13-2203-7_17"
]
},
{
"name": "readcube_id",
"type": "PropertyValue",
"value": [
"a63807298cce667fc3371365bc437f6a43ef175eb6f0250cb44b755ae034e032"
]
},
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1106916269"
]
}
],
"publisher": {
"location": "Singapore",
"name": "Springer Singapore",
"type": "Organisation"
},
"sameAs": [
"https://doi.org/10.1007/978-981-13-2203-7_17",
"https://app.dimensions.ai/details/publication/pub.1106916269"
],
"sdDataset": "chapters",
"sdDatePublished": "2019-04-16T04:41",
"sdLicense": "https://scigraph.springernature.com/explorer/license/",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000322_0000000322/records_65017_00000000.jsonl",
"type": "Chapter",
"url": "https://link.springer.com/10.1007%2F978-981-13-2203-7_17"
}
]
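The JSON-LD record above is plain JSON, so it can be consumed with standard tooling. As a minimal sketch, the snippet below parses a trimmed excerpt of the record (the excerpt keeps only the fields being read; the full record has the same shape) and extracts the chapter title and author names:

```python
import json

# Trimmed excerpt of the SciGraph JSON-LD record, keeping only the
# fields this sketch reads; the full record follows the same structure.
record_jsonld = """
[{
  "author": [
    {"familyName": "Yang", "givenName": "Bohong", "type": "Person"},
    {"familyName": "Lu", "givenName": "Hong", "type": "Person"}
  ],
  "name": "A Novel Experience-Based Exploration Method for Q-Learning",
  "type": "Chapter"
}]
"""

records = json.loads(record_jsonld)   # JSON-LD parses as ordinary JSON
chapter = records[0]
authors = [f"{a['givenName']} {a['familyName']}" for a in chapter["author"]]
print(chapter["name"])
print(", ".join(authors))
```

Note that plain `json` parsing ignores the `@context`; a dedicated JSON-LD processor would be needed to expand terms into full IRIs.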
Download the RDF metadata in any of the following formats (JSON-LD, N-Triples, Turtle, RDF/XML):
JSON-LD is a popular format for linked data which is fully compatible with JSON.
curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17'
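The curl commands above all hit the same URL and select the serialization via the HTTP Accept header (content negotiation). The same request can be built in Python with the standard library; this sketch constructs the request without sending it:

```python
from urllib import request

SCIGRAPH_URL = "https://scigraph.springernature.com/pub.10.1007/978-981-13-2203-7_17"

def rdf_request(mime_type: str) -> request.Request:
    """Build a content-negotiation request for one RDF serialization,
    mirroring curl's -H 'Accept: ...' flag."""
    return request.Request(SCIGRAPH_URL, headers={"Accept": mime_type})

req = rdf_request("application/ld+json")
# To actually download the record: request.urlopen(req).read()
print(req.get_header("Accept"))
```

Swapping the MIME type (`application/n-triples`, `text/turtle`, `application/rdf+xml`) selects the other formats, exactly as in the curl examples.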
This table displays all metadata directly associated with this object as RDF triples: 150 triples, 23 predicates, 35 URIs, 19 literals, 8 blank nodes.