Bandit Based Monte-Carlo Planning View Full Text


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2006

AUTHORS

Levente Kocsis , Csaba Szepesvári

ABSTRACT

For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives. More... »

PAGES

282-293

References to SciGraph publications

  • 2002-05. Finite-time Analysis of the Multiarmed Bandit Problem in MACHINE LEARNING
  • 2004. Monte-Carlo Go Developments in ADVANCES IN COMPUTER GAMES
  • Book

    TITLE

    Machine Learning: ECML 2006

    ISBN

    978-3-540-45375-8
    978-3-540-46056-5

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/11871842_29

    DOI

    http://dx.doi.org/10.1007/11871842_29

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1037233025


    Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
    Incoming Citations Browse incoming citations for this publication using opencitations.net

    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0102", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Applied Mathematics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Mathematical Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "MTA Institute for Computer Science and Control", 
              "id": "https://www.grid.ac/institutes/grid.4836.9", 
              "name": [
                "Computer and Automation Research Institute of the Hungarian Academy of Sciences, Kende u. 13-17, 1111, Budapest, Hungary"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Kocsis", 
            "givenName": "Levente", 
            "id": "sg:person.07770302215.82", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07770302215.82"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "MTA Institute for Computer Science and Control", 
              "id": "https://www.grid.ac/institutes/grid.4836.9", 
              "name": [
                "Computer and Automation Research Institute of the Hungarian Academy of Sciences, Kende u. 13-17, 1111, Budapest, Hungary"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Szepesv\u00e1ri", 
            "givenName": "Csaba", 
            "id": "sg:person.016202177221.23", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016202177221.23"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1016/s0004-3702(01)00166-7", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1010924070"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0196-8858(85)90002-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1033673111"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1023/a:1013689704352", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1039349898", 
              "https://doi.org/10.1023/a:1013689704352"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-0-387-35706-5_11", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044078476", 
              "https://doi.org/10.1007/978-0-387-35706-5_11"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/s0004-3702(01)00130-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044471374"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1137/s0097539701398375", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1062879339"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1287/opre.1040.0145", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1064725620"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2006", 
        "datePublishedReg": "2006-01-01", 
        "description": "For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives.", 
        "editor": [
          {
            "familyName": "F\u00fcrnkranz", 
            "givenName": "Johannes", 
            "type": "Person"
          }, 
          {
            "familyName": "Scheffer", 
            "givenName": "Tobias", 
            "type": "Person"
          }, 
          {
            "familyName": "Spiliopoulou", 
            "givenName": "Myra", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/11871842_29", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": {
          "isbn": [
            "978-3-540-45375-8", 
            "978-3-540-46056-5"
          ], 
          "name": "Machine Learning: ECML 2006", 
          "type": "Book"
        }, 
        "name": "Bandit Based Monte-Carlo Planning", 
        "pagination": "282-293", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1037233025"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/11871842_29"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "b15d6ca448e1b85b9b4203039d8fb0f7baba57849020f5d148efc72945f096d1"
            ]
          }
        ], 
        "publisher": {
          "location": "Berlin, Heidelberg", 
          "name": "Springer Berlin Heidelberg", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/11871842_29", 
          "https://app.dimensions.ai/details/publication/pub.1037233025"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-16T07:26", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000355_0000000355/records_52991_00000000.jsonl", 
        "type": "Chapter", 
        "url": "https://link.springer.com/10.1007%2F11871842_29"
      }
    ]
     

    Download the RDF metadata as:  json-ld nt turtle xml License info

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/11871842_29'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/11871842_29'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/11871842_29'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/11871842_29'


     

    This table displays all metadata directly associated to this object as RDF triples.

    105 TRIPLES      23 PREDICATES      34 URIs      20 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/11871842_29 schema:about anzsrc-for:01
    2 anzsrc-for:0102
    3 schema:author N305ea3b5482246098fe6c6ff473da519
    4 schema:citation sg:pub.10.1007/978-0-387-35706-5_11
    5 sg:pub.10.1023/a:1013689704352
    6 https://doi.org/10.1016/0196-8858(85)90002-8
    7 https://doi.org/10.1016/s0004-3702(01)00130-8
    8 https://doi.org/10.1016/s0004-3702(01)00166-7
    9 https://doi.org/10.1137/s0097539701398375
    10 https://doi.org/10.1287/opre.1040.0145
    11 schema:datePublished 2006
    12 schema:datePublishedReg 2006-01-01
    13 schema:description For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives.
    14 schema:editor N5750946925bc4220892db2865f6a8ede
    15 schema:genre chapter
    16 schema:inLanguage en
    17 schema:isAccessibleForFree true
    18 schema:isPartOf N227a60c01e364ba0a0b6ebe5a1e079d8
    19 schema:name Bandit Based Monte-Carlo Planning
    20 schema:pagination 282-293
    21 schema:productId N11656be06a874bdcbb12a0e1a1ac8052
    22 Na06987a0517c4a10b220e151991ec29c
    23 Ncafd71fc3be24fe7af25ad53ab492401
    24 schema:publisher Nbad3523a044248939bcac19297bbe156
    25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037233025
    26 https://doi.org/10.1007/11871842_29
    27 schema:sdDatePublished 2019-04-16T07:26
    28 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    29 schema:sdPublisher N27c310b17c3e45e798add3bd57a075de
    30 schema:url https://link.springer.com/10.1007%2F11871842_29
    31 sgo:license sg:explorer/license/
    32 sgo:sdDataset chapters
    33 rdf:type schema:Chapter
    34 N02b3c1f8617c418fbaf7f294b8071b7e schema:familyName Fürnkranz
    35 schema:givenName Johannes
    36 rdf:type schema:Person
    37 N0c2831b88cc94af1a93f68d9bfa598d0 rdf:first N2fb796866fa946d38f8ba798a7380b00
    38 rdf:rest N10a73f960ee245d68ca29660efd5b407
    39 N0dbf03988d834fda95d7e26a2b2c6dbd schema:familyName Spiliopoulou
    40 schema:givenName Myra
    41 rdf:type schema:Person
    42 N10a73f960ee245d68ca29660efd5b407 rdf:first N0dbf03988d834fda95d7e26a2b2c6dbd
    43 rdf:rest rdf:nil
    44 N11656be06a874bdcbb12a0e1a1ac8052 schema:name dimensions_id
    45 schema:value pub.1037233025
    46 rdf:type schema:PropertyValue
    47 N227a60c01e364ba0a0b6ebe5a1e079d8 schema:isbn 978-3-540-45375-8
    48 978-3-540-46056-5
    49 schema:name Machine Learning: ECML 2006
    50 rdf:type schema:Book
    51 N27c310b17c3e45e798add3bd57a075de schema:name Springer Nature - SN SciGraph project
    52 rdf:type schema:Organization
    53 N2fb796866fa946d38f8ba798a7380b00 schema:familyName Scheffer
    54 schema:givenName Tobias
    55 rdf:type schema:Person
    56 N305ea3b5482246098fe6c6ff473da519 rdf:first sg:person.07770302215.82
    57 rdf:rest N97b1b827e5704a13b91fc1c7641d3deb
    58 N5750946925bc4220892db2865f6a8ede rdf:first N02b3c1f8617c418fbaf7f294b8071b7e
    59 rdf:rest N0c2831b88cc94af1a93f68d9bfa598d0
    60 N97b1b827e5704a13b91fc1c7641d3deb rdf:first sg:person.016202177221.23
    61 rdf:rest rdf:nil
    62 Na06987a0517c4a10b220e151991ec29c schema:name doi
    63 schema:value 10.1007/11871842_29
    64 rdf:type schema:PropertyValue
    65 Nbad3523a044248939bcac19297bbe156 schema:location Berlin, Heidelberg
    66 schema:name Springer Berlin Heidelberg
    67 rdf:type schema:Organisation
    68 Ncafd71fc3be24fe7af25ad53ab492401 schema:name readcube_id
    69 schema:value b15d6ca448e1b85b9b4203039d8fb0f7baba57849020f5d148efc72945f096d1
    70 rdf:type schema:PropertyValue
    71 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
    72 schema:name Mathematical Sciences
    73 rdf:type schema:DefinedTerm
    74 anzsrc-for:0102 schema:inDefinedTermSet anzsrc-for:
    75 schema:name Applied Mathematics
    76 rdf:type schema:DefinedTerm
    77 sg:person.016202177221.23 schema:affiliation https://www.grid.ac/institutes/grid.4836.9
    78 schema:familyName Szepesvári
    79 schema:givenName Csaba
    80 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016202177221.23
    81 rdf:type schema:Person
    82 sg:person.07770302215.82 schema:affiliation https://www.grid.ac/institutes/grid.4836.9
    83 schema:familyName Kocsis
    84 schema:givenName Levente
    85 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07770302215.82
    86 rdf:type schema:Person
    87 sg:pub.10.1007/978-0-387-35706-5_11 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044078476
    88 https://doi.org/10.1007/978-0-387-35706-5_11
    89 rdf:type schema:CreativeWork
    90 sg:pub.10.1023/a:1013689704352 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039349898
    91 https://doi.org/10.1023/a:1013689704352
    92 rdf:type schema:CreativeWork
    93 https://doi.org/10.1016/0196-8858(85)90002-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033673111
    94 rdf:type schema:CreativeWork
    95 https://doi.org/10.1016/s0004-3702(01)00130-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044471374
    96 rdf:type schema:CreativeWork
    97 https://doi.org/10.1016/s0004-3702(01)00166-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010924070
    98 rdf:type schema:CreativeWork
    99 https://doi.org/10.1137/s0097539701398375 schema:sameAs https://app.dimensions.ai/details/publication/pub.1062879339
    100 rdf:type schema:CreativeWork
    101 https://doi.org/10.1287/opre.1040.0145 schema:sameAs https://app.dimensions.ai/details/publication/pub.1064725620
    102 rdf:type schema:CreativeWork
    103 https://www.grid.ac/institutes/grid.4836.9 schema:alternateName MTA Institute for Computer Science and Control
    104 schema:name Computer and Automation Research Institute of the Hungarian Academy of Sciences, Kende u. 13-17, 1111, Budapest, Hungary
    105 rdf:type schema:Organization
     




    Preview window. Press ESC to close (or click here)


    ...