Bandit Based Monte-Carlo Planning View Full Text


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2006

AUTHORS

Levente Kocsis , Csaba Szepesvári

ABSTRACT

For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives. More... »

PAGES

282-293

References to SciGraph publications

  • 2002-05. Finite-time Analysis of the Multiarmed Bandit Problem in MACHINE LEARNING
  • 2004. Monte-Carlo Go Developments in ADVANCES IN COMPUTER GAMES
  • Book

    TITLE

    Machine Learning: ECML 2006

    ISBN

    978-3-540-45375-8
    978-3-540-46056-5

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/11871842_29

    DOI

    http://dx.doi.org/10.1007/11871842_29

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1037233025


    Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
    Incoming Citations Browse incoming citations for this publication using opencitations.net

    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0102", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Applied Mathematics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Mathematical Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "MTA Institute for Computer Science and Control", 
              "id": "https://www.grid.ac/institutes/grid.4836.9", 
              "name": [
                "Computer and Automation Research Institute of the Hungarian Academy of Sciences, Kende u. 13-17, 1111, Budapest, Hungary"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Kocsis", 
            "givenName": "Levente", 
            "id": "sg:person.07770302215.82", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07770302215.82"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "MTA Institute for Computer Science and Control", 
              "id": "https://www.grid.ac/institutes/grid.4836.9", 
              "name": [
                "Computer and Automation Research Institute of the Hungarian Academy of Sciences, Kende u. 13-17, 1111, Budapest, Hungary"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Szepesv\u00e1ri", 
            "givenName": "Csaba", 
            "id": "sg:person.016202177221.23", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016202177221.23"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1016/s0004-3702(01)00166-7", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1010924070"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0196-8858(85)90002-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1033673111"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1023/a:1013689704352", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1039349898", 
              "https://doi.org/10.1023/a:1013689704352"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-0-387-35706-5_11", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044078476", 
              "https://doi.org/10.1007/978-0-387-35706-5_11"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/s0004-3702(01)00130-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044471374"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1137/s0097539701398375", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1062879339"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1287/opre.1040.0145", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1064725620"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2006", 
        "datePublishedReg": "2006-01-01", 
        "description": "For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives.", 
        "editor": [
          {
            "familyName": "F\u00fcrnkranz", 
            "givenName": "Johannes", 
            "type": "Person"
          }, 
          {
            "familyName": "Scheffer", 
            "givenName": "Tobias", 
            "type": "Person"
          }, 
          {
            "familyName": "Spiliopoulou", 
            "givenName": "Myra", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/11871842_29", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": {
          "isbn": [
            "978-3-540-45375-8", 
            "978-3-540-46056-5"
          ], 
          "name": "Machine Learning: ECML 2006", 
          "type": "Book"
        }, 
        "name": "Bandit Based Monte-Carlo Planning", 
        "pagination": "282-293", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1037233025"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/11871842_29"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "b15d6ca448e1b85b9b4203039d8fb0f7baba57849020f5d148efc72945f096d1"
            ]
          }
        ], 
        "publisher": {
          "location": "Berlin, Heidelberg", 
          "name": "Springer Berlin Heidelberg", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/11871842_29", 
          "https://app.dimensions.ai/details/publication/pub.1037233025"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-16T07:26", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000355_0000000355/records_52991_00000000.jsonl", 
        "type": "Chapter", 
        "url": "https://link.springer.com/10.1007%2F11871842_29"
      }
    ]
     

    Download the RDF metadata as:  json-ld nt turtle xml License info

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/11871842_29'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/11871842_29'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/11871842_29'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/11871842_29'


     

    This table displays all metadata directly associated to this object as RDF triples.

    105 TRIPLES      23 PREDICATES      34 URIs      20 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/11871842_29 schema:about anzsrc-for:01
    2 anzsrc-for:0102
    3 schema:author N04d4ceeba6a448fa992340fe7085843e
    4 schema:citation sg:pub.10.1007/978-0-387-35706-5_11
    5 sg:pub.10.1023/a:1013689704352
    6 https://doi.org/10.1016/0196-8858(85)90002-8
    7 https://doi.org/10.1016/s0004-3702(01)00130-8
    8 https://doi.org/10.1016/s0004-3702(01)00166-7
    9 https://doi.org/10.1137/s0097539701398375
    10 https://doi.org/10.1287/opre.1040.0145
    11 schema:datePublished 2006
    12 schema:datePublishedReg 2006-01-01
    13 schema:description For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives.
    14 schema:editor Na4f746b305c74c3488b9788a65c4258d
    15 schema:genre chapter
    16 schema:inLanguage en
    17 schema:isAccessibleForFree true
    18 schema:isPartOf Ndee196c6d2a64057b7c12ff576526ca4
    19 schema:name Bandit Based Monte-Carlo Planning
    20 schema:pagination 282-293
    21 schema:productId N5dbdf1e1ce944da88ecc881d3cf44e41
    22 N6f4e667793584828b41e858dafa1b5df
    23 Nb0833e0dca46459e804468ee4dd4b951
    24 schema:publisher N4d4f8fd5a5fe4425b40fae9c2b9a0e8e
    25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037233025
    26 https://doi.org/10.1007/11871842_29
    27 schema:sdDatePublished 2019-04-16T07:26
    28 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    29 schema:sdPublisher N2c5e4f4d34c44ed89d3d84b8b4e268bc
    30 schema:url https://link.springer.com/10.1007%2F11871842_29
    31 sgo:license sg:explorer/license/
    32 sgo:sdDataset chapters
    33 rdf:type schema:Chapter
    34 N04d4ceeba6a448fa992340fe7085843e rdf:first sg:person.07770302215.82
    35 rdf:rest N796d0cf1b55444b3b3ed390afc388af0
    36 N1db6e3d3fe1c40fba238cfd3ca88dc45 rdf:first Nde07b2256fa04da4b6d9f630d700387f
    37 rdf:rest N54533ddbe44d423c95c94814c9c27fa1
    38 N2c5e4f4d34c44ed89d3d84b8b4e268bc schema:name Springer Nature - SN SciGraph project
    39 rdf:type schema:Organization
    40 N4d4f8fd5a5fe4425b40fae9c2b9a0e8e schema:location Berlin, Heidelberg
    41 schema:name Springer Berlin Heidelberg
    42 rdf:type schema:Organisation
    43 N54533ddbe44d423c95c94814c9c27fa1 rdf:first Nf3127ca832a5454d9056760bbb525be1
    44 rdf:rest rdf:nil
    45 N5aefcccb061d437788334a39e5e082f7 schema:familyName Fürnkranz
    46 schema:givenName Johannes
    47 rdf:type schema:Person
    48 N5dbdf1e1ce944da88ecc881d3cf44e41 schema:name readcube_id
    49 schema:value b15d6ca448e1b85b9b4203039d8fb0f7baba57849020f5d148efc72945f096d1
    50 rdf:type schema:PropertyValue
    51 N6f4e667793584828b41e858dafa1b5df schema:name dimensions_id
    52 schema:value pub.1037233025
    53 rdf:type schema:PropertyValue
    54 N796d0cf1b55444b3b3ed390afc388af0 rdf:first sg:person.016202177221.23
    55 rdf:rest rdf:nil
    56 Na4f746b305c74c3488b9788a65c4258d rdf:first N5aefcccb061d437788334a39e5e082f7
    57 rdf:rest N1db6e3d3fe1c40fba238cfd3ca88dc45
    58 Nb0833e0dca46459e804468ee4dd4b951 schema:name doi
    59 schema:value 10.1007/11871842_29
    60 rdf:type schema:PropertyValue
    61 Nde07b2256fa04da4b6d9f630d700387f schema:familyName Scheffer
    62 schema:givenName Tobias
    63 rdf:type schema:Person
    64 Ndee196c6d2a64057b7c12ff576526ca4 schema:isbn 978-3-540-45375-8
    65 978-3-540-46056-5
    66 schema:name Machine Learning: ECML 2006
    67 rdf:type schema:Book
    68 Nf3127ca832a5454d9056760bbb525be1 schema:familyName Spiliopoulou
    69 schema:givenName Myra
    70 rdf:type schema:Person
    71 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
    72 schema:name Mathematical Sciences
    73 rdf:type schema:DefinedTerm
    74 anzsrc-for:0102 schema:inDefinedTermSet anzsrc-for:
    75 schema:name Applied Mathematics
    76 rdf:type schema:DefinedTerm
    77 sg:person.016202177221.23 schema:affiliation https://www.grid.ac/institutes/grid.4836.9
    78 schema:familyName Szepesvári
    79 schema:givenName Csaba
    80 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016202177221.23
    81 rdf:type schema:Person
    82 sg:person.07770302215.82 schema:affiliation https://www.grid.ac/institutes/grid.4836.9
    83 schema:familyName Kocsis
    84 schema:givenName Levente
    85 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07770302215.82
    86 rdf:type schema:Person
    87 sg:pub.10.1007/978-0-387-35706-5_11 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044078476
    88 https://doi.org/10.1007/978-0-387-35706-5_11
    89 rdf:type schema:CreativeWork
    90 sg:pub.10.1023/a:1013689704352 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039349898
    91 https://doi.org/10.1023/a:1013689704352
    92 rdf:type schema:CreativeWork
    93 https://doi.org/10.1016/0196-8858(85)90002-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033673111
    94 rdf:type schema:CreativeWork
    95 https://doi.org/10.1016/s0004-3702(01)00130-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044471374
    96 rdf:type schema:CreativeWork
    97 https://doi.org/10.1016/s0004-3702(01)00166-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010924070
    98 rdf:type schema:CreativeWork
    99 https://doi.org/10.1137/s0097539701398375 schema:sameAs https://app.dimensions.ai/details/publication/pub.1062879339
    100 rdf:type schema:CreativeWork
    101 https://doi.org/10.1287/opre.1040.0145 schema:sameAs https://app.dimensions.ai/details/publication/pub.1064725620
    102 rdf:type schema:CreativeWork
    103 https://www.grid.ac/institutes/grid.4836.9 schema:alternateName MTA Institute for Computer Science and Control
    104 schema:name Computer and Automation Research Institute of the Hungarian Academy of Sciences, Kende u. 13-17, 1111, Budapest, Hungary
    105 rdf:type schema:Organization
     




    Preview window. Press ESC to close (or click here)


    ...