Bandit Based Monte-Carlo Planning View Full Text


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2006

AUTHORS

Levente Kocsis , Csaba Szepesvári

ABSTRACT

For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives. More... »

PAGES

282-293

References to SciGraph publications

  • 2002-05. Finite-time Analysis of the Multiarmed Bandit Problem in MACHINE LEARNING
  • 2004. Monte-Carlo Go Developments in ADVANCES IN COMPUTER GAMES
  • Book

    TITLE

    Machine Learning: ECML 2006

    ISBN

    978-3-540-45375-8
    978-3-540-46056-5

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/11871842_29

    DOI

    http://dx.doi.org/10.1007/11871842_29

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1037233025


    Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
    Incoming Citations Browse incoming citations for this publication using opencitations.net

    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0102", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Applied Mathematics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Mathematical Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "MTA Institute for Computer Science and Control", 
              "id": "https://www.grid.ac/institutes/grid.4836.9", 
              "name": [
                "Computer and Automation Research Institute of the Hungarian Academy of Sciences, Kende u. 13-17, 1111, Budapest, Hungary"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Kocsis", 
            "givenName": "Levente", 
            "id": "sg:person.07770302215.82", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07770302215.82"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "MTA Institute for Computer Science and Control", 
              "id": "https://www.grid.ac/institutes/grid.4836.9", 
              "name": [
                "Computer and Automation Research Institute of the Hungarian Academy of Sciences, Kende u. 13-17, 1111, Budapest, Hungary"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Szepesv\u00e1ri", 
            "givenName": "Csaba", 
            "id": "sg:person.016202177221.23", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016202177221.23"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1016/s0004-3702(01)00166-7", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1010924070"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0196-8858(85)90002-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1033673111"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1023/a:1013689704352", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1039349898", 
              "https://doi.org/10.1023/a:1013689704352"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-0-387-35706-5_11", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044078476", 
              "https://doi.org/10.1007/978-0-387-35706-5_11"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/s0004-3702(01)00130-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044471374"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1137/s0097539701398375", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1062879339"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1287/opre.1040.0145", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1064725620"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2006", 
        "datePublishedReg": "2006-01-01", 
        "description": "For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives.", 
        "editor": [
          {
            "familyName": "F\u00fcrnkranz", 
            "givenName": "Johannes", 
            "type": "Person"
          }, 
          {
            "familyName": "Scheffer", 
            "givenName": "Tobias", 
            "type": "Person"
          }, 
          {
            "familyName": "Spiliopoulou", 
            "givenName": "Myra", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/11871842_29", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": {
          "isbn": [
            "978-3-540-45375-8", 
            "978-3-540-46056-5"
          ], 
          "name": "Machine Learning: ECML 2006", 
          "type": "Book"
        }, 
        "name": "Bandit Based Monte-Carlo Planning", 
        "pagination": "282-293", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1037233025"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/11871842_29"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "b15d6ca448e1b85b9b4203039d8fb0f7baba57849020f5d148efc72945f096d1"
            ]
          }
        ], 
        "publisher": {
          "location": "Berlin, Heidelberg", 
          "name": "Springer Berlin Heidelberg", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/11871842_29", 
          "https://app.dimensions.ai/details/publication/pub.1037233025"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-16T07:26", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000355_0000000355/records_52991_00000000.jsonl", 
        "type": "Chapter", 
        "url": "https://link.springer.com/10.1007%2F11871842_29"
      }
    ]
     

    Download the RDF metadata as:  json-ld nt turtle xml License info

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/11871842_29'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/11871842_29'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/11871842_29'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/11871842_29'


     

    This table displays all metadata directly associated to this object as RDF triples.

    105 TRIPLES      23 PREDICATES      34 URIs      20 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/11871842_29 schema:about anzsrc-for:01
    2 anzsrc-for:0102
    3 schema:author N469e75d7ef82492089eef06908192d5f
    4 schema:citation sg:pub.10.1007/978-0-387-35706-5_11
    5 sg:pub.10.1023/a:1013689704352
    6 https://doi.org/10.1016/0196-8858(85)90002-8
    7 https://doi.org/10.1016/s0004-3702(01)00130-8
    8 https://doi.org/10.1016/s0004-3702(01)00166-7
    9 https://doi.org/10.1137/s0097539701398375
    10 https://doi.org/10.1287/opre.1040.0145
    11 schema:datePublished 2006
    12 schema:datePublishedReg 2006-01-01
    13 schema:description For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives.
    14 schema:editor N2c9e446ea1a247739d22594cbb190e1e
    15 schema:genre chapter
    16 schema:inLanguage en
    17 schema:isAccessibleForFree true
    18 schema:isPartOf N5195a87efd054590af7c04abd50fa539
    19 schema:name Bandit Based Monte-Carlo Planning
    20 schema:pagination 282-293
    21 schema:productId N4b22d2341c7c4aea83cc1397df60e7bc
    22 N76409ac028524becba039b6b34841903
    23 N8f98c9de71ca4371b03e8e70d61bae66
    24 schema:publisher Ncf9c3f487ffd476281e73cec57bb0e4c
    25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037233025
    26 https://doi.org/10.1007/11871842_29
    27 schema:sdDatePublished 2019-04-16T07:26
    28 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    29 schema:sdPublisher N0f47c29e8d804fc4a27382bcdf998758
    30 schema:url https://link.springer.com/10.1007%2F11871842_29
    31 sgo:license sg:explorer/license/
    32 sgo:sdDataset chapters
    33 rdf:type schema:Chapter
    34 N00d195e6f31f4c56b66272aada6124ba rdf:first N3029305296074c6fb7dc48880f8a4099
    35 rdf:rest N59fd658466eb4891b9ac42a808dabd99
    36 N0f47c29e8d804fc4a27382bcdf998758 schema:name Springer Nature - SN SciGraph project
    37 rdf:type schema:Organization
    38 N2c9e446ea1a247739d22594cbb190e1e rdf:first N44e83bf9107845088ae30756b98fe515
    39 rdf:rest N00d195e6f31f4c56b66272aada6124ba
    40 N3029305296074c6fb7dc48880f8a4099 schema:familyName Scheffer
    41 schema:givenName Tobias
    42 rdf:type schema:Person
    43 N44e83bf9107845088ae30756b98fe515 schema:familyName Fürnkranz
    44 schema:givenName Johannes
    45 rdf:type schema:Person
    46 N469e75d7ef82492089eef06908192d5f rdf:first sg:person.07770302215.82
    47 rdf:rest Nfd8def63b9844054814a0e842c715fda
    48 N4b22d2341c7c4aea83cc1397df60e7bc schema:name readcube_id
    49 schema:value b15d6ca448e1b85b9b4203039d8fb0f7baba57849020f5d148efc72945f096d1
    50 rdf:type schema:PropertyValue
    51 N5195a87efd054590af7c04abd50fa539 schema:isbn 978-3-540-45375-8
    52 978-3-540-46056-5
    53 schema:name Machine Learning: ECML 2006
    54 rdf:type schema:Book
    55 N59fd658466eb4891b9ac42a808dabd99 rdf:first N725fe3f74c4041da86c230df1fe352a7
    56 rdf:rest rdf:nil
    57 N725fe3f74c4041da86c230df1fe352a7 schema:familyName Spiliopoulou
    58 schema:givenName Myra
    59 rdf:type schema:Person
    60 N76409ac028524becba039b6b34841903 schema:name dimensions_id
    61 schema:value pub.1037233025
    62 rdf:type schema:PropertyValue
    63 N8f98c9de71ca4371b03e8e70d61bae66 schema:name doi
    64 schema:value 10.1007/11871842_29
    65 rdf:type schema:PropertyValue
    66 Ncf9c3f487ffd476281e73cec57bb0e4c schema:location Berlin, Heidelberg
    67 schema:name Springer Berlin Heidelberg
    68 rdf:type schema:Organisation
    69 Nfd8def63b9844054814a0e842c715fda rdf:first sg:person.016202177221.23
    70 rdf:rest rdf:nil
    71 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
    72 schema:name Mathematical Sciences
    73 rdf:type schema:DefinedTerm
    74 anzsrc-for:0102 schema:inDefinedTermSet anzsrc-for:
    75 schema:name Applied Mathematics
    76 rdf:type schema:DefinedTerm
    77 sg:person.016202177221.23 schema:affiliation https://www.grid.ac/institutes/grid.4836.9
    78 schema:familyName Szepesvári
    79 schema:givenName Csaba
    80 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016202177221.23
    81 rdf:type schema:Person
    82 sg:person.07770302215.82 schema:affiliation https://www.grid.ac/institutes/grid.4836.9
    83 schema:familyName Kocsis
    84 schema:givenName Levente
    85 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07770302215.82
    86 rdf:type schema:Person
    87 sg:pub.10.1007/978-0-387-35706-5_11 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044078476
    88 https://doi.org/10.1007/978-0-387-35706-5_11
    89 rdf:type schema:CreativeWork
    90 sg:pub.10.1023/a:1013689704352 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039349898
    91 https://doi.org/10.1023/a:1013689704352
    92 rdf:type schema:CreativeWork
    93 https://doi.org/10.1016/0196-8858(85)90002-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033673111
    94 rdf:type schema:CreativeWork
    95 https://doi.org/10.1016/s0004-3702(01)00130-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044471374
    96 rdf:type schema:CreativeWork
    97 https://doi.org/10.1016/s0004-3702(01)00166-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010924070
    98 rdf:type schema:CreativeWork
    99 https://doi.org/10.1137/s0097539701398375 schema:sameAs https://app.dimensions.ai/details/publication/pub.1062879339
    100 rdf:type schema:CreativeWork
    101 https://doi.org/10.1287/opre.1040.0145 schema:sameAs https://app.dimensions.ai/details/publication/pub.1064725620
    102 rdf:type schema:CreativeWork
    103 https://www.grid.ac/institutes/grid.4836.9 schema:alternateName MTA Institute for Computer Science and Control
    104 schema:name Computer and Automation Research Institute of the Hungarian Academy of Sciences, Kende u. 13-17, 1111, Budapest, Hungary
    105 rdf:type schema:Organization
     




    Preview window. Press ESC to close (or click here)


    ...