Finite-time Analysis of the Multiarmed Bandit Problem View Full Text


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2002-05

AUTHORS

Peter Auer, Nicolò Cesa-Bianchi, Paul Fischer

ABSTRACT

Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is the loss due to the fact that the globally optimal policy is not followed all the times. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first ones to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies which asymptotically achieve this regret have been devised by Lai and Robbins and many others. In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support. More... »

PAGES

235-256

References to SciGraph publications

  • 1994-10. Multi-Armed bandit problem revisited in JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS
  • 1991-12. Nonparametric bandit methods in ANNALS OF OPERATIONS RESEARCH
  • Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1023/a:1013689704352

    DOI

    http://dx.doi.org/10.1023/a:1013689704352

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1039349898


    Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
    Incoming Citations Browse incoming citations for this publication using opencitations.net

    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1402", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Applied Economics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/14", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Economics", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Graz University of Technology", 
              "id": "https://www.grid.ac/institutes/grid.410413.3", 
              "name": [
                "University of Technology Graz, A-8010, Graz, Austria"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Auer", 
            "givenName": "Peter", 
            "id": "sg:person.010211007377.90", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010211007377.90"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Milan", 
              "id": "https://www.grid.ac/institutes/grid.4708.b", 
              "name": [
                "DTI, University of Milan, via Bramante 65, I-26013, Crema, Italy"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Cesa-Bianchi", 
            "givenName": "Nicol\u00f2", 
            "id": "sg:person.015724422615.19", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015724422615.19"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "TU Dortmund University", 
              "id": "https://www.grid.ac/institutes/grid.5675.1", 
              "name": [
                "Lehrstuhl Informatik II, Universit\u00e4t Dortmund, D-44221, Dortmund, Germany"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Fischer", 
            "givenName": "Paul", 
            "id": "sg:person.015777522071.34", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015777522071.34"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1016/b978-1-55860-377-6.50034-7", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1008169894"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0196-8858(85)90002-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1033673111"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1006/aama.1996.0007", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1033676178"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/bf02055587", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1039636640", 
              "https://doi.org/10.1007/bf02055587"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/bf02055587", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1039636640", 
              "https://doi.org/10.1007/bf02055587"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/bf02191765", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1043110808", 
              "https://doi.org/10.1007/bf02191765"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/bf02191765", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1043110808", 
              "https://doi.org/10.1007/bf02191765"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2002-05", 
        "datePublishedReg": "2002-05-01", 
        "description": "Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is the loss due to the fact that the globally optimal policy is not followed all the times. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first ones to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies which asymptotically achieve this regret have been devised by Lai and Robbins and many others. In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1023/a:1013689704352", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": [
          {
            "id": "sg:journal.1125588", 
            "issn": [
              "0885-6125", 
              "1573-0565"
            ], 
            "name": "Machine Learning", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "2-3", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "47"
          }
        ], 
        "name": "Finite-time Analysis of the Multiarmed Bandit Problem", 
        "pagination": "235-256", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "bb716a55cacf51d515f1540655dd41c7f3dc25d84200de109bd9d7974e133c53"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1023/a:1013689704352"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1039349898"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1023/a:1013689704352", 
          "https://app.dimensions.ai/details/publication/pub.1039349898"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-10T19:54", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8681_00000500.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "http://link.springer.com/10.1023/A:1013689704352"
      }
    ]
     

    Download the RDF metadata as:  json-ld nt turtle xml License info

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1023/a:1013689704352'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1023/a:1013689704352'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1023/a:1013689704352'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1023/a:1013689704352'


     

    This table displays all metadata directly associated to this object as RDF triples.

    98 TRIPLES      21 PREDICATES      32 URIs      19 LITERALS      7 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1023/a:1013689704352 schema:about anzsrc-for:14
    2 anzsrc-for:1402
    3 schema:author Nc1f7cf3e52ff47f4887851cbccf89b2f
    4 schema:citation sg:pub.10.1007/bf02055587
    5 sg:pub.10.1007/bf02191765
    6 https://doi.org/10.1006/aama.1996.0007
    7 https://doi.org/10.1016/0196-8858(85)90002-8
    8 https://doi.org/10.1016/b978-1-55860-377-6.50034-7
    9 schema:datePublished 2002-05
    10 schema:datePublishedReg 2002-05-01
    11 schema:description Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is the loss due to the fact that the globally optimal policy is not followed all the times. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first ones to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies which asymptotically achieve this regret have been devised by Lai and Robbins and many others. In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
    12 schema:genre research_article
    13 schema:inLanguage en
    14 schema:isAccessibleForFree true
    15 schema:isPartOf N299c76d632e647aeaa471278678ee5cb
    16 Ned8d4a1859e54777b384fa1432120b28
    17 sg:journal.1125588
    18 schema:name Finite-time Analysis of the Multiarmed Bandit Problem
    19 schema:pagination 235-256
    20 schema:productId N261e498ac2854226949b979fac3478af
    21 N81d579584f7142d38460a3999f83592e
    22 Ne9202f775a2840eab2faaf0336060534
    23 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039349898
    24 https://doi.org/10.1023/a:1013689704352
    25 schema:sdDatePublished 2019-04-10T19:54
    26 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    27 schema:sdPublisher N01a87fb2a5294fd39abd3170abde8f0a
    28 schema:url http://link.springer.com/10.1023/A:1013689704352
    29 sgo:license sg:explorer/license/
    30 sgo:sdDataset articles
    31 rdf:type schema:ScholarlyArticle
    32 N01a87fb2a5294fd39abd3170abde8f0a schema:name Springer Nature - SN SciGraph project
    33 rdf:type schema:Organization
    34 N261e498ac2854226949b979fac3478af schema:name readcube_id
    35 schema:value bb716a55cacf51d515f1540655dd41c7f3dc25d84200de109bd9d7974e133c53
    36 rdf:type schema:PropertyValue
    37 N299c76d632e647aeaa471278678ee5cb schema:volumeNumber 47
    38 rdf:type schema:PublicationVolume
    39 N3b813ed7f27948e1be10573d6c86fd78 rdf:first sg:person.015724422615.19
    40 rdf:rest Nfd5663bff1964e1890b43c5980dd9463
    41 N81d579584f7142d38460a3999f83592e schema:name dimensions_id
    42 schema:value pub.1039349898
    43 rdf:type schema:PropertyValue
    44 Nc1f7cf3e52ff47f4887851cbccf89b2f rdf:first sg:person.010211007377.90
    45 rdf:rest N3b813ed7f27948e1be10573d6c86fd78
    46 Ne9202f775a2840eab2faaf0336060534 schema:name doi
    47 schema:value 10.1023/a:1013689704352
    48 rdf:type schema:PropertyValue
    49 Ned8d4a1859e54777b384fa1432120b28 schema:issueNumber 2-3
    50 rdf:type schema:PublicationIssue
    51 Nfd5663bff1964e1890b43c5980dd9463 rdf:first sg:person.015777522071.34
    52 rdf:rest rdf:nil
    53 anzsrc-for:14 schema:inDefinedTermSet anzsrc-for:
    54 schema:name Economics
    55 rdf:type schema:DefinedTerm
    56 anzsrc-for:1402 schema:inDefinedTermSet anzsrc-for:
    57 schema:name Applied Economics
    58 rdf:type schema:DefinedTerm
    59 sg:journal.1125588 schema:issn 0885-6125
    60 1573-0565
    61 schema:name Machine Learning
    62 rdf:type schema:Periodical
    63 sg:person.010211007377.90 schema:affiliation https://www.grid.ac/institutes/grid.410413.3
    64 schema:familyName Auer
    65 schema:givenName Peter
    66 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010211007377.90
    67 rdf:type schema:Person
    68 sg:person.015724422615.19 schema:affiliation https://www.grid.ac/institutes/grid.4708.b
    69 schema:familyName Cesa-Bianchi
    70 schema:givenName Nicolò
    71 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015724422615.19
    72 rdf:type schema:Person
    73 sg:person.015777522071.34 schema:affiliation https://www.grid.ac/institutes/grid.5675.1
    74 schema:familyName Fischer
    75 schema:givenName Paul
    76 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015777522071.34
    77 rdf:type schema:Person
    78 sg:pub.10.1007/bf02055587 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039636640
    79 https://doi.org/10.1007/bf02055587
    80 rdf:type schema:CreativeWork
    81 sg:pub.10.1007/bf02191765 schema:sameAs https://app.dimensions.ai/details/publication/pub.1043110808
    82 https://doi.org/10.1007/bf02191765
    83 rdf:type schema:CreativeWork
    84 https://doi.org/10.1006/aama.1996.0007 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033676178
    85 rdf:type schema:CreativeWork
    86 https://doi.org/10.1016/0196-8858(85)90002-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033673111
    87 rdf:type schema:CreativeWork
    88 https://doi.org/10.1016/b978-1-55860-377-6.50034-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008169894
    89 rdf:type schema:CreativeWork
    90 https://www.grid.ac/institutes/grid.410413.3 schema:alternateName Graz University of Technology
    91 schema:name University of Technology Graz, A-8010, Graz, Austria
    92 rdf:type schema:Organization
    93 https://www.grid.ac/institutes/grid.4708.b schema:alternateName University of Milan
    94 schema:name DTI, University of Milan, via Bramante 65, I-26013, Crema, Italy
    95 rdf:type schema:Organization
    96 https://www.grid.ac/institutes/grid.5675.1 schema:alternateName TU Dortmund University
    97 schema:name Lehrstuhl Informatik II, Universität Dortmund, D-44221, Dortmund, Germany
    98 rdf:type schema:Organization
     




    Preview window. Press ESC to close (or click here)


    ...