Finite-time Analysis of the Multiarmed Bandit Problem


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2002-05

AUTHORS

Peter Auer, Nicolò Cesa-Bianchi, Paul Fischer

ABSTRACT

Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions and taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is, the loss due to the fact that the globally optimal policy is not followed at all times. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies which asymptotically achieve this regret have been devised by Lai and Robbins and many others. In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
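The simplest of the paper's policies, UCB1, makes the abstract's claim concrete: play each arm once, then always pull the arm maximizing its empirical mean plus the exploration bonus sqrt(2 ln n / n_j), where n is the total number of plays so far and n_j the plays of arm j. A minimal simulation sketch (the Bernoulli arms, horizon, and seed below are illustrative choices, not values from the paper):

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run the UCB1 index policy on a simulated Bernoulli bandit."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # n_j: times each arm was played
    sums = [0.0] * k    # cumulative reward of each arm
    for t in range(1, horizon + 1):
        if t <= k:
            j = t - 1   # initialization: play each arm once
        else:
            # index = empirical mean + sqrt(2 ln n / n_j)
            j = max(range(k), key=lambda a: sums[a] / counts[a]
                    + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[j] else 0.0
        counts[j] += 1
        sums[j] += reward
    return counts

counts = ucb1([0.9, 0.5, 0.2], horizon=2000)
```

Because the bonus shrinks like sqrt(ln n / n_j), suboptimal arms are sampled only O(log n) times, which is where the uniform logarithmic regret bound comes from.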

PAGES

235-256

Identifiers

URI

http://scigraph.springernature.com/pub.10.1023/a:1013689704352

DOI

http://dx.doi.org/10.1023/a:1013689704352

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1039349898


Indexing Status: check whether this publication has been indexed by Scopus and Web of Science using the SN Indexing Status Tool.
Incoming Citations: browse incoming citations for this publication using opencitations.net.

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record with an external JSON-LD service such as the JSON-LD Playground or Google's Structured Data Testing Tool.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1402", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Applied Economics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/14", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Economics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Graz University of Technology", 
          "id": "https://www.grid.ac/institutes/grid.410413.3", 
          "name": [
            "University of Technology Graz, A-8010, Graz, Austria"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Auer", 
        "givenName": "Peter", 
        "id": "sg:person.010211007377.90", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010211007377.90"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Milan", 
          "id": "https://www.grid.ac/institutes/grid.4708.b", 
          "name": [
            "DTI, University of Milan, via Bramante 65, I-26013, Crema, Italy"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Cesa-Bianchi", 
        "givenName": "Nicol\u00f2", 
        "id": "sg:person.015724422615.19", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015724422615.19"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "TU Dortmund University", 
          "id": "https://www.grid.ac/institutes/grid.5675.1", 
          "name": [
            "Lehrstuhl Informatik II, Universit\u00e4t Dortmund, D-44221, Dortmund, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Fischer", 
        "givenName": "Paul", 
        "id": "sg:person.015777522071.34", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015777522071.34"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1016/b978-1-55860-377-6.50034-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008169894"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0196-8858(85)90002-8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033673111"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1006/aama.1996.0007", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033676178"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf02055587", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039636640", 
          "https://doi.org/10.1007/bf02055587"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf02191765", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1043110808", 
          "https://doi.org/10.1007/bf02191765"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2002-05", 
    "datePublishedReg": "2002-05-01", 
    "description": "Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is the loss due to the fact that the globally optimal policy is not followed all the times. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first ones to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies which asymptotically achieve this regret have been devised by Lai and Robbins and many others. In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1023/a:1013689704352", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1125588", 
        "issn": [
          "0885-6125", 
          "1573-0565"
        ], 
        "name": "Machine Learning", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "2-3", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "47"
      }
    ], 
    "name": "Finite-time Analysis of the Multiarmed Bandit Problem", 
    "pagination": "235-256", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "bb716a55cacf51d515f1540655dd41c7f3dc25d84200de109bd9d7974e133c53"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1023/a:1013689704352"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1039349898"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1023/a:1013689704352", 
      "https://app.dimensions.ai/details/publication/pub.1039349898"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-10T19:54", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8681_00000500.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1023/A:1013689704352"
  }
]
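Since the record is plain JSON, the usual bibliographic fields can be pulled out with a few lines of standard-library code. The snippet below inlines a trimmed copy of the record above for illustration; against the live service you would load the full response the same way:

```python
import json

# Trimmed copy of the SciGraph JSON-LD record, inlined for illustration.
record_jsonld = """
[
  {
    "name": "Finite-time Analysis of the Multiarmed Bandit Problem",
    "datePublished": "2002-05",
    "author": [
      {"familyName": "Auer", "givenName": "Peter"},
      {"familyName": "Cesa-Bianchi", "givenName": "Nicol\\u00f2"},
      {"familyName": "Fischer", "givenName": "Paul"}
    ],
    "isPartOf": [{"name": "Machine Learning", "type": "Periodical"}]
  }
]
"""

record = json.loads(record_jsonld)[0]
title = record["name"]
authors = [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]]
```

Note that `json.loads` decodes the `\u00f2` escape back to the accented character, so no special handling of the author names is needed.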
 

Download the RDF metadata as JSON-LD, N-Triples, Turtle, or RDF/XML (see license info).

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1023/a:1013689704352'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1023/a:1013689704352'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1023/a:1013689704352'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1023/a:1013689704352'
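The same content negotiation works from Python's standard library; the serialization is selected by the `Accept` header before any bytes are transferred. A minimal sketch (the helper name is ours, not part of any SciGraph client):

```python
import urllib.request

SCIGRAPH_URL = "https://scigraph.springernature.com/pub.10.1023/a:1013689704352"

def fetch_record(url: str, mime: str = "application/ld+json") -> bytes:
    """Fetch a SciGraph record, choosing the RDF serialization via Accept."""
    req = urllib.request.Request(url, headers={"Accept": mime})
    with urllib.request.urlopen(req) as resp:  # performs the network call
        return resp.read()

# The negotiated content type is attached to the request itself:
req = urllib.request.Request(SCIGRAPH_URL, headers={"Accept": "text/turtle"})
```

Calling `fetch_record(SCIGRAPH_URL, "text/turtle")` would mirror the third curl command above.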


 

This table displays all metadata directly associated with this object as RDF triples.

98 TRIPLES      21 PREDICATES      32 URIs      19 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1023/a:1013689704352 schema:about anzsrc-for:14
2 anzsrc-for:1402
3 schema:author N540be14c1a12417b9315dd3e00585e8e
4 schema:citation sg:pub.10.1007/bf02055587
5 sg:pub.10.1007/bf02191765
6 https://doi.org/10.1006/aama.1996.0007
7 https://doi.org/10.1016/0196-8858(85)90002-8
8 https://doi.org/10.1016/b978-1-55860-377-6.50034-7
9 schema:datePublished 2002-05
10 schema:datePublishedReg 2002-05-01
11 schema:description Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is the loss due to the fact that the globally optimal policy is not followed all the times. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first ones to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies which asymptotically achieve this regret have been devised by Lai and Robbins and many others. In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
12 schema:genre research_article
13 schema:inLanguage en
14 schema:isAccessibleForFree true
15 schema:isPartOf N083ac16781954e20bcfc1f32285813f1
16 Naad4228df9764fc59c83d6f06ec92c0d
17 sg:journal.1125588
18 schema:name Finite-time Analysis of the Multiarmed Bandit Problem
19 schema:pagination 235-256
20 schema:productId N3029f54157794f8a94d591505f7c0528
21 N46a2034a9795462e90c3983f20977a64
22 N607ec9848f1447d6a6fd20202e1b3906
23 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039349898
24 https://doi.org/10.1023/a:1013689704352
25 schema:sdDatePublished 2019-04-10T19:54
26 schema:sdLicense https://scigraph.springernature.com/explorer/license/
27 schema:sdPublisher N10df954d10614f82b27a648bf5599a4e
28 schema:url http://link.springer.com/10.1023/A:1013689704352
29 sgo:license sg:explorer/license/
30 sgo:sdDataset articles
31 rdf:type schema:ScholarlyArticle
32 N083ac16781954e20bcfc1f32285813f1 schema:issueNumber 2-3
33 rdf:type schema:PublicationIssue
34 N10df954d10614f82b27a648bf5599a4e schema:name Springer Nature - SN SciGraph project
35 rdf:type schema:Organization
36 N3029f54157794f8a94d591505f7c0528 schema:name readcube_id
37 schema:value bb716a55cacf51d515f1540655dd41c7f3dc25d84200de109bd9d7974e133c53
38 rdf:type schema:PropertyValue
39 N46a2034a9795462e90c3983f20977a64 schema:name doi
40 schema:value 10.1023/a:1013689704352
41 rdf:type schema:PropertyValue
42 N540be14c1a12417b9315dd3e00585e8e rdf:first sg:person.010211007377.90
43 rdf:rest N6a85bdace1f441ca902758804cdfef45
44 N607ec9848f1447d6a6fd20202e1b3906 schema:name dimensions_id
45 schema:value pub.1039349898
46 rdf:type schema:PropertyValue
47 N6a85bdace1f441ca902758804cdfef45 rdf:first sg:person.015724422615.19
48 rdf:rest N9bbc16f2b66a407fa6ea4b91c0735d29
49 N9bbc16f2b66a407fa6ea4b91c0735d29 rdf:first sg:person.015777522071.34
50 rdf:rest rdf:nil
51 Naad4228df9764fc59c83d6f06ec92c0d schema:volumeNumber 47
52 rdf:type schema:PublicationVolume
53 anzsrc-for:14 schema:inDefinedTermSet anzsrc-for:
54 schema:name Economics
55 rdf:type schema:DefinedTerm
56 anzsrc-for:1402 schema:inDefinedTermSet anzsrc-for:
57 schema:name Applied Economics
58 rdf:type schema:DefinedTerm
59 sg:journal.1125588 schema:issn 0885-6125
60 1573-0565
61 schema:name Machine Learning
62 rdf:type schema:Periodical
63 sg:person.010211007377.90 schema:affiliation https://www.grid.ac/institutes/grid.410413.3
64 schema:familyName Auer
65 schema:givenName Peter
66 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010211007377.90
67 rdf:type schema:Person
68 sg:person.015724422615.19 schema:affiliation https://www.grid.ac/institutes/grid.4708.b
69 schema:familyName Cesa-Bianchi
70 schema:givenName Nicolò
71 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015724422615.19
72 rdf:type schema:Person
73 sg:person.015777522071.34 schema:affiliation https://www.grid.ac/institutes/grid.5675.1
74 schema:familyName Fischer
75 schema:givenName Paul
76 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015777522071.34
77 rdf:type schema:Person
78 sg:pub.10.1007/bf02055587 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039636640
79 https://doi.org/10.1007/bf02055587
80 rdf:type schema:CreativeWork
81 sg:pub.10.1007/bf02191765 schema:sameAs https://app.dimensions.ai/details/publication/pub.1043110808
82 https://doi.org/10.1007/bf02191765
83 rdf:type schema:CreativeWork
84 https://doi.org/10.1006/aama.1996.0007 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033676178
85 rdf:type schema:CreativeWork
86 https://doi.org/10.1016/0196-8858(85)90002-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033673111
87 rdf:type schema:CreativeWork
88 https://doi.org/10.1016/b978-1-55860-377-6.50034-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008169894
89 rdf:type schema:CreativeWork
90 https://www.grid.ac/institutes/grid.410413.3 schema:alternateName Graz University of Technology
91 schema:name University of Technology Graz, A-8010, Graz, Austria
92 rdf:type schema:Organization
93 https://www.grid.ac/institutes/grid.4708.b schema:alternateName University of Milan
94 schema:name DTI, University of Milan, via Bramante 65, I-26013, Crema, Italy
95 rdf:type schema:Organization
96 https://www.grid.ac/institutes/grid.5675.1 schema:alternateName TU Dortmund University
97 schema:name Lehrstuhl Informatik II, Universität Dortmund, D-44221, Dortmund, Germany
98 rdf:type schema:Organization
 



