Tuning Bandit Algorithms in Stochastic Environments View Full Text


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2007

AUTHORS

Jean-Yves Audibert , Rémi Munos , Csaba Szepesvári

ABSTRACT

Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. In this paper we consider a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. The purpose of this paper is to provide a theoretical explanation of these findings and provide theoretical guidelines for the tuning of the parameters of these algorithms. For this we analyze the expected regret and for the first time the concentration of the regret. The analysis of the expected regret shows that variance estimates can be especially advantageous when the payoffs of suboptimal arms have low variance. The risk analysis, rather unexpectedly, reveals that except for some very special bandit problems, the regret, for upper confidence bounds based algorithms with standard bias sequences, concentrates only at a polynomial rate. Hence, although these algorithms achieve logarithmic expected regret rates, they seem less attractive when the risk of suffering much worse than logarithmic regret is also taken into account. More... »

PAGES

150-165

References to SciGraph publications

Book

TITLE

Algorithmic Learning Theory

ISBN

978-3-540-75224-0
978-3-540-75225-7

Author Affiliations

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-540-75225-7_15

DOI

http://dx.doi.org/10.1007/978-3-540-75225-7_15

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1029155399


Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
Incoming Citations Browse incoming citations for this publication using opencitations.net

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0802", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Computation Theory and Mathematics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "name": [
            "CERTIS - Ecole des Ponts, 19, rue Alfred Nobel - Cit\u00e9 Descartes, 77455 Marne-la-Vall\u00e9e, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Audibert", 
        "givenName": "Jean-Yves", 
        "id": "sg:person.013551162107.15", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013551162107.15"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "INRIA Futurs Lille, SequeL project, 40 avenue Halley, 59650 Villeneuve d\u2019Ascq, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Munos", 
        "givenName": "R\u00e9mi", 
        "id": "sg:person.015325733353.19", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015325733353.19"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Alberta", 
          "id": "https://www.grid.ac/institutes/grid.17089.37", 
          "name": [
            "University of Alberta, Edmonton T6G 2E8, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Szepesv\u00e1ri", 
        "givenName": "Csaba", 
        "id": "sg:person.016202177221.23", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016202177221.23"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1016/0196-8858(85)90002-8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033673111"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1090/s0002-9904-1952-09620-8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037264252"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/a:1013689704352", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039349898", 
          "https://doi.org/10.1023/a:1013689704352"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/biomet/25.3-4.285", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1059415697"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/9.400491", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061244526"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.2307/1427934", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1069490858"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2007", 
    "datePublishedReg": "2007-01-01", 
    "description": "Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. In this paper we consider a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. The purpose of this paper is to provide a theoretical explanation of these findings and provide theoretical guidelines for the tuning of the parameters of these algorithms. For this we analyze the expected regret and for the first time the concentration of the regret. The analysis of the expected regret shows that variance estimates can be especially advantageous when the payoffs of suboptimal arms have low variance. The risk analysis, rather unexpectedly, reveals that except for some very special bandit problems, the regret, for upper confidence bounds based algorithms with standard bias sequences, concentrates only at a polynomial rate. Hence, although these algorithms achieve logarithmic expected regret rates, they seem less attractive when the risk of suffering much worse than logarithmic regret is also taken into account.", 
    "editor": [
      {
        "familyName": "Hutter", 
        "givenName": "Marcus", 
        "type": "Person"
      }, 
      {
        "familyName": "Servedio", 
        "givenName": "Rocco A.", 
        "type": "Person"
      }, 
      {
        "familyName": "Takimoto", 
        "givenName": "Eiji", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-540-75225-7_15", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-540-75224-0", 
        "978-3-540-75225-7"
      ], 
      "name": "Algorithmic Learning Theory", 
      "type": "Book"
    }, 
    "name": "Tuning Bandit Algorithms in Stochastic Environments", 
    "pagination": "150-165", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-540-75225-7_15"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "8ae44f6d7778dcea1a5d17695ead4b75d7e898862aebc61930c814bb0c259647"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1029155399"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-540-75225-7_15", 
      "https://app.dimensions.ai/details/publication/pub.1029155399"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T05:30", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000346_0000000346/records_99806_00000002.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F978-3-540-75225-7_15"
  }
]
 

Download the RDF metadata as:  json-ld nt turtle xml License info

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-75225-7_15'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-75225-7_15'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-75225-7_15'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-75225-7_15'


 

This table displays all metadata directly associated to this object as RDF triples.

112 TRIPLES      23 PREDICATES      33 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-540-75225-7_15 schema:about anzsrc-for:08
2 anzsrc-for:0802
3 schema:author N960ebdf751c24b0689a743359732b648
4 schema:citation sg:pub.10.1023/a:1013689704352
5 https://doi.org/10.1016/0196-8858(85)90002-8
6 https://doi.org/10.1090/s0002-9904-1952-09620-8
7 https://doi.org/10.1093/biomet/25.3-4.285
8 https://doi.org/10.1109/9.400491
9 https://doi.org/10.2307/1427934
10 schema:datePublished 2007
11 schema:datePublishedReg 2007-01-01
12 schema:description Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. In this paper we consider a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. The purpose of this paper is to provide a theoretical explanation of these findings and provide theoretical guidelines for the tuning of the parameters of these algorithms. For this we analyze the expected regret and for the first time the concentration of the regret. The analysis of the expected regret shows that variance estimates can be especially advantageous when the payoffs of suboptimal arms have low variance. The risk analysis, rather unexpectedly, reveals that except for some very special bandit problems, the regret, for upper confidence bounds based algorithms with standard bias sequences, concentrates only at a polynomial rate. Hence, although these algorithms achieve logarithmic expected regret rates, they seem less attractive when the risk of suffering much worse than logarithmic regret is also taken into account.
13 schema:editor Nadabda8f4c564b1bb72aa723a2b3aec8
14 schema:genre chapter
15 schema:inLanguage en
16 schema:isAccessibleForFree true
17 schema:isPartOf N71848f32548448ee87020a2d1a41b3b6
18 schema:name Tuning Bandit Algorithms in Stochastic Environments
19 schema:pagination 150-165
20 schema:productId N31e50cdb55ef46d4a5ad2175c76c6e34
21 N8ca23ab2b59a42f381191b9a7f3ae7b4
22 Nb96afc88182d46d7bad76f32548f6bb6
23 schema:publisher N0a6e8366ba354400bfc18c27b5a2b510
24 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029155399
25 https://doi.org/10.1007/978-3-540-75225-7_15
26 schema:sdDatePublished 2019-04-16T05:30
27 schema:sdLicense https://scigraph.springernature.com/explorer/license/
28 schema:sdPublisher N2bc1b18e4e5e4d71abf0b6a8bb809f07
29 schema:url https://link.springer.com/10.1007%2F978-3-540-75225-7_15
30 sgo:license sg:explorer/license/
31 sgo:sdDataset chapters
32 rdf:type schema:Chapter
33 N0a6e8366ba354400bfc18c27b5a2b510 schema:location Berlin, Heidelberg
34 schema:name Springer Berlin Heidelberg
35 rdf:type schema:Organisation
36 N12fd76a24b464650aa8e3c5457fb61c8 rdf:first Na6f5b8765df34fb1a283fc4a10fdae4e
37 rdf:rest rdf:nil
38 N2bc1b18e4e5e4d71abf0b6a8bb809f07 schema:name Springer Nature - SN SciGraph project
39 rdf:type schema:Organization
40 N31e50cdb55ef46d4a5ad2175c76c6e34 schema:name readcube_id
41 schema:value 8ae44f6d7778dcea1a5d17695ead4b75d7e898862aebc61930c814bb0c259647
42 rdf:type schema:PropertyValue
43 N346755388aac4010994ea34059405b32 rdf:first Nfb21bf2593674cd29efe9f1a036f8521
44 rdf:rest N12fd76a24b464650aa8e3c5457fb61c8
45 N71848f32548448ee87020a2d1a41b3b6 schema:isbn 978-3-540-75224-0
46 978-3-540-75225-7
47 schema:name Algorithmic Learning Theory
48 rdf:type schema:Book
49 N8ca23ab2b59a42f381191b9a7f3ae7b4 schema:name doi
50 schema:value 10.1007/978-3-540-75225-7_15
51 rdf:type schema:PropertyValue
52 N960ebdf751c24b0689a743359732b648 rdf:first sg:person.013551162107.15
53 rdf:rest Ne0adcd7073d949bab4d096271b60f21f
54 Na6f5b8765df34fb1a283fc4a10fdae4e schema:familyName Takimoto
55 schema:givenName Eiji
56 rdf:type schema:Person
57 Nadabda8f4c564b1bb72aa723a2b3aec8 rdf:first Nf6e75378f63f412aade641ba1ca1228f
58 rdf:rest N346755388aac4010994ea34059405b32
59 Nb96afc88182d46d7bad76f32548f6bb6 schema:name dimensions_id
60 schema:value pub.1029155399
61 rdf:type schema:PropertyValue
62 Nbad8d3c3ec38406d86c01c0089c8826e rdf:first sg:person.016202177221.23
63 rdf:rest rdf:nil
64 Ne0adcd7073d949bab4d096271b60f21f rdf:first sg:person.015325733353.19
65 rdf:rest Nbad8d3c3ec38406d86c01c0089c8826e
66 Nee5222ab50cb432191444029a560a7ed schema:name INRIA Futurs Lille, SequeL project, 40 avenue Halley, 59650 Villeneuve d’Ascq, France
67 rdf:type schema:Organization
68 Nee8fc103dd8949f3b0b2713d86db7f4d schema:name CERTIS - Ecole des Ponts, 19, rue Alfred Nobel - Cité Descartes, 77455 Marne-la-Vallée, France
69 rdf:type schema:Organization
70 Nf6e75378f63f412aade641ba1ca1228f schema:familyName Hutter
71 schema:givenName Marcus
72 rdf:type schema:Person
73 Nfb21bf2593674cd29efe9f1a036f8521 schema:familyName Servedio
74 schema:givenName Rocco A.
75 rdf:type schema:Person
76 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
77 schema:name Information and Computing Sciences
78 rdf:type schema:DefinedTerm
79 anzsrc-for:0802 schema:inDefinedTermSet anzsrc-for:
80 schema:name Computation Theory and Mathematics
81 rdf:type schema:DefinedTerm
82 sg:person.013551162107.15 schema:affiliation Nee8fc103dd8949f3b0b2713d86db7f4d
83 schema:familyName Audibert
84 schema:givenName Jean-Yves
85 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013551162107.15
86 rdf:type schema:Person
87 sg:person.015325733353.19 schema:affiliation Nee5222ab50cb432191444029a560a7ed
88 schema:familyName Munos
89 schema:givenName Rémi
90 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015325733353.19
91 rdf:type schema:Person
92 sg:person.016202177221.23 schema:affiliation https://www.grid.ac/institutes/grid.17089.37
93 schema:familyName Szepesvári
94 schema:givenName Csaba
95 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016202177221.23
96 rdf:type schema:Person
97 sg:pub.10.1023/a:1013689704352 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039349898
98 https://doi.org/10.1023/a:1013689704352
99 rdf:type schema:CreativeWork
100 https://doi.org/10.1016/0196-8858(85)90002-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033673111
101 rdf:type schema:CreativeWork
102 https://doi.org/10.1090/s0002-9904-1952-09620-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037264252
103 rdf:type schema:CreativeWork
104 https://doi.org/10.1093/biomet/25.3-4.285 schema:sameAs https://app.dimensions.ai/details/publication/pub.1059415697
105 rdf:type schema:CreativeWork
106 https://doi.org/10.1109/9.400491 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061244526
107 rdf:type schema:CreativeWork
108 https://doi.org/10.2307/1427934 schema:sameAs https://app.dimensions.ai/details/publication/pub.1069490858
109 rdf:type schema:CreativeWork
110 https://www.grid.ac/institutes/grid.17089.37 schema:alternateName University of Alberta
111 schema:name University of Alberta, Edmonton T6G 2E8, Canada
112 rdf:type schema:Organization
 




Preview window. Press ESC to close (or click here)


...