Deviations of Stochastic Bandit Regret View Full Text


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2011

AUTHORS

Antoine Salomon , Jean-Yves Audibert

ABSTRACT

This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009) exhibit a policy such that with probability at least 1-1/n, the regret of the policy is of order logn. They have also shown that such a property is not shared by the popular ucb1 policy of Auer et al. (2002). This work first answers an open question: it extends this negative result to any anytime policy. The second contribution of this paper is to design anytime robust policies for specific multi-armed bandit problems in which some restrictions are put on the set of possible distributions of the different arms. More... »

PAGES

159-173

References to SciGraph publications

  • 2006. Bandit Based Monte-Carlo Planning in MACHINE LEARNING: ECML 2006
  • 2008. Bandit Problems in THE NEW PALGRAVE DICTIONARY OF ECONOMICS
  • 2002-05. Finite-time Analysis of the Multiarmed Bandit Problem in MACHINE LEARNING
  • Book

    TITLE

    Algorithmic Learning Theory

    ISBN

    978-3-642-24411-7
    978-3-642-24412-4

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/978-3-642-24412-4_15

    DOI

    http://dx.doi.org/10.1007/978-3-642-24412-4_15

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1050961054


    Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
    Incoming Citations Browse incoming citations for this publication using opencitations.net

    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1605", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Policy and Administration", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/16", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Studies in Human Society", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Laboratoire d'Informatique Gaspard-Monge", 
              "id": "https://www.grid.ac/institutes/grid.462940.d", 
              "name": [
                "Imagine, LIGM, \u00c9cole des Ponts ParisTech Universit\u00e9 Paris Est, France"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Salomon", 
            "givenName": "Antoine", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "French National Centre for Scientific Research", 
              "id": "https://www.grid.ac/institutes/grid.4444.0", 
              "name": [
                "Imagine, LIGM, \u00c9cole des Ponts ParisTech Universit\u00e9 Paris Est, France", 
                "Sierra, CNRS/ENS/INRIA, Paris, France"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Audibert", 
            "givenName": "Jean-Yves", 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1145/1566374.1566386", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1001774469"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/1374376.1374475", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1004451711"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1057/978-1-349-95121-5_2386-1", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1021542300", 
              "https://doi.org/10.1057/978-1-349-95121-5_2386-1"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/j.tcs.2009.01.016", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1032851363"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0196-8858(85)90002-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1033673111"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/11871842_29", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037233025", 
              "https://doi.org/10.1007/11871842_29"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/11871842_29", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037233025", 
              "https://doi.org/10.1007/11871842_29"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1090/s0002-9904-1952-09620-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037264252"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/1566374.1566388", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037453164"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1023/a:1013689704352", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1039349898", 
              "https://doi.org/10.1023/a:1013689704352"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1214/105051604000000350", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1064391735"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1214/aop/1176990746", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1064403995"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.2307/1427934", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1069490858"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://app.dimensions.ai/details/publication/pub.1111345088", 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2011", 
        "datePublishedReg": "2011-01-01", 
        "description": "This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009) exhibit a policy such that with probability at least 1-1/n, the regret of the policy is of order logn. They have also shown that such a property is not shared by the popular ucb1 policy of Auer et al. (2002). This work first answers an open question: it extends this negative result to any anytime policy. The second contribution of this paper is to design anytime robust policies for specific multi-armed bandit problems in which some restrictions are put on the set of possible distributions of the different arms.", 
        "editor": [
          {
            "familyName": "Kivinen", 
            "givenName": "Jyrki", 
            "type": "Person"
          }, 
          {
            "familyName": "Szepesv\u00e1ri", 
            "givenName": "Csaba", 
            "type": "Person"
          }, 
          {
            "familyName": "Ukkonen", 
            "givenName": "Esko", 
            "type": "Person"
          }, 
          {
            "familyName": "Zeugmann", 
            "givenName": "Thomas", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/978-3-642-24412-4_15", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": {
          "isbn": [
            "978-3-642-24411-7", 
            "978-3-642-24412-4"
          ], 
          "name": "Algorithmic Learning Theory", 
          "type": "Book"
        }, 
        "name": "Deviations of Stochastic Bandit Regret", 
        "pagination": "159-173", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1050961054"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/978-3-642-24412-4_15"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "f2c2af5f609f5f3f9575b5dd53a287601f4da4b8f2d51f35d330b7a96940f579"
            ]
          }
        ], 
        "publisher": {
          "location": "Berlin, Heidelberg", 
          "name": "Springer Berlin Heidelberg", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/978-3-642-24412-4_15", 
          "https://app.dimensions.ai/details/publication/pub.1050961054"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-16T09:45", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000375_0000000375/records_91453_00000001.jsonl", 
        "type": "Chapter", 
        "url": "https://link.springer.com/10.1007%2F978-3-642-24412-4_15"
      }
    ]
     

    Download the RDF metadata as:  json-ld nt turtle xml License info

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-24412-4_15'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-24412-4_15'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-24412-4_15'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-24412-4_15'


     

    This table displays all metadata directly associated to this object as RDF triples.

    130 TRIPLES      23 PREDICATES      40 URIs      20 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/978-3-642-24412-4_15 schema:about anzsrc-for:16
    2 anzsrc-for:1605
    3 schema:author N2508c12b498e4247ae9389ad7ba0cbf7
    4 schema:citation sg:pub.10.1007/11871842_29
    5 sg:pub.10.1023/a:1013689704352
    6 sg:pub.10.1057/978-1-349-95121-5_2386-1
    7 https://app.dimensions.ai/details/publication/pub.1111345088
    8 https://doi.org/10.1016/0196-8858(85)90002-8
    9 https://doi.org/10.1016/j.tcs.2009.01.016
    10 https://doi.org/10.1090/s0002-9904-1952-09620-8
    11 https://doi.org/10.1145/1374376.1374475
    12 https://doi.org/10.1145/1566374.1566386
    13 https://doi.org/10.1145/1566374.1566388
    14 https://doi.org/10.1214/105051604000000350
    15 https://doi.org/10.1214/aop/1176990746
    16 https://doi.org/10.2307/1427934
    17 schema:datePublished 2011
    18 schema:datePublishedReg 2011-01-01
    19 schema:description This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009) exhibit a policy such that with probability at least 1-1/n, the regret of the policy is of order logn. They have also shown that such a property is not shared by the popular ucb1 policy of Auer et al. (2002). This work first answers an open question: it extends this negative result to any anytime policy. The second contribution of this paper is to design anytime robust policies for specific multi-armed bandit problems in which some restrictions are put on the set of possible distributions of the different arms.
    20 schema:editor N570396e9da504b7e804bbaa0184b432d
    21 schema:genre chapter
    22 schema:inLanguage en
    23 schema:isAccessibleForFree true
    24 schema:isPartOf Nca5aa19ea8b545448abfc1a1d45e0698
    25 schema:name Deviations of Stochastic Bandit Regret
    26 schema:pagination 159-173
    27 schema:productId N35ae972b4bab4056be97fa5bb6bde799
    28 N819419707b5b43148338ef6af97c3b39
    29 N8ca959f7604e41f28363aee3fb2b7c1a
    30 schema:publisher N07d831b3275d4ab6a4ae0b5b60c5694f
    31 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050961054
    32 https://doi.org/10.1007/978-3-642-24412-4_15
    33 schema:sdDatePublished 2019-04-16T09:45
    34 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    35 schema:sdPublisher N70db5de6133648f38c3715c60b189d2b
    36 schema:url https://link.springer.com/10.1007%2F978-3-642-24412-4_15
    37 sgo:license sg:explorer/license/
    38 sgo:sdDataset chapters
    39 rdf:type schema:Chapter
    40 N057649ae4c1e4671aac2d3992fa9ce89 schema:familyName Zeugmann
    41 schema:givenName Thomas
    42 rdf:type schema:Person
    43 N07d831b3275d4ab6a4ae0b5b60c5694f schema:location Berlin, Heidelberg
    44 schema:name Springer Berlin Heidelberg
    45 rdf:type schema:Organisation
    46 N1a0735eac7d240c98f922d52c5b24253 schema:familyName Ukkonen
    47 schema:givenName Esko
    48 rdf:type schema:Person
    49 N2508c12b498e4247ae9389ad7ba0cbf7 rdf:first N923b0bd268b64587bdd6f6f9fdcc35bd
    50 rdf:rest N6c771d5861e64601abfe3b0c1ca3baad
    51 N35ae972b4bab4056be97fa5bb6bde799 schema:name dimensions_id
    52 schema:value pub.1050961054
    53 rdf:type schema:PropertyValue
    54 N3acf6a076b06444f97966f24b4f4e651 schema:affiliation https://www.grid.ac/institutes/grid.4444.0
    55 schema:familyName Audibert
    56 schema:givenName Jean-Yves
    57 rdf:type schema:Person
    58 N570396e9da504b7e804bbaa0184b432d rdf:first Ndac6ff44731d445e85faf764f788a9b6
    59 rdf:rest N7aa4b0ed9a0443b98c4efac70084aa3f
    60 N6c771d5861e64601abfe3b0c1ca3baad rdf:first N3acf6a076b06444f97966f24b4f4e651
    61 rdf:rest rdf:nil
    62 N70db5de6133648f38c3715c60b189d2b schema:name Springer Nature - SN SciGraph project
    63 rdf:type schema:Organization
    64 N7aa4b0ed9a0443b98c4efac70084aa3f rdf:first N989b9d43d45e422f92226fb90e71f97e
    65 rdf:rest Ndfdde46554074bba8dc903d136eeeb27
    66 N819419707b5b43148338ef6af97c3b39 schema:name doi
    67 schema:value 10.1007/978-3-642-24412-4_15
    68 rdf:type schema:PropertyValue
    69 N8ca959f7604e41f28363aee3fb2b7c1a schema:name readcube_id
    70 schema:value f2c2af5f609f5f3f9575b5dd53a287601f4da4b8f2d51f35d330b7a96940f579
    71 rdf:type schema:PropertyValue
    72 N923b0bd268b64587bdd6f6f9fdcc35bd schema:affiliation https://www.grid.ac/institutes/grid.462940.d
    73 schema:familyName Salomon
    74 schema:givenName Antoine
    75 rdf:type schema:Person
    76 N989b9d43d45e422f92226fb90e71f97e schema:familyName Szepesvári
    77 schema:givenName Csaba
    78 rdf:type schema:Person
    79 Nca5aa19ea8b545448abfc1a1d45e0698 schema:isbn 978-3-642-24411-7
    80 978-3-642-24412-4
    81 schema:name Algorithmic Learning Theory
    82 rdf:type schema:Book
    83 Ndac6ff44731d445e85faf764f788a9b6 schema:familyName Kivinen
    84 schema:givenName Jyrki
    85 rdf:type schema:Person
    86 Ndfdde46554074bba8dc903d136eeeb27 rdf:first N1a0735eac7d240c98f922d52c5b24253
    87 rdf:rest Nf0d13b5f27cb4b8b82d0a4cdf94273b2
    88 Nf0d13b5f27cb4b8b82d0a4cdf94273b2 rdf:first N057649ae4c1e4671aac2d3992fa9ce89
    89 rdf:rest rdf:nil
    90 anzsrc-for:16 schema:inDefinedTermSet anzsrc-for:
    91 schema:name Studies in Human Society
    92 rdf:type schema:DefinedTerm
    93 anzsrc-for:1605 schema:inDefinedTermSet anzsrc-for:
    94 schema:name Policy and Administration
    95 rdf:type schema:DefinedTerm
    96 sg:pub.10.1007/11871842_29 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037233025
    97 https://doi.org/10.1007/11871842_29
    98 rdf:type schema:CreativeWork
    99 sg:pub.10.1023/a:1013689704352 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039349898
    100 https://doi.org/10.1023/a:1013689704352
    101 rdf:type schema:CreativeWork
    102 sg:pub.10.1057/978-1-349-95121-5_2386-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021542300
    103 https://doi.org/10.1057/978-1-349-95121-5_2386-1
    104 rdf:type schema:CreativeWork
    105 https://app.dimensions.ai/details/publication/pub.1111345088 schema:CreativeWork
    106 https://doi.org/10.1016/0196-8858(85)90002-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033673111
    107 rdf:type schema:CreativeWork
    108 https://doi.org/10.1016/j.tcs.2009.01.016 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032851363
    109 rdf:type schema:CreativeWork
    110 https://doi.org/10.1090/s0002-9904-1952-09620-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037264252
    111 rdf:type schema:CreativeWork
    112 https://doi.org/10.1145/1374376.1374475 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004451711
    113 rdf:type schema:CreativeWork
    114 https://doi.org/10.1145/1566374.1566386 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001774469
    115 rdf:type schema:CreativeWork
    116 https://doi.org/10.1145/1566374.1566388 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037453164
    117 rdf:type schema:CreativeWork
    118 https://doi.org/10.1214/105051604000000350 schema:sameAs https://app.dimensions.ai/details/publication/pub.1064391735
    119 rdf:type schema:CreativeWork
    120 https://doi.org/10.1214/aop/1176990746 schema:sameAs https://app.dimensions.ai/details/publication/pub.1064403995
    121 rdf:type schema:CreativeWork
    122 https://doi.org/10.2307/1427934 schema:sameAs https://app.dimensions.ai/details/publication/pub.1069490858
    123 rdf:type schema:CreativeWork
    124 https://www.grid.ac/institutes/grid.4444.0 schema:alternateName French National Centre for Scientific Research
    125 schema:name Imagine, LIGM, École des Ponts ParisTech Université Paris Est, France
    126 Sierra, CNRS/ENS/INRIA, Paris, France
    127 rdf:type schema:Organization
    128 https://www.grid.ac/institutes/grid.462940.d schema:alternateName Laboratoire d'Informatique Gaspard-Monge
    129 schema:name Imagine, LIGM, École des Ponts ParisTech Université Paris Est, France
    130 rdf:type schema:Organization
     




    Preview window. Press ESC to close (or click here)


    ...