Wallenius Bayes


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2018-06

AUTHORS

Enric Junqué de Fortuny, David Martens, Foster Provost

ABSTRACT

This paper introduces a new event model appropriate for classifying (binary) data generated by a “destructive choice” process, such as certain human behavior. In such a process, making a choice removes that choice from future consideration yet does not influence the relative probability of other choices in the choice set. The proposed Wallenius event model is based on a somewhat forgotten non-central hypergeometric distribution introduced by Wallenius (Biased sampling: the non-central hypergeometric probability distribution. Ph.D. thesis, Stanford University, 1963). We discuss its relationship with models of how human choice behavior is generated, highlighting a key (simple) mathematical property. We use this background to describe specifically why traditional multivariate Bernoulli naive Bayes and multinomial naive Bayes each are suboptimal for such data. We then present an implementation of naive Bayes based on the Wallenius event model, and show experimentally that for data where we would expect the features to be generated via destructive choice behavior Wallenius Bayes indeed outperforms the traditional versions of naive Bayes for prediction based on these features. Furthermore, we also show that it is competitive with non-naive methods (in particular, support-vector machines). In contrast, we also show that Wallenius Bayes underperforms when the data generating process is not based on destructive choice.

PAGES

1013-1037

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s10994-018-5699-z

DOI

http://dx.doi.org/10.1007/s10994-018-5699-z

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1101176965



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0104", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Statistics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Mathematical Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "New York University Shanghai", 
          "id": "https://www.grid.ac/institutes/grid.449457.f", 
          "name": [
            "NYU Shanghai, 1555 Century Ave, Shanghai, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Junqu\u00e9 de Fortuny", 
        "givenName": "Enric", 
        "id": "sg:person.012600337251.48", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012600337251.48"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Antwerp", 
          "id": "https://www.grid.ac/institutes/grid.5284.b", 
          "name": [
            "Faculty of Applied Economics, University of Antwerp, Prinsstraat 13, 2000, Antwerp, Belgium"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Martens", 
        "givenName": "David", 
        "id": "sg:person.07411142156.50", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07411142156.50"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "New York University", 
          "id": "https://www.grid.ac/institutes/grid.137628.9", 
          "name": [
            "Information, Operations and Management Sciences, Stern School of Business, New York University, New York City, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Provost", 
        "givenName": "Foster", 
        "id": "sg:person.07501646413.35", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07501646413.35"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/sj.npp.1300030", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1003219693", 
          "https://doi.org/10.1038/sj.npp.1300030"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/b:mach.0000039778.69032.ab", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1006773373", 
          "https://doi.org/10.1023/b:mach.0000039778.69032.ab"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1037/h0070288", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009587172"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.patrec.2005.10.010", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013701558"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1080/03610910701790269", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1029530422"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/a:1007413511361", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030336415", 
          "https://doi.org/10.1023/a:1007413511361"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.2307/4450505", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037537667"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1060745.1060754", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1038299769"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0079-7421(08)60454-5", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1040095765"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1089/big.2013.0037", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1043295004"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.1218772110", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048617383"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1901/jeab.1961.4-267", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1049631395"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.2307/3212535", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1070226747"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.21236/ad0426243", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1091993431"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2018-06", 
    "datePublishedReg": "2018-06-01", 
    "description": "This paper introduces a new event model appropriate for classifying (binary) data generated by a \u201cdestructive choice\u201d process, such as certain human behavior. In such a process, making a choice removes that choice from future consideration yet does not influence the relative probability of other choices in the choice set. The proposed Wallenius event model is based on a somewhat forgotten non-central hypergeometric distribution introduced by Wallenius (Biased sampling: the non-central hypergeometric probability distribution. Ph.D. thesis, Stanford University, 1963). We discuss its relationship with models of how human choice behavior is generated, highlighting a key (simple) mathematical property. We use this background to describe specifically why traditional multivariate Bernoulli naive Bayes and multinomial naive Bayes each are suboptimal for such data. We then present an implementation of naive Bayes based on the Wallenius event model, and show experimentally that for data where we would expect the features to be generated via destructive choice behavior Wallenius Bayes indeed outperforms the traditional versions of naive Bayes for prediction based on these features. Furthermore, we also show that it is competitive with non-naive methods (in particular, support-vector machines). In contrast, we also show that Wallenius Bayes underperforms when the data generating process is not based on destructive choice.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1007/s10994-018-5699-z", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1125588", 
        "issn": [
          "0885-6125", 
          "1573-0565"
        ], 
        "name": "Machine Learning", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "6", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "107"
      }
    ], 
    "name": "Wallenius Bayes", 
    "pagination": "1013-1037", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "68a74c3ebd996648421bf1546c32820fea4143e6e715558e99a3dcac070d034e"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s10994-018-5699-z"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1101176965"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s10994-018-5699-z", 
      "https://app.dimensions.ai/details/publication/pub.1101176965"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T09:42", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000346_0000000346/records_99843_00000004.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://link.springer.com/10.1007%2Fs10994-018-5699-z"
  }
]
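A minimal Python sketch of reading a record shaped like the one above: it pulls out the title, the author names in listed order, and the cited-work ids, collapsing any id that appears more than once. The `RECORD` string is a hand-trimmed excerpt with most fields omitted, and `summarize` is a hypothetical helper name, not part of any SciGraph tooling.

```python
import json

# Hand-trimmed excerpt of the JSON-LD record (assumption: only the
# fields used below are kept; the real record carries many more).
RECORD = """
[
  {
    "id": "sg:pub.10.1007/s10994-018-5699-z",
    "name": "Wallenius Bayes",
    "author": [
      {"familyName": "Junqué de Fortuny", "givenName": "Enric"},
      {"familyName": "Martens", "givenName": "David"},
      {"familyName": "Provost", "givenName": "Foster"}
    ],
    "citation": [
      {"id": "sg:pub.10.1038/sj.npp.1300030"},
      {"id": "sg:pub.10.1038/sj.npp.1300030"},
      {"id": "https://doi.org/10.1089/big.2013.0037"}
    ]
  }
]
"""


def summarize(record_text):
    """Return title, author names, and de-duplicated citation ids."""
    # The top level is a JSON array holding a single record object.
    record = json.loads(record_text)[0]
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]]

    # Keep the first occurrence of each citation id, preserving order.
    seen = set()
    citations = []
    for cited in record.get("citation", []):
        if cited["id"] not in seen:
            seen.add(cited["id"])
            citations.append(cited["id"])

    return {"title": record["name"], "authors": authors, "citations": citations}


summary = summarize(RECORD)
print(summary["title"])
print(summary["authors"])
print(len(summary["citations"]), "distinct cited works in the excerpt")
```

Because JSON-LD is plain JSON, the standard `json` module is enough here; a full JSON-LD processor would only be needed to expand terms against the `@context`.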
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s10994-018-5699-z'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s10994-018-5699-z'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s10994-018-5699-z'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s10994-018-5699-z'
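The four curl commands above differ only in their `Accept` header, which is ordinary HTTP content negotiation. A rough Python equivalent, using only the standard library, might look like the sketch below. `build_request`, `FORMATS`, and the short format keys are illustrative names chosen here; the sketch only constructs the request and does not send it.

```python
from urllib.request import Request

# Base URL taken from the curl commands above.
BASE = "https://scigraph.springernature.com/"

# Media types taken from the curl commands; the short keys are
# labels chosen for this sketch.
FORMATS = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt):
    """Build (but do not send) a request for one RDF serialization."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request(BASE + record_id, headers={"Accept": FORMATS[fmt]})


req = build_request("pub.10.1007/s10994-018-5699-z", "turtle")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual fetch; whether the endpoint still answers is not something this page confirms.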


 

This table displays all metadata directly associated to this object as RDF triples.

126 TRIPLES      21 PREDICATES      41 URIs      19 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/s10994-018-5699-z schema:about anzsrc-for:01
2 anzsrc-for:0104
3 schema:author N529b6af817244df8a236a9a47d589a1e
4 schema:citation sg:pub.10.1023/a:1007413511361
5 sg:pub.10.1023/b:mach.0000039778.69032.ab
6 sg:pub.10.1038/sj.npp.1300030
7 https://doi.org/10.1016/j.patrec.2005.10.010
8 https://doi.org/10.1016/s0079-7421(08)60454-5
9 https://doi.org/10.1037/h0070288
10 https://doi.org/10.1073/pnas.1218772110
11 https://doi.org/10.1080/03610910701790269
12 https://doi.org/10.1089/big.2013.0037
13 https://doi.org/10.1145/1060745.1060754
14 https://doi.org/10.1901/jeab.1961.4-267
15 https://doi.org/10.21236/ad0426243
16 https://doi.org/10.2307/3212535
17 https://doi.org/10.2307/4450505
18 schema:datePublished 2018-06
19 schema:datePublishedReg 2018-06-01
20 schema:description This paper introduces a new event model appropriate for classifying (binary) data generated by a “destructive choice” process, such as certain human behavior. In such a process, making a choice removes that choice from future consideration yet does not influence the relative probability of other choices in the choice set. The proposed Wallenius event model is based on a somewhat forgotten non-central hypergeometric distribution introduced by Wallenius (Biased sampling: the non-central hypergeometric probability distribution. Ph.D. thesis, Stanford University, 1963). We discuss its relationship with models of how human choice behavior is generated, highlighting a key (simple) mathematical property. We use this background to describe specifically why traditional multivariate Bernoulli naive Bayes and multinomial naive Bayes each are suboptimal for such data. We then present an implementation of naive Bayes based on the Wallenius event model, and show experimentally that for data where we would expect the features to be generated via destructive choice behavior Wallenius Bayes indeed outperforms the traditional versions of naive Bayes for prediction based on these features. Furthermore, we also show that it is competitive with non-naive methods (in particular, support-vector machines). In contrast, we also show that Wallenius Bayes underperforms when the data generating process is not based on destructive choice.
21 schema:genre research_article
22 schema:inLanguage en
23 schema:isAccessibleForFree false
24 schema:isPartOf N0d3c50a4a43c433c8cd9a16198d12682
25 Nf2539074ad83447d8df92119e50d7d78
26 sg:journal.1125588
27 schema:name Wallenius Bayes
28 schema:pagination 1013-1037
29 schema:productId N168d6078610f430e91d4ecfb89c096c7
30 Na869e719e04d43c7a9ea34e8ee6baf47
31 Nb9a73b6c3c4e4487b596a688d7672a39
32 schema:sameAs https://app.dimensions.ai/details/publication/pub.1101176965
33 https://doi.org/10.1007/s10994-018-5699-z
34 schema:sdDatePublished 2019-04-11T09:42
35 schema:sdLicense https://scigraph.springernature.com/explorer/license/
36 schema:sdPublisher N4918eedc14ee4dec9a9c3290d11b36b6
37 schema:url https://link.springer.com/10.1007%2Fs10994-018-5699-z
38 sgo:license sg:explorer/license/
39 sgo:sdDataset articles
40 rdf:type schema:ScholarlyArticle
41 N0d3c50a4a43c433c8cd9a16198d12682 schema:issueNumber 6
42 rdf:type schema:PublicationIssue
43 N168d6078610f430e91d4ecfb89c096c7 schema:name doi
44 schema:value 10.1007/s10994-018-5699-z
45 rdf:type schema:PropertyValue
46 N28b2fb37908e4f8e8e721b30333a3eb9 rdf:first sg:person.07501646413.35
47 rdf:rest rdf:nil
48 N4918eedc14ee4dec9a9c3290d11b36b6 schema:name Springer Nature - SN SciGraph project
49 rdf:type schema:Organization
50 N529b6af817244df8a236a9a47d589a1e rdf:first sg:person.012600337251.48
51 rdf:rest N6f564f97f8574be1bd3f22fdbb37b525
52 N6f564f97f8574be1bd3f22fdbb37b525 rdf:first sg:person.07411142156.50
53 rdf:rest N28b2fb37908e4f8e8e721b30333a3eb9
54 Na869e719e04d43c7a9ea34e8ee6baf47 schema:name readcube_id
55 schema:value 68a74c3ebd996648421bf1546c32820fea4143e6e715558e99a3dcac070d034e
56 rdf:type schema:PropertyValue
57 Nb9a73b6c3c4e4487b596a688d7672a39 schema:name dimensions_id
58 schema:value pub.1101176965
59 rdf:type schema:PropertyValue
60 Nf2539074ad83447d8df92119e50d7d78 schema:volumeNumber 107
61 rdf:type schema:PublicationVolume
62 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
63 schema:name Mathematical Sciences
64 rdf:type schema:DefinedTerm
65 anzsrc-for:0104 schema:inDefinedTermSet anzsrc-for:
66 schema:name Statistics
67 rdf:type schema:DefinedTerm
68 sg:journal.1125588 schema:issn 0885-6125
69 1573-0565
70 schema:name Machine Learning
71 rdf:type schema:Periodical
72 sg:person.012600337251.48 schema:affiliation https://www.grid.ac/institutes/grid.449457.f
73 schema:familyName Junqué de Fortuny
74 schema:givenName Enric
75 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012600337251.48
76 rdf:type schema:Person
77 sg:person.07411142156.50 schema:affiliation https://www.grid.ac/institutes/grid.5284.b
78 schema:familyName Martens
79 schema:givenName David
80 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07411142156.50
81 rdf:type schema:Person
82 sg:person.07501646413.35 schema:affiliation https://www.grid.ac/institutes/grid.137628.9
83 schema:familyName Provost
84 schema:givenName Foster
85 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07501646413.35
86 rdf:type schema:Person
87 sg:pub.10.1023/a:1007413511361 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030336415
88 https://doi.org/10.1023/a:1007413511361
89 rdf:type schema:CreativeWork
90 sg:pub.10.1023/b:mach.0000039778.69032.ab schema:sameAs https://app.dimensions.ai/details/publication/pub.1006773373
91 https://doi.org/10.1023/b:mach.0000039778.69032.ab
92 rdf:type schema:CreativeWork
93 sg:pub.10.1038/sj.npp.1300030 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003219693
94 https://doi.org/10.1038/sj.npp.1300030
95 rdf:type schema:CreativeWork
96 https://doi.org/10.1016/j.patrec.2005.10.010 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013701558
97 rdf:type schema:CreativeWork
98 https://doi.org/10.1016/s0079-7421(08)60454-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040095765
99 rdf:type schema:CreativeWork
100 https://doi.org/10.1037/h0070288 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009587172
101 rdf:type schema:CreativeWork
102 https://doi.org/10.1073/pnas.1218772110 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048617383
103 rdf:type schema:CreativeWork
104 https://doi.org/10.1080/03610910701790269 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029530422
105 rdf:type schema:CreativeWork
106 https://doi.org/10.1089/big.2013.0037 schema:sameAs https://app.dimensions.ai/details/publication/pub.1043295004
107 rdf:type schema:CreativeWork
108 https://doi.org/10.1145/1060745.1060754 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038299769
109 rdf:type schema:CreativeWork
110 https://doi.org/10.1901/jeab.1961.4-267 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049631395
111 rdf:type schema:CreativeWork
112 https://doi.org/10.21236/ad0426243 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091993431
113 rdf:type schema:CreativeWork
114 https://doi.org/10.2307/3212535 schema:sameAs https://app.dimensions.ai/details/publication/pub.1070226747
115 rdf:type schema:CreativeWork
116 https://doi.org/10.2307/4450505 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037537667
117 rdf:type schema:CreativeWork
118 https://www.grid.ac/institutes/grid.137628.9 schema:alternateName New York University
119 schema:name Information, Operations and Management Sciences, Stern School of Business, New York University, New York City, USA
120 rdf:type schema:Organization
121 https://www.grid.ac/institutes/grid.449457.f schema:alternateName New York University Shanghai
122 schema:name NYU Shanghai, 1555 Century Ave, Shanghai, China
123 rdf:type schema:Organization
124 https://www.grid.ac/institutes/grid.5284.b schema:alternateName University of Antwerp
125 schema:name Faculty of Applied Economics, University of Antwerp, Prinsstraat 13, 2000, Antwerp, Belgium
126 rdf:type schema:Organization
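In the table, `schema:author` points at a blank node rather than at the people directly: author order is stored as an RDF list, a chain of blank nodes linked by `rdf:first` (the item) and `rdf:rest` (the next node), ending at `rdf:nil`. The sketch below walks that chain using the node ids copied from the table; the dictionary layout and the `walk_list` helper are assumptions made for illustration, not an RDF library API.

```python
RDF_NIL = "rdf:nil"

# rdf:first rows from the table: list node -> item held at that node.
FIRST = {
    "N529b6af817244df8a236a9a47d589a1e": "sg:person.012600337251.48",
    "N6f564f97f8574be1bd3f22fdbb37b525": "sg:person.07411142156.50",
    "N28b2fb37908e4f8e8e721b30333a3eb9": "sg:person.07501646413.35",
}

# rdf:rest rows from the table: list node -> next node (or rdf:nil).
REST = {
    "N529b6af817244df8a236a9a47d589a1e": "N6f564f97f8574be1bd3f22fdbb37b525",
    "N6f564f97f8574be1bd3f22fdbb37b525": "N28b2fb37908e4f8e8e721b30333a3eb9",
    "N28b2fb37908e4f8e8e721b30333a3eb9": RDF_NIL,
}


def walk_list(head, first, rest):
    """Follow rdf:rest links from head, collecting each rdf:first item."""
    items = []
    visited = set()
    node = head
    while node != RDF_NIL:
        if node in visited:
            # Guard against a malformed, cyclic list.
            raise ValueError(f"cycle detected at {node}")
        visited.add(node)
        items.append(first[node])
        node = rest[node]
    return items


# The head node is the object of schema:author in the table.
authors = walk_list("N529b6af817244df8a236a9a47d589a1e", FIRST, REST)
print(authors)
```

The recovered order matches the author order given in the JSON-LD `author` array, where the list is already flattened.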
 



