PUBLICATION DATE: 2018-06
AUTHORS: Enric Junqué de Fortuny, David Martens, Foster Provost
ABSTRACT: This paper introduces a new event model appropriate for classifying (binary) data generated by a “destructive choice” process, such as certain human behavior. In such a process, making a choice removes that choice from future consideration yet does not influence the relative probability of other choices in the choice set. The proposed Wallenius event model is based on a somewhat forgotten non-central hypergeometric distribution introduced by Wallenius (Biased sampling: the non-central hypergeometric probability distribution. Ph.D. thesis, Stanford University, 1963). We discuss its relationship with models of how human choice behavior is generated, highlighting a key (simple) mathematical property. We use this background to describe specifically why traditional multivariate Bernoulli naive Bayes and multinomial naive Bayes each are suboptimal for such data. We then present an implementation of naive Bayes based on the Wallenius event model, and show experimentally that for data where we would expect the features to be generated via destructive choice behavior Wallenius Bayes indeed outperforms the traditional versions of naive Bayes for prediction based on these features. Furthermore, we also show that it is competitive with non-naive methods (in particular, support-vector machines). In contrast, we also show that Wallenius Bayes underperforms when the data generating process is not based on destructive choice.
PAGES: 1013-1037
SCIGRAPH: http://scigraph.springernature.com/pub.10.1007/s10994-018-5699-z
DOI: http://dx.doi.org/10.1007/s10994-018-5699-z
DIMENSIONS: https://app.dimensions.ai/details/publication/pub.1101176965
JSON-LD is the canonical representation for SciGraph data.
[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
"about": [
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0104",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Statistics",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Mathematical Sciences",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "New York University Shanghai",
"id": "https://www.grid.ac/institutes/grid.449457.f",
"name": [
"NYU Shanghai, 1555 Century Ave, Shanghai, China"
],
"type": "Organization"
},
"familyName": "Junqu\u00e9 de Fortuny",
"givenName": "Enric",
"id": "sg:person.012600337251.48",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012600337251.48"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "University of Antwerp",
"id": "https://www.grid.ac/institutes/grid.5284.b",
"name": [
"Faculty of Applied Economics, University of Antwerp, Prinsstraat 13, 2000, Antwerp, Belgium"
],
"type": "Organization"
},
"familyName": "Martens",
"givenName": "David",
"id": "sg:person.07411142156.50",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07411142156.50"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "New York University",
"id": "https://www.grid.ac/institutes/grid.137628.9",
"name": [
"Information, Operations and Management Sciences, Stern School of Business, New York University, New York City, USA"
],
"type": "Organization"
},
"familyName": "Provost",
"givenName": "Foster",
"id": "sg:person.07501646413.35",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07501646413.35"
],
"type": "Person"
}
],
"citation": [
{
"id": "sg:pub.10.1038/sj.npp.1300030",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1003219693",
"https://doi.org/10.1038/sj.npp.1300030"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1023/b:mach.0000039778.69032.ab",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1006773373",
"https://doi.org/10.1023/b:mach.0000039778.69032.ab"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1037/h0070288",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1009587172"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/j.patrec.2005.10.010",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1013701558"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1080/03610910701790269",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1029530422"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1023/a:1007413511361",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1030336415",
"https://doi.org/10.1023/a:1007413511361"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.2307/4450505",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1037537667"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/1060745.1060754",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1038299769"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/s0079-7421(08)60454-5",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1040095765"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1089/big.2013.0037",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1043295004"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1073/pnas.1218772110",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1048617383"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1901/jeab.1961.4-267",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1049631395"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.2307/3212535",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1070226747"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.21236/ad0426243",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1091993431"
],
"type": "CreativeWork"
}
],
"datePublished": "2018-06",
"datePublishedReg": "2018-06-01",
"description": "This paper introduces a new event model appropriate for classifying (binary) data generated by a \u201cdestructive choice\u201d process, such as certain human behavior. In such a process, making a choice removes that choice from future consideration yet does not influence the relative probability of other choices in the choice set. The proposed Wallenius event model is based on a somewhat forgotten non-central hypergeometric distribution introduced by Wallenius (Biased sampling: the non-central hypergeometric probability distribution. Ph.D. thesis, Stanford University, 1963). We discuss its relationship with models of how human choice behavior is generated, highlighting a key (simple) mathematical property. We use this background to describe specifically why traditional multivariate Bernoulli naive Bayes and multinomial naive Bayes each are suboptimal for such data. We then present an implementation of naive Bayes based on the Wallenius event model, and show experimentally that for data where we would expect the features to be generated via destructive choice behavior Wallenius Bayes indeed outperforms the traditional versions of naive Bayes for prediction based on these features. Furthermore, we also show that it is competitive with non-naive methods (in particular, support-vector machines). In contrast, we also show that Wallenius Bayes underperforms when the data generating process is not based on destructive choice.",
"genre": "research_article",
"id": "sg:pub.10.1007/s10994-018-5699-z",
"inLanguage": [
"en"
],
"isAccessibleForFree": false,
"isPartOf": [
{
"id": "sg:journal.1125588",
"issn": [
"0885-6125",
"1573-0565"
],
"name": "Machine Learning",
"type": "Periodical"
},
{
"issueNumber": "6",
"type": "PublicationIssue"
},
{
"type": "PublicationVolume",
"volumeNumber": "107"
}
],
"name": "Wallenius Bayes",
"pagination": "1013-1037",
"productId": [
{
"name": "readcube_id",
"type": "PropertyValue",
"value": [
"68a74c3ebd996648421bf1546c32820fea4143e6e715558e99a3dcac070d034e"
]
},
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1007/s10994-018-5699-z"
]
},
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1101176965"
]
}
],
"sameAs": [
"https://doi.org/10.1007/s10994-018-5699-z",
"https://app.dimensions.ai/details/publication/pub.1101176965"
],
"sdDataset": "articles",
"sdDatePublished": "2019-04-11T09:42",
"sdLicense": "https://scigraph.springernature.com/explorer/license/",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000346_0000000346/records_99843_00000004.jsonl",
"type": "ScholarlyArticle",
"url": "https://link.springer.com/10.1007%2Fs10994-018-5699-z"
}
]
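As a rough illustration of how a record like the one above can be consumed, the following Python sketch reads a trimmed copy of the JSON-LD and pulls out the author names and the identifiers listed under "productId". The helper names (author_names, product_ids) and the trimmed RECORD_JSON string are assumptions made for this example; they are not part of SciGraph or of any library.

```python
import json

# Trimmed copy of the record above, keeping only the fields this sketch reads.
# The raw string keeps the \u00e9 escape intact so that json.loads decodes it.
RECORD_JSON = r"""
[
  {
    "author": [
      {"familyName": "Junqu\u00e9 de Fortuny", "givenName": "Enric", "type": "Person"},
      {"familyName": "Martens", "givenName": "David", "type": "Person"},
      {"familyName": "Provost", "givenName": "Foster", "type": "Person"}
    ],
    "name": "Wallenius Bayes",
    "pagination": "1013-1037",
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/s10994-018-5699-z"]},
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1101176965"]}
    ]
  }
]
"""


def author_names(record):
    """Return 'givenName familyName' for each entry in the author list."""
    return [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]


def product_ids(record):
    """Map each productId name (e.g. 'doi') to its first listed value."""
    return {p["name"]: p["value"][0] for p in record.get("productId", [])}


# The top level of the record is a JSON array holding a single object.
record = json.loads(RECORD_JSON)[0]
names = author_names(record)
ids = product_ids(record)

print(names)
print(ids["doi"])
```

Plain json parsing is enough here because only literal fields are read; resolving the "@context" into full RDF terms would need a JSON-LD processor, which this sketch deliberately leaves out.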
The RDF metadata can be downloaded as JSON-LD, N-Triples (nt), Turtle, or RDF/XML.
JSON-LD is a popular format for linked data which is fully compatible with JSON.
curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s10994-018-5699-z'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s10994-018-5699-z'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s10994-018-5699-z'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s10994-018-5699-z'
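The four curl commands above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The names ACCEPT_BY_FORMAT, build_request, and fetch are hypothetical helpers introduced for this example, and the sketch assumes the SciGraph endpoint still answers these requests, which may no longer be the case.

```python
import urllib.request

# Record URL taken from the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s10994-018-5699-z"

# Media types copied from the curl commands, keyed by the short format names.
ACCEPT_BY_FORMAT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for the given RDF format."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_BY_FORMAT[fmt]})


def fetch(fmt, timeout=10.0):
    """Send the request and return the body as text; needs network access."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request involves no network traffic, so this part is safe to run offline.
req = build_request("turtle")
print(req.get_header("Accept"))
```

Calling fetch("turtle") would perform the actual download; it is kept separate from build_request so the header logic can be checked without contacting the server.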
This table displays all metadata directly associated with this object as RDF triples.
126 TRIPLES
21 PREDICATES
41 URIs
19 LITERALS
7 BLANK NODES