Binormal Precision–Recall Curves for Optimal Classification of Imbalanced Data


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2019-02-11

AUTHORS

Zhongkai Liu, Howard D. Bondell

ABSTRACT

Binary classification on imbalanced data, i.e., a large skew in the class distribution, is a challenging problem. Evaluation of classifiers via the receiver operating characteristic (ROC) curve is common in binary classification. Techniques to develop classifiers that optimize the area under the ROC curve have been proposed. However, for imbalanced data, the ROC curve tends to give an overly optimistic view. Realizing its disadvantages of dealing with imbalanced data, we propose an approach based on the Precision–Recall (PR) curve under the binormal assumption. We propose to choose the classifier that maximizes the area under the binormal PR curve. The asymptotic distribution of the resulting estimator is shown. Simulations, as well as real data results, indicate that the binormal Precision–Recall method outperforms approaches based on the area under the ROC curve.

PAGES

1-21

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s12561-019-09231-9

DOI

http://dx.doi.org/10.1007/s12561-019-09231-9

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1112075039



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0804", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Data Format", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "North Carolina State University", 
          "id": "https://www.grid.ac/institutes/grid.40803.3f", 
          "name": [
            "North Carolina State University, Raleigh, NC, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Liu", 
        "givenName": "Zhongkai", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Melbourne", 
          "id": "https://www.grid.ac/institutes/grid.1008.9", 
          "name": [
            "University of Melbourne, Melbourne, Australia"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Bondell", 
        "givenName": "Howard D.", 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1111/j.1541-0420.2005.00420.x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1007481865"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0022-2496(80)90020-6", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012196787"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-40994-3_29", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017688596", 
          "https://doi.org/10.1007/978-3-642-40994-3_29"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1006/jmps.1998.1218", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1022192053"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/11538059_91", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1022715535", 
          "https://doi.org/10.1007/11538059_91"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf00994018", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1025150743", 
          "https://doi.org/10.1007/bf00994018"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0022-2496(75)90001-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1025542858"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf02289677", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1027341428", 
          "https://doi.org/10.1007/bf02289677"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1214/15-ejs1035", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1031429676"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/65943.65945", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1040111937"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1080/02664760050076443", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1040326068"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1102351.1102407", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1041853690"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-7-253", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045857088", 
          "https://doi.org/10.1186/1471-2105-7-253"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1143844.1143874", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1046546824"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1553374.1553398", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1050222806"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/bti724", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051347224"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1148/radiology.143.1.7063747", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1082130998"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/iccse.2014.6926515", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094325826"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icpr.2010.1036", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1095245646"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icpr.2010.764", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1095428650"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1613/jair.953", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1105579550"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3233/ida-2002-6504", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1107703598"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://app.dimensions.ai/details/publication/pub.1109421350", 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1201/9781439800225", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1109421350"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1111/j.2517-6161.1964.tb00553.x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1110457451"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2019-02-11", 
    "datePublishedReg": "2019-02-11", 
    "description": "Binary classification on imbalanced data, i.e., a large skew in the class distribution, is a challenging problem. Evaluation of classifiers via the receiver operating characteristic (ROC) curve is common in binary classification. Techniques to develop classifiers that optimize the area under the ROC curve have been proposed. However, for imbalanced data, the ROC curve tends to give an overly optimistic view. Realizing its disadvantages of dealing with imbalanced data, we propose an approach based on the Precision\u2013Recall (PR) curve under the binormal assumption. We propose to choose the classifier that maximizes the area under the binormal PR curve. The asymptotic distribution of the resulting estimator is shown. Simulations, as well as real data results, indicate that the binormal Precision\u2013Recall method outperforms approaches based on the area under the ROC curve.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1007/s12561-019-09231-9", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.2435863", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.3484855", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1041137", 
        "issn": [
          "1867-1764", 
          "1867-1772"
        ], 
        "name": "Statistics in Biosciences", 
        "type": "Periodical"
      }
    ], 
    "name": "Binormal Precision\u2013Recall Curves for Optimal Classification of Imbalanced Data", 
    "pagination": "1-21", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "9e92b1a09b567707172cecb9c7140f2ca795dd52d9baa70c42ebe2f17e238930"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s12561-019-09231-9"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1112075039"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s12561-019-09231-9", 
      "https://app.dimensions.ai/details/publication/pub.1112075039"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T09:05", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000335_0000000335/records_125271_00000000.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://link.springer.com/10.1007%2Fs12561-019-09231-9"
  }
]
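The fields of a record like the one above can be read with any JSON parser, since JSON-LD is plain JSON. The following is a minimal Python sketch, not part of the SciGraph tooling: the `summarize` helper and the trimmed `RECORD` literal are illustrative assumptions, and only a few fields of the full record are reproduced. Citation arrays in such records can repeat an id, so the sketch also drops repeats while keeping the original order.

```python
import json

# Trimmed, illustrative copy of the record; most fields are omitted on purpose.
RECORD = json.loads(r"""
[
  {
    "id": "sg:pub.10.1007/s12561-019-09231-9",
    "name": "Binormal Precision\u2013Recall Curves for Optimal Classification of Imbalanced Data",
    "author": [
      {"familyName": "Liu", "givenName": "Zhongkai", "type": "Person"},
      {"familyName": "Bondell", "givenName": "Howard D.", "type": "Person"}
    ],
    "citation": [
      {"id": "sg:pub.10.1007/11538059_91", "type": "CreativeWork"},
      {"id": "sg:pub.10.1007/11538059_91", "type": "CreativeWork"},
      {"id": "sg:pub.10.1007/bf00994018", "type": "CreativeWork"}
    ],
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/s12561-019-09231-9"]}
    ]
  }
]
""")


def summarize(record):
    """Hypothetical helper: pull title, authors, DOI and unique citation ids."""
    authors = [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]
    # The DOI is stored as a PropertyValue whose name is "doi".
    doi = next(
        (p["value"][0] for p in record.get("productId", []) if p.get("name") == "doi"),
        None,
    )
    # dict.fromkeys keeps first-seen order while dropping repeated ids.
    citations = list(dict.fromkeys(c["id"] for c in record.get("citation", [])))
    return {"title": record["name"], "authors": authors, "doi": doi, "citations": citations}


summary = summarize(RECORD[0])
print(summary["title"])
print(summary["authors"], summary["doi"], len(summary["citations"]))
```

The top-level value is a list with one object, so the record is addressed as `RECORD[0]`.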
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s12561-019-09231-9'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s12561-019-09231-9'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s12561-019-09231-9'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s12561-019-09231-9'
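The four curl commands above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. This is an illustrative sketch only: the names `ACCEPT_TYPES`, `build_request` and `fetch` are hypothetical, and it assumes the endpoint is still reachable and answers as the curl examples suggest. Nothing is sent over the network until `fetch` is called.

```python
from urllib.request import Request, urlopen

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s12561-019-09231-9"

# Media types copied from the curl examples; the short keys are illustrative.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for the given format."""
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network)."""
    with urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_request("turtle")
print(req.full_url)
print(req.get_header("Accept"))
```

Calling `fetch("json-ld")` would correspond to the first curl command, subject to the same availability assumption.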


 

This table displays all metadata directly associated to this object as RDF triples.

146 TRIPLES      21 PREDICATES      49 URIs      16 LITERALS      5 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/s12561-019-09231-9 schema:about anzsrc-for:08
2 anzsrc-for:0804
3 schema:author N862ab6539fa44639bb75776738207c68
4 schema:citation sg:pub.10.1007/11538059_91
5 sg:pub.10.1007/978-3-642-40994-3_29
6 sg:pub.10.1007/bf00994018
7 sg:pub.10.1007/bf02289677
8 sg:pub.10.1186/1471-2105-7-253
9 https://app.dimensions.ai/details/publication/pub.1109421350
10 https://doi.org/10.1006/jmps.1998.1218
11 https://doi.org/10.1016/0022-2496(75)90001-2
12 https://doi.org/10.1016/0022-2496(80)90020-6
13 https://doi.org/10.1080/02664760050076443
14 https://doi.org/10.1093/bioinformatics/bti724
15 https://doi.org/10.1109/iccse.2014.6926515
16 https://doi.org/10.1109/icpr.2010.1036
17 https://doi.org/10.1109/icpr.2010.764
18 https://doi.org/10.1111/j.1541-0420.2005.00420.x
19 https://doi.org/10.1111/j.2517-6161.1964.tb00553.x
20 https://doi.org/10.1145/1102351.1102407
21 https://doi.org/10.1145/1143844.1143874
22 https://doi.org/10.1145/1553374.1553398
23 https://doi.org/10.1145/65943.65945
24 https://doi.org/10.1148/radiology.143.1.7063747
25 https://doi.org/10.1201/9781439800225
26 https://doi.org/10.1214/15-ejs1035
27 https://doi.org/10.1613/jair.953
28 https://doi.org/10.3233/ida-2002-6504
29 schema:datePublished 2019-02-11
30 schema:datePublishedReg 2019-02-11
31 schema:description Binary classification on imbalanced data, i.e., a large skew in the class distribution, is a challenging problem. Evaluation of classifiers via the receiver operating characteristic (ROC) curve is common in binary classification. Techniques to develop classifiers that optimize the area under the ROC curve have been proposed. However, for imbalanced data, the ROC curve tends to give an overly optimistic view. Realizing its disadvantages of dealing with imbalanced data, we propose an approach based on the Precision–Recall (PR) curve under the binormal assumption. We propose to choose the classifier that maximizes the area under the binormal PR curve. The asymptotic distribution of the resulting estimator is shown. Simulations, as well as real data results, indicate that the binormal Precision–Recall method outperforms approaches based on the area under the ROC curve.
32 schema:genre research_article
33 schema:inLanguage en
34 schema:isAccessibleForFree false
35 schema:isPartOf sg:journal.1041137
36 schema:name Binormal Precision–Recall Curves for Optimal Classification of Imbalanced Data
37 schema:pagination 1-21
38 schema:productId N839ca033d51940839e0bec430f18af74
39 Nb81623e1ca73400e9ab4de663421a88a
40 Ne12ad3e0920b4a31908ca040e8111a4f
41 schema:sameAs https://app.dimensions.ai/details/publication/pub.1112075039
42 https://doi.org/10.1007/s12561-019-09231-9
43 schema:sdDatePublished 2019-04-11T09:05
44 schema:sdLicense https://scigraph.springernature.com/explorer/license/
45 schema:sdPublisher N15815bfbafeb495caa8c570f906d421c
46 schema:url https://link.springer.com/10.1007%2Fs12561-019-09231-9
47 sgo:license sg:explorer/license/
48 sgo:sdDataset articles
49 rdf:type schema:ScholarlyArticle
50 N15815bfbafeb495caa8c570f906d421c schema:name Springer Nature - SN SciGraph project
51 rdf:type schema:Organization
52 N63589176d2f84d3489372b783dad6553 rdf:first Nc38f8b785daa42adb41022aac569798f
53 rdf:rest rdf:nil
54 N839ca033d51940839e0bec430f18af74 schema:name readcube_id
55 schema:value 9e92b1a09b567707172cecb9c7140f2ca795dd52d9baa70c42ebe2f17e238930
56 rdf:type schema:PropertyValue
57 N862ab6539fa44639bb75776738207c68 rdf:first Na81f664224774aa28745abfa788b22b4
58 rdf:rest N63589176d2f84d3489372b783dad6553
59 Na81f664224774aa28745abfa788b22b4 schema:affiliation https://www.grid.ac/institutes/grid.40803.3f
60 schema:familyName Liu
61 schema:givenName Zhongkai
62 rdf:type schema:Person
63 Nb81623e1ca73400e9ab4de663421a88a schema:name doi
64 schema:value 10.1007/s12561-019-09231-9
65 rdf:type schema:PropertyValue
66 Nc38f8b785daa42adb41022aac569798f schema:affiliation https://www.grid.ac/institutes/grid.1008.9
67 schema:familyName Bondell
68 schema:givenName Howard D.
69 rdf:type schema:Person
70 Ne12ad3e0920b4a31908ca040e8111a4f schema:name dimensions_id
71 schema:value pub.1112075039
72 rdf:type schema:PropertyValue
73 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
74 schema:name Information and Computing Sciences
75 rdf:type schema:DefinedTerm
76 anzsrc-for:0804 schema:inDefinedTermSet anzsrc-for:
77 schema:name Data Format
78 rdf:type schema:DefinedTerm
79 sg:grant.2435863 http://pending.schema.org/fundedItem sg:pub.10.1007/s12561-019-09231-9
80 rdf:type schema:MonetaryGrant
81 sg:grant.3484855 http://pending.schema.org/fundedItem sg:pub.10.1007/s12561-019-09231-9
82 rdf:type schema:MonetaryGrant
83 sg:journal.1041137 schema:issn 1867-1764
84 1867-1772
85 schema:name Statistics in Biosciences
86 rdf:type schema:Periodical
87 sg:pub.10.1007/11538059_91 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022715535
88 https://doi.org/10.1007/11538059_91
89 rdf:type schema:CreativeWork
90 sg:pub.10.1007/978-3-642-40994-3_29 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017688596
91 https://doi.org/10.1007/978-3-642-40994-3_29
92 rdf:type schema:CreativeWork
93 sg:pub.10.1007/bf00994018 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025150743
94 https://doi.org/10.1007/bf00994018
95 rdf:type schema:CreativeWork
96 sg:pub.10.1007/bf02289677 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027341428
97 https://doi.org/10.1007/bf02289677
98 rdf:type schema:CreativeWork
99 sg:pub.10.1186/1471-2105-7-253 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045857088
100 https://doi.org/10.1186/1471-2105-7-253
101 rdf:type schema:CreativeWork
102 https://app.dimensions.ai/details/publication/pub.1109421350 rdf:type schema:CreativeWork
103 https://doi.org/10.1006/jmps.1998.1218 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022192053
104 rdf:type schema:CreativeWork
105 https://doi.org/10.1016/0022-2496(75)90001-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025542858
106 rdf:type schema:CreativeWork
107 https://doi.org/10.1016/0022-2496(80)90020-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012196787
108 rdf:type schema:CreativeWork
109 https://doi.org/10.1080/02664760050076443 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040326068
110 rdf:type schema:CreativeWork
111 https://doi.org/10.1093/bioinformatics/bti724 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051347224
112 rdf:type schema:CreativeWork
113 https://doi.org/10.1109/iccse.2014.6926515 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094325826
114 rdf:type schema:CreativeWork
115 https://doi.org/10.1109/icpr.2010.1036 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095245646
116 rdf:type schema:CreativeWork
117 https://doi.org/10.1109/icpr.2010.764 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095428650
118 rdf:type schema:CreativeWork
119 https://doi.org/10.1111/j.1541-0420.2005.00420.x schema:sameAs https://app.dimensions.ai/details/publication/pub.1007481865
120 rdf:type schema:CreativeWork
121 https://doi.org/10.1111/j.2517-6161.1964.tb00553.x schema:sameAs https://app.dimensions.ai/details/publication/pub.1110457451
122 rdf:type schema:CreativeWork
123 https://doi.org/10.1145/1102351.1102407 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041853690
124 rdf:type schema:CreativeWork
125 https://doi.org/10.1145/1143844.1143874 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046546824
126 rdf:type schema:CreativeWork
127 https://doi.org/10.1145/1553374.1553398 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050222806
128 rdf:type schema:CreativeWork
129 https://doi.org/10.1145/65943.65945 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040111937
130 rdf:type schema:CreativeWork
131 https://doi.org/10.1148/radiology.143.1.7063747 schema:sameAs https://app.dimensions.ai/details/publication/pub.1082130998
132 rdf:type schema:CreativeWork
133 https://doi.org/10.1201/9781439800225 schema:sameAs https://app.dimensions.ai/details/publication/pub.1109421350
134 rdf:type schema:CreativeWork
135 https://doi.org/10.1214/15-ejs1035 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031429676
136 rdf:type schema:CreativeWork
137 https://doi.org/10.1613/jair.953 schema:sameAs https://app.dimensions.ai/details/publication/pub.1105579550
138 rdf:type schema:CreativeWork
139 https://doi.org/10.3233/ida-2002-6504 schema:sameAs https://app.dimensions.ai/details/publication/pub.1107703598
140 rdf:type schema:CreativeWork
141 https://www.grid.ac/institutes/grid.1008.9 schema:alternateName University of Melbourne
142 schema:name University of Melbourne, Melbourne, Australia
143 rdf:type schema:Organization
144 https://www.grid.ac/institutes/grid.40803.3f schema:alternateName North Carolina State University
145 schema:name North Carolina State University, Raleigh, NC, USA
146 rdf:type schema:Organization
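Rows 83 to 86 of the table correspond to the journal node of the JSON-LD record. The Python sketch below shows that correspondence under loud simplifications: it is not a JSON-LD processor, it handles only flat nodes, and the rule that every key maps to a `schema:` predicate (with `type` mapped to `rdf:type`) is an assumption made for this illustration rather than something taken from the SciGraph context file. The function name `node_to_triples` is hypothetical.

```python
# Journal node copied from the "isPartOf" array of the record.
JOURNAL_NODE = {
    "id": "sg:journal.1041137",
    "issn": ["1867-1764", "1867-1772"],
    "name": "Statistics in Biosciences",
    "type": "Periodical",
}


def node_to_triples(node):
    """Expand one flat node into (subject, predicate, object) tuples.

    Assumption: keys map to schema:<key>, and "type" maps to rdf:type.
    Nested objects are not handled.
    """
    subject = node["id"]
    triples = []
    for key, value in node.items():
        if key == "id":
            continue
        # A list value yields one triple per element, as in the table.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if key == "type":
                triples.append((subject, "rdf:type", f"schema:{v}"))
            else:
                triples.append((subject, f"schema:{key}", v))
    return triples


triples = node_to_triples(JOURNAL_NODE)
for s, p, o in triples:
    print(s, p, o)
```

In the table, the subject and predicate cells are left blank when they repeat the row above; the sketch prints them in full on every line.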
 



