A New Local Distance-Based Outlier Detection Approach for Scattered Real-World Data


Ontology type: schema:Chapter
Open Access: True


Chapter Info

DATE

2009

AUTHORS

Ke Zhang, Marcus Hutter, Huidong Jin

ABSTRACT

Detecting outliers which are grossly different from or inconsistent with the remaining dataset is a major challenge in real-world KDD applications. Existing outlier detection methods are ineffective on scattered real-world datasets due to implicit data patterns and parameter setting issues. We define a novel Local Distance-based Outlier Factor (LDOF) to measure the outlier-ness of objects in scattered datasets which addresses these issues. LDOF uses the relative location of an object to its neighbours to determine the degree to which the object deviates from its neighbourhood. We present theoretical bounds on LDOF’s false-detection probability. Experimentally, LDOF compares favorably to classical KNN and LOF based outlier detection. In particular it is less sensitive to parameter values.
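
To make the idea of "relative location to its neighbours" concrete, here is a minimal Python sketch of an LDOF-style score: the mean distance from an object to its k nearest neighbours, divided by the mean pairwise distance among those neighbours. The function name `ldof`, the Euclidean metric, and the toy data are assumptions for illustration only; this is not the authors' code, and it does not cover the false-detection bounds mentioned in the abstract.

```python
from itertools import combinations
from math import dist


def ldof(points, index, k):
    """Hypothetical helper: LDOF-style score of points[index] using k neighbours.

    Assumes k >= 2, more than k points, and neighbours that are not all identical.
    """
    p = points[index]
    others = [q for i, q in enumerate(points) if i != index]
    # k nearest neighbours of p under Euclidean distance (an assumed metric).
    neighbours = sorted(others, key=lambda q: dist(p, q))[:k]
    # Mean distance from p to its neighbours.
    knn_distance = sum(dist(p, q) for q in neighbours) / k
    # Mean pairwise distance among the neighbours themselves.
    pairs = list(combinations(neighbours, 2))
    inner_distance = sum(dist(a, b) for a, b in pairs) / len(pairs)
    return knn_distance / inner_distance


# Toy data: the corners of a unit square plus one far-away point.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = [ldof(data, i, k=3) for i in range(len(data))]
```

On this toy data the far-away point gets the largest score, while each corner of the square scores about 1 because it sits as close to its neighbours as they sit to each other.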

PAGES

813-822

References to SciGraph publications

  • 2006. A Nonparametric Outlier Detection for Effectively Discovering Top-N Outliers from Engineering Data in ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING
  • 1999. OPTICS-OF: Identifying Local Outliers in PRINCIPLES OF DATA MINING AND KNOWLEDGE DISCOVERY
  • 2002. Enhancing Effectiveness of Outlier Detections for Low Density Patterns in ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING
  • 1980. Identification of Outliers
  • Book

    TITLE

    Advances in Knowledge Discovery and Data Mining

    ISBN

    978-3-642-01306-5
    978-3-642-01307-2

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/978-3-642-01307-2_84

    DOI

    http://dx.doi.org/10.1007/978-3-642-01307-2_84

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1003376438



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information Systems", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Australian National University", 
              "id": "https://www.grid.ac/institutes/grid.1001.0", 
              "name": [
                "RSISE, Australian National University, Australia"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Zhang", 
            "givenName": "Ke", 
            "id": "sg:person.016435644701.29", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016435644701.29"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Data61", 
              "id": "https://www.grid.ac/institutes/grid.425461.0", 
              "name": [
                "RSISE, Australian National University, Australia", 
                "National ICT Australia (NICTA), Canberra Lab, ACT, Australia"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Hutter", 
            "givenName": "Marcus", 
            "id": "sg:person.011366707151.61", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011366707151.61"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Data61", 
              "id": "https://www.grid.ac/institutes/grid.425461.0", 
              "name": [
                "RSISE, Australian National University, Australia", 
                "National ICT Australia (NICTA), Canberra Lab, ACT, Australia", 
                "CSIRO Mathematical and Information Sciences, Acton ACT 2601, Australia"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Jin", 
            "givenName": "Huidong", 
            "id": "sg:person.015521005547.47", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015521005547.47"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1145/1401890.1401946", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1008072501"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/342009.335437", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1023478359"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/342009.335388", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1029221191"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/11731139_66", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1032212702", 
              "https://doi.org/10.1007/11731139_66"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-540-48247-5_28", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1042737157", 
              "https://doi.org/10.1007/978-3-540-48247-5_28"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/3-540-47887-6_53", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1047183551", 
              "https://doi.org/10.1007/3-540-47887-6_53"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-94-015-3994-4", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1053674345", 
              "https://doi.org/10.1007/978-94-015-3994-4"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2009", 
        "datePublishedReg": "2009-01-01", 
        "description": "Detecting outliers which are grossly different from or inconsistent with the remaining dataset is a major challenge in real-world KDD applications. Existing outlier detection methods are ineffective on scattered real-world datasets due to implicit data patterns and parameter setting issues. We define a novel Local Distance-based Outlier Factor (LDOF) to measure the outlier-ness of objects in scattered datasets which addresses these issues. LDOF uses the relative location of an object to its neighbours to determine the degree to which the object deviates from its neighbourhood. We present theoretical bounds on LDOF\u2019s false-detection probability. Experimentally, LDOF compares favorably to classical KNN and LOF based outlier detection. In particular it is less sensitive to parameter values.", 
        "editor": [
          {
            "familyName": "Theeramunkong", 
            "givenName": "Thanaruk", 
            "type": "Person"
          }, 
          {
            "familyName": "Kijsirikul", 
            "givenName": "Boonserm", 
            "type": "Person"
          }, 
          {
            "familyName": "Cercone", 
            "givenName": "Nick", 
            "type": "Person"
          }, 
          {
            "familyName": "Ho", 
            "givenName": "Tu-Bao", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/978-3-642-01307-2_84", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": {
          "isbn": [
            "978-3-642-01306-5", 
            "978-3-642-01307-2"
          ], 
          "name": "Advances in Knowledge Discovery and Data Mining", 
          "type": "Book"
        }, 
        "name": "A New Local Distance-Based Outlier Detection Approach for Scattered Real-World Data", 
        "pagination": "813-822", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1003376438"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/978-3-642-01307-2_84"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "10154e1c1305d351ab611c0fd13502f4633f3e37c117900c688708de6c4b4f35"
            ]
          }
        ], 
        "publisher": {
          "location": "Berlin, Heidelberg", 
          "name": "Springer Berlin Heidelberg", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/978-3-642-01307-2_84", 
          "https://app.dimensions.ai/details/publication/pub.1003376438"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-16T07:09", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000353_0000000353/records_45336_00000000.jsonl", 
        "type": "Chapter", 
        "url": "https://link.springer.com/10.1007%2F978-3-642-01307-2_84"
      }
    ]
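
Because JSON-LD is plain JSON, a record of this shape can be read with any JSON parser. The following Python sketch uses a trimmed inline stand-in for the record (only a few fields are copied over) and pulls out the author names, the distinct citation ids, and the page range. The variable names are assumptions, and the de-duplication step is a precaution in case a record lists the same work more than once.

```python
import json

# Trimmed stand-in for the record; only a few fields are kept.
record_text = """
[
  {
    "name": "A New Local Distance-Based Outlier Detection Approach for Scattered Real-World Data",
    "pagination": "813-822",
    "author": [
      {"familyName": "Zhang", "givenName": "Ke"},
      {"familyName": "Hutter", "givenName": "Marcus"},
      {"familyName": "Jin", "givenName": "Huidong"}
    ],
    "citation": [
      {"id": "sg:pub.10.1007/11731139_66"},
      {"id": "sg:pub.10.1007/11731139_66"},
      {"id": "https://doi.org/10.1145/342009.335437"}
    ]
  }
]
"""

# The top level is a list holding a single record object.
record = json.loads(record_text)[0]
authors = [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]]
# dict.fromkeys keeps first-seen order while dropping repeated ids.
citation_ids = list(dict.fromkeys(c["id"] for c in record["citation"]))
first_page, last_page = (int(p) for p in record["pagination"].split("-"))
```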
     

    Download the RDF metadata as:  json-ld nt turtle xml License info

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-01307-2_84'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-01307-2_84'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-01307-2_84'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-01307-2_84'
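
The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The names `FORMATS`, `build_request`, and `fetch` are assumptions for illustration, and `fetch` needs network access and a service that still answers at this URL, which is not verified here.

```python
from urllib.request import Request, urlopen

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-01307-2_84"

# Media types taken from the curl commands above; the short keys are assumed labels.
FORMATS = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt):
    """Return a Request asking for the record in the given format."""
    return Request(RECORD_URL, headers={"Accept": FORMATS[fmt]})


def fetch(fmt, timeout=10):
    """Send the request and return the body as text (needs network access)."""
    with urlopen(build_request(fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; only fetch() does.
request = build_request("turtle")
```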


     

    This table displays all metadata directly associated to this object as RDF triples.

    124 TRIPLES      23 PREDICATES      34 URIs      20 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/978-3-642-01307-2_84 schema:about anzsrc-for:08
    2 anzsrc-for:0806
    3 schema:author Nb4ea61229e5747398c9c33a1c16f085d
    4 schema:citation sg:pub.10.1007/11731139_66
    5 sg:pub.10.1007/3-540-47887-6_53
    6 sg:pub.10.1007/978-3-540-48247-5_28
    7 sg:pub.10.1007/978-94-015-3994-4
    8 https://doi.org/10.1145/1401890.1401946
    9 https://doi.org/10.1145/342009.335388
    10 https://doi.org/10.1145/342009.335437
    11 schema:datePublished 2009
    12 schema:datePublishedReg 2009-01-01
    13 schema:description Detecting outliers which are grossly different from or inconsistent with the remaining dataset is a major challenge in real-world KDD applications. Existing outlier detection methods are ineffective on scattered real-world datasets due to implicit data patterns and parameter setting issues. We define a novel Local Distance-based Outlier Factor (LDOF) to measure the outlier-ness of objects in scattered datasets which addresses these issues. LDOF uses the relative location of an object to its neighbours to determine the degree to which the object deviates from its neighbourhood. We present theoretical bounds on LDOF’s false-detection probability. Experimentally, LDOF compares favorably to classical KNN and LOF based outlier detection. In particular it is less sensitive to parameter values.
    14 schema:editor Ndab3e3590dd14fc98cf5041049aefb8a
    15 schema:genre chapter
    16 schema:inLanguage en
    17 schema:isAccessibleForFree true
    18 schema:isPartOf N58dbee342c8449688234fe3310742189
    19 schema:name A New Local Distance-Based Outlier Detection Approach for Scattered Real-World Data
    20 schema:pagination 813-822
    21 schema:productId N60dde323e5ed475b8b3ad1a641bd8a97
    22 N924c70966b1d4375941b68999a1a71f3
    23 Nc5498239ab9843eea7715d78b726f4f7
    24 schema:publisher N9b55ad28cd034b9cb86a43539e08031b
    25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003376438
    26 https://doi.org/10.1007/978-3-642-01307-2_84
    27 schema:sdDatePublished 2019-04-16T07:09
    28 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    29 schema:sdPublisher Nfaf7eec1781f418c85cfd57e481d964f
    30 schema:url https://link.springer.com/10.1007%2F978-3-642-01307-2_84
    31 sgo:license sg:explorer/license/
    32 sgo:sdDataset chapters
    33 rdf:type schema:Chapter
    34 N062efb7bb525420f84416961eac9c1ec schema:familyName Kijsirikul
    35 schema:givenName Boonserm
    36 rdf:type schema:Person
    37 N13a48508a995467ead3c197bce2e5ccd rdf:first sg:person.011366707151.61
    38 rdf:rest N9d585a5d4c8845c9bfa5c9ddd8225768
    39 N3cf4b94d1ed04397be73923dd0f8117d schema:familyName Cercone
    40 schema:givenName Nick
    41 rdf:type schema:Person
    42 N58dbee342c8449688234fe3310742189 schema:isbn 978-3-642-01306-5
    43 978-3-642-01307-2
    44 schema:name Advances in Knowledge Discovery and Data Mining
    45 rdf:type schema:Book
    46 N60dde323e5ed475b8b3ad1a641bd8a97 schema:name readcube_id
    47 schema:value 10154e1c1305d351ab611c0fd13502f4633f3e37c117900c688708de6c4b4f35
    48 rdf:type schema:PropertyValue
    49 N74a7f071b9db45e7a106e9affb84d133 rdf:first N3cf4b94d1ed04397be73923dd0f8117d
    50 rdf:rest Nbafda9ba5d2a405f9341efacebd93481
    51 N924c70966b1d4375941b68999a1a71f3 schema:name doi
    52 schema:value 10.1007/978-3-642-01307-2_84
    53 rdf:type schema:PropertyValue
    54 N9b55ad28cd034b9cb86a43539e08031b schema:location Berlin, Heidelberg
    55 schema:name Springer Berlin Heidelberg
    56 rdf:type schema:Organisation
    57 N9d585a5d4c8845c9bfa5c9ddd8225768 rdf:first sg:person.015521005547.47
    58 rdf:rest rdf:nil
    59 Naaac1e8169af4d81b9339500ce867f2a rdf:first N062efb7bb525420f84416961eac9c1ec
    60 rdf:rest N74a7f071b9db45e7a106e9affb84d133
    61 Nae2132286fbc4ede802b4605551d63ef schema:familyName Ho
    62 schema:givenName Tu-Bao
    63 rdf:type schema:Person
    64 Nb1b1cf9217ab4a3d97a87b327942be1d schema:familyName Theeramunkong
    65 schema:givenName Thanaruk
    66 rdf:type schema:Person
    67 Nb4ea61229e5747398c9c33a1c16f085d rdf:first sg:person.016435644701.29
    68 rdf:rest N13a48508a995467ead3c197bce2e5ccd
    69 Nbafda9ba5d2a405f9341efacebd93481 rdf:first Nae2132286fbc4ede802b4605551d63ef
    70 rdf:rest rdf:nil
    71 Nc5498239ab9843eea7715d78b726f4f7 schema:name dimensions_id
    72 schema:value pub.1003376438
    73 rdf:type schema:PropertyValue
    74 Ndab3e3590dd14fc98cf5041049aefb8a rdf:first Nb1b1cf9217ab4a3d97a87b327942be1d
    75 rdf:rest Naaac1e8169af4d81b9339500ce867f2a
    76 Nfaf7eec1781f418c85cfd57e481d964f schema:name Springer Nature - SN SciGraph project
    77 rdf:type schema:Organization
    78 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    79 schema:name Information and Computing Sciences
    80 rdf:type schema:DefinedTerm
    81 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
    82 schema:name Information Systems
    83 rdf:type schema:DefinedTerm
    84 sg:person.011366707151.61 schema:affiliation https://www.grid.ac/institutes/grid.425461.0
    85 schema:familyName Hutter
    86 schema:givenName Marcus
    87 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011366707151.61
    88 rdf:type schema:Person
    89 sg:person.015521005547.47 schema:affiliation https://www.grid.ac/institutes/grid.425461.0
    90 schema:familyName Jin
    91 schema:givenName Huidong
    92 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015521005547.47
    93 rdf:type schema:Person
    94 sg:person.016435644701.29 schema:affiliation https://www.grid.ac/institutes/grid.1001.0
    95 schema:familyName Zhang
    96 schema:givenName Ke
    97 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016435644701.29
    98 rdf:type schema:Person
    99 sg:pub.10.1007/11731139_66 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032212702
    100 https://doi.org/10.1007/11731139_66
    101 rdf:type schema:CreativeWork
    102 sg:pub.10.1007/3-540-47887-6_53 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047183551
    103 https://doi.org/10.1007/3-540-47887-6_53
    104 rdf:type schema:CreativeWork
    105 sg:pub.10.1007/978-3-540-48247-5_28 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042737157
    106 https://doi.org/10.1007/978-3-540-48247-5_28
    107 rdf:type schema:CreativeWork
    108 sg:pub.10.1007/978-94-015-3994-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1053674345
    109 https://doi.org/10.1007/978-94-015-3994-4
    110 rdf:type schema:CreativeWork
    111 https://doi.org/10.1145/1401890.1401946 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008072501
    112 rdf:type schema:CreativeWork
    113 https://doi.org/10.1145/342009.335388 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029221191
    114 rdf:type schema:CreativeWork
    115 https://doi.org/10.1145/342009.335437 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023478359
    116 rdf:type schema:CreativeWork
    117 https://www.grid.ac/institutes/grid.1001.0 schema:alternateName Australian National University
    118 schema:name RSISE, Australian National University, Australia
    119 rdf:type schema:Organization
    120 https://www.grid.ac/institutes/grid.425461.0 schema:alternateName Data61
    121 schema:name CSIRO Mathematical and Information Sciences, Acton ACT 2601, Australia
    122 National ICT Australia (NICTA), Canberra Lab, ACT, Australia
    123 RSISE, Australian National University, Australia
    124 rdf:type schema:Organization
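
In the table, author and editor order is stored as RDF collections: each blank node carries an `rdf:first` (the item) and an `rdf:rest` (the next node), and the chain ends at `rdf:nil`. The Python sketch below walks the author chain that starts at the `schema:author` object in row 3. The dictionary layout and the function name `walk_rdf_list` are assumptions; the node and person identifiers are copied from the table.

```python
# Blank-node links for the author list: node -> (rdf:first, rdf:rest).
links = {
    "Nb4ea61229e5747398c9c33a1c16f085d": (
        "sg:person.016435644701.29",
        "N13a48508a995467ead3c197bce2e5ccd",
    ),
    "N13a48508a995467ead3c197bce2e5ccd": (
        "sg:person.011366707151.61",
        "N9d585a5d4c8845c9bfa5c9ddd8225768",
    ),
    "N9d585a5d4c8845c9bfa5c9ddd8225768": (
        "sg:person.015521005547.47",
        "rdf:nil",
    ),
}


def walk_rdf_list(head, links):
    """Hypothetical helper: follow rdf:rest from head and collect each rdf:first."""
    items = []
    seen = set()
    node = head
    while node != "rdf:nil":
        if node in seen:
            # Guard against malformed data that loops back on itself.
            raise ValueError(f"cycle at {node}")
        seen.add(node)
        first, rest = links[node]
        items.append(first)
        node = rest
    return items


author_ids = walk_rdf_list("Nb4ea61229e5747398c9c33a1c16f085d", links)
```

The resulting order (Zhang, Hutter, Jin) matches the author order given in the chapter info.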
     



