Levelwise Search and Borders of Theories in Knowledge Discovery View Full Text


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

1997-09

AUTHORS

Heikki Mannila, Hannu Toivonen

ABSTRACT

One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences S L determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD. More... »

PAGES

241-258

References to SciGraph publications

  • 1995-01. Efficient discovery of interesting statements in databases in JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
  • 1997-09. Discovery of Frequent Episodes in Event Sequences in DATA MINING AND KNOWLEDGE DISCOVERY
  • Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1023/a:1009796218281

    DOI

    http://dx.doi.org/10.1023/a:1009796218281

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1011497512


    Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
    Incoming Citations Browse incoming citations for this publication using opencitations.net

    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information Systems", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "University of Helsinki", 
              "id": "https://www.grid.ac/institutes/grid.7737.4", 
              "name": [
                "Department of Computer Science, University of Helsinki, P.O. Box 26, FIN\u201300014, Finland", 
                "Department of Computer Science, University of Helsinki, P.O. Box 26, FIN\u201300014, Finland"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Mannila", 
            "givenName": "Heikki", 
            "id": "sg:person.01065176745.00", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01065176745.00"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Helsinki", 
              "id": "https://www.grid.ac/institutes/grid.7737.4", 
              "name": [
                "Department of Computer Science, University of Helsinki, P.O. Box 26, FIN\u201300014, Finland", 
                "Department of Computer Science, University of Helsinki, P.O. Box 26, FIN\u201300014, Finland"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Toivonen", 
            "givenName": "Hannu", 
            "id": "sg:person.014126306365.50", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014126306365.50"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1023/a:1009748302351", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1005237640", 
              "https://doi.org/10.1023/a:1009748302351"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/240455.240472", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1006761471"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0169-023x(94)90023-x", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1009065409"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0169-023x(94)90023-x", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1009065409"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0166-218x(92)90031-5", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1012092840"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/bf00962822", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017336206", 
              "https://doi.org/10.1007/bf00962822"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/bf00962822", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017336206", 
              "https://doi.org/10.1007/bf00962822"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/233269.233311", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1019413618"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1002/int.4550070703", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1026054357"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/237661.237708", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1027093368"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/170035.170072", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1028726331"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/263661.263684", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031112924"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0022-0000(86)90015-2", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1042700576"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/253260.253327", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1045333834"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0004-3702(94)90112-0", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1046632075"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0004-3702(94)90112-0", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1046632075"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0004-3702(82)90040-6", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1049130265"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0004-3702(82)90040-6", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1049130265"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1006/jagm.1996.0062", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1051116080"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/191246.191314", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1052019070"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1137/s0097539793250299", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1062879828"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "1997-09", 
        "datePublishedReg": "1997-09-01", 
        "description": "One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences S L determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1023/a:1009796218281", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": false, 
        "isPartOf": [
          {
            "id": "sg:journal.1041853", 
            "issn": [
              "1384-5810", 
              "1573-756X"
            ], 
            "name": "Data Mining and Knowledge Discovery", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "3", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "1"
          }
        ], 
        "name": "Levelwise Search and Borders of Theories in Knowledge Discovery", 
        "pagination": "241-258", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "36c797cc5ba1f4a8928a6b89aba1df33bf1d5ef7455929d4355244e057acc3d0"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1023/a:1009796218281"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1011497512"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1023/a:1009796218281", 
          "https://app.dimensions.ai/details/publication/pub.1011497512"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-10T13:23", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8659_00000536.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "http://link.springer.com/10.1023%2FA%3A1009796218281"
      }
    ]
     

    Download the RDF metadata as:  json-ld nt turtle xml License info

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1023/a:1009796218281'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1023/a:1009796218281'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1023/a:1009796218281'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1023/a:1009796218281'


     

    This table displays all metadata directly associated to this object as RDF triples.

    121 TRIPLES      21 PREDICATES      44 URIs      19 LITERALS      7 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1023/a:1009796218281 schema:about anzsrc-for:08
    2 anzsrc-for:0806
    3 schema:author Nfa0a01f7b28044f5bc859eba99754ffe
    4 schema:citation sg:pub.10.1007/bf00962822
    5 sg:pub.10.1023/a:1009748302351
    6 https://doi.org/10.1002/int.4550070703
    7 https://doi.org/10.1006/jagm.1996.0062
    8 https://doi.org/10.1016/0004-3702(82)90040-6
    9 https://doi.org/10.1016/0004-3702(94)90112-0
    10 https://doi.org/10.1016/0022-0000(86)90015-2
    11 https://doi.org/10.1016/0166-218x(92)90031-5
    12 https://doi.org/10.1016/0169-023x(94)90023-x
    13 https://doi.org/10.1137/s0097539793250299
    14 https://doi.org/10.1145/170035.170072
    15 https://doi.org/10.1145/191246.191314
    16 https://doi.org/10.1145/233269.233311
    17 https://doi.org/10.1145/237661.237708
    18 https://doi.org/10.1145/240455.240472
    19 https://doi.org/10.1145/253260.253327
    20 https://doi.org/10.1145/263661.263684
    21 schema:datePublished 1997-09
    22 schema:datePublishedReg 1997-09-01
    23 schema:description One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences S L determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.
    24 schema:genre research_article
    25 schema:inLanguage en
    26 schema:isAccessibleForFree false
    27 schema:isPartOf N150bd089d26b4b21bc1260126ecdeb5b
    28 N7705bfd6d989407382c698d08ed461e4
    29 sg:journal.1041853
    30 schema:name Levelwise Search and Borders of Theories in Knowledge Discovery
    31 schema:pagination 241-258
    32 schema:productId N06abc3fe88ad4c03a09eefe5084da213
    33 N30a36dac576b443dba41e344c5b7d5a2
    34 N9c54cadbbe1f4bd1ad1f96771d9056c6
    35 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011497512
    36 https://doi.org/10.1023/a:1009796218281
    37 schema:sdDatePublished 2019-04-10T13:23
    38 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    39 schema:sdPublisher N27425a3c879b48b6b5f129e88c28ae27
    40 schema:url http://link.springer.com/10.1023%2FA%3A1009796218281
    41 sgo:license sg:explorer/license/
    42 sgo:sdDataset articles
    43 rdf:type schema:ScholarlyArticle
    44 N06abc3fe88ad4c03a09eefe5084da213 schema:name readcube_id
    45 schema:value 36c797cc5ba1f4a8928a6b89aba1df33bf1d5ef7455929d4355244e057acc3d0
    46 rdf:type schema:PropertyValue
    47 N150bd089d26b4b21bc1260126ecdeb5b schema:issueNumber 3
    48 rdf:type schema:PublicationIssue
    49 N27425a3c879b48b6b5f129e88c28ae27 schema:name Springer Nature - SN SciGraph project
    50 rdf:type schema:Organization
    51 N30a36dac576b443dba41e344c5b7d5a2 schema:name dimensions_id
    52 schema:value pub.1011497512
    53 rdf:type schema:PropertyValue
    54 N7705bfd6d989407382c698d08ed461e4 schema:volumeNumber 1
    55 rdf:type schema:PublicationVolume
    56 N87f6c86206a44e34af5033a03c31ec84 rdf:first sg:person.014126306365.50
    57 rdf:rest rdf:nil
    58 N9c54cadbbe1f4bd1ad1f96771d9056c6 schema:name doi
    59 schema:value 10.1023/a:1009796218281
    60 rdf:type schema:PropertyValue
    61 Nfa0a01f7b28044f5bc859eba99754ffe rdf:first sg:person.01065176745.00
    62 rdf:rest N87f6c86206a44e34af5033a03c31ec84
    63 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    64 schema:name Information and Computing Sciences
    65 rdf:type schema:DefinedTerm
    66 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
    67 schema:name Information Systems
    68 rdf:type schema:DefinedTerm
    69 sg:journal.1041853 schema:issn 1384-5810
    70 1573-756X
    71 schema:name Data Mining and Knowledge Discovery
    72 rdf:type schema:Periodical
    73 sg:person.01065176745.00 schema:affiliation https://www.grid.ac/institutes/grid.7737.4
    74 schema:familyName Mannila
    75 schema:givenName Heikki
    76 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01065176745.00
    77 rdf:type schema:Person
    78 sg:person.014126306365.50 schema:affiliation https://www.grid.ac/institutes/grid.7737.4
    79 schema:familyName Toivonen
    80 schema:givenName Hannu
    81 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014126306365.50
    82 rdf:type schema:Person
    83 sg:pub.10.1007/bf00962822 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017336206
    84 https://doi.org/10.1007/bf00962822
    85 rdf:type schema:CreativeWork
    86 sg:pub.10.1023/a:1009748302351 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005237640
    87 https://doi.org/10.1023/a:1009748302351
    88 rdf:type schema:CreativeWork
    89 https://doi.org/10.1002/int.4550070703 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026054357
    90 rdf:type schema:CreativeWork
    91 https://doi.org/10.1006/jagm.1996.0062 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051116080
    92 rdf:type schema:CreativeWork
    93 https://doi.org/10.1016/0004-3702(82)90040-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049130265
    94 rdf:type schema:CreativeWork
    95 https://doi.org/10.1016/0004-3702(94)90112-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046632075
    96 rdf:type schema:CreativeWork
    97 https://doi.org/10.1016/0022-0000(86)90015-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042700576
    98 rdf:type schema:CreativeWork
    99 https://doi.org/10.1016/0166-218x(92)90031-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012092840
    100 rdf:type schema:CreativeWork
    101 https://doi.org/10.1016/0169-023x(94)90023-x schema:sameAs https://app.dimensions.ai/details/publication/pub.1009065409
    102 rdf:type schema:CreativeWork
    103 https://doi.org/10.1137/s0097539793250299 schema:sameAs https://app.dimensions.ai/details/publication/pub.1062879828
    104 rdf:type schema:CreativeWork
    105 https://doi.org/10.1145/170035.170072 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028726331
    106 rdf:type schema:CreativeWork
    107 https://doi.org/10.1145/191246.191314 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052019070
    108 rdf:type schema:CreativeWork
    109 https://doi.org/10.1145/233269.233311 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019413618
    110 rdf:type schema:CreativeWork
    111 https://doi.org/10.1145/237661.237708 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027093368
    112 rdf:type schema:CreativeWork
    113 https://doi.org/10.1145/240455.240472 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006761471
    114 rdf:type schema:CreativeWork
    115 https://doi.org/10.1145/253260.253327 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045333834
    116 rdf:type schema:CreativeWork
    117 https://doi.org/10.1145/263661.263684 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031112924
    118 rdf:type schema:CreativeWork
    119 https://www.grid.ac/institutes/grid.7737.4 schema:alternateName University of Helsinki
    120 schema:name Department of Computer Science, University of Helsinki, P.O. Box 26, FIN–00014, Finland
    121 rdf:type schema:Organization
     




    Preview window. Press ESC to close (or click here)


    ...