Pincer-search: A new algorithm for discovering the maximum frequent set View Full Text


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

1998

AUTHORS

Dao-I Lin , Zvi M. Kedem

ABSTRACT

Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent item-sets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set. It is used to prune candidates in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicite examination of every frequent itemset. Therefore the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore specifies immediately all frequent itemsets. We evaluate the performance of the algorithm using a well-known benchmark database. The improvements can be up to several orders of magnitude, compared to the best current algorithms. More... »

PAGES

103-119

Book

TITLE

Advances in Database Technology — EDBT'98

ISBN

978-3-540-64264-0
978-3-540-69709-1

Author Affiliations

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/bfb0100980

DOI

http://dx.doi.org/10.1007/bfb0100980

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1023178884


Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
Incoming Citations Browse incoming citations for this publication using opencitations.net

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "New York University", 
          "id": "https://www.grid.ac/institutes/grid.137628.9", 
          "name": [
            "Department of Computer Science Courant Institute of Mathematical Sciences, New York University, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lin", 
        "givenName": "Dao-I", 
        "id": "sg:person.010424707337.67", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010424707337.67"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "New York University", 
          "id": "https://www.grid.ac/institutes/grid.137628.9", 
          "name": [
            "Department of Computer Science Courant Institute of Mathematical Sciences, New York University, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Kedem", 
        "givenName": "Zvi M.", 
        "id": "sg:person.016436451607.36", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016436451607.36"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "1998", 
    "datePublishedReg": "1998-01-01", 
    "description": "Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent item-sets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set. It is used to prune candidates in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicite examination of every frequent itemset. Therefore the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore specifies immediately all frequent itemsets. We evaluate the performance of the algorithm using a well-known benchmark database. The improvements can be up to several orders of magnitude, compared to the best current algorithms.", 
    "editor": [
      {
        "familyName": "Schek", 
        "givenName": "Hans-J\u00f6rg", 
        "type": "Person"
      }, 
      {
        "familyName": "Alonso", 
        "givenName": "Gustavo", 
        "type": "Person"
      }, 
      {
        "familyName": "Saltor", 
        "givenName": "Felix", 
        "type": "Person"
      }, 
      {
        "familyName": "Ramos", 
        "givenName": "Isidro", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/bfb0100980", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-540-64264-0", 
        "978-3-540-69709-1"
      ], 
      "name": "Advances in Database Technology \u2014 EDBT'98", 
      "type": "Book"
    }, 
    "name": "Pincer-search: A new algorithm for discovering the maximum frequent set", 
    "pagination": "103-119", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/bfb0100980"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "6e6e0362b60eeedc7b6b05dc799e7477ef7dc9bf7f8587d6a293231cc08092e5"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1023178884"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/bfb0100980", 
      "https://app.dimensions.ai/details/publication/pub.1023178884"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T20:48", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8690_00000039.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/BFb0100980"
  }
]
 

Download the RDF metadata as:  json-ld nt turtle xml License info

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/bfb0100980'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/bfb0100980'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/bfb0100980'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/bfb0100980'


 

This table displays all metadata directly associated to this object as RDF triples.

87 TRIPLES      22 PREDICATES      27 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/bfb0100980 schema:about anzsrc-for:08
2 anzsrc-for:0806
3 schema:author N35dcb9be5f1244ee8d4c004ddad2866e
4 schema:datePublished 1998
5 schema:datePublishedReg 1998-01-01
6 schema:description Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent item-sets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set. It is used to prune candidates in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicite examination of every frequent itemset. Therefore the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore specifies immediately all frequent itemsets. We evaluate the performance of the algorithm using a well-known benchmark database. The improvements can be up to several orders of magnitude, compared to the best current algorithms.
7 schema:editor Nd0c30211cb164506a4f9c3ad2c4be1e2
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree true
11 schema:isPartOf Na3ee0fb88d9344a09ce79e4fa0e22e3b
12 schema:name Pincer-search: A new algorithm for discovering the maximum frequent set
13 schema:pagination 103-119
14 schema:productId N045e2fecbfc941a9af0ca54651faa283
15 Ndebb1330eb9745c5adf23adccb39a7d4
16 Nf32eaba6a9ee4add914987373eee1efd
17 schema:publisher Nd260eadc21a046f9b5900d821799021a
18 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023178884
19 https://doi.org/10.1007/bfb0100980
20 schema:sdDatePublished 2019-04-15T20:48
21 schema:sdLicense https://scigraph.springernature.com/explorer/license/
22 schema:sdPublisher Nc0fe3ba028b44ac49853a9f07faf3b6e
23 schema:url http://link.springer.com/10.1007/BFb0100980
24 sgo:license sg:explorer/license/
25 sgo:sdDataset chapters
26 rdf:type schema:Chapter
27 N045e2fecbfc941a9af0ca54651faa283 schema:name doi
28 schema:value 10.1007/bfb0100980
29 rdf:type schema:PropertyValue
30 N35dcb9be5f1244ee8d4c004ddad2866e rdf:first sg:person.010424707337.67
31 rdf:rest Nd235db291765438f9cd8191505f6e1c5
32 N54316ca5df2a4557935636a42d285cad schema:familyName Ramos
33 schema:givenName Isidro
34 rdf:type schema:Person
35 N72bebc4a5b384014a5bcc5d7422a89de rdf:first N763ef9fbef844c55bede5ae40114f72f
36 rdf:rest N766c31bde614414abca10d1501f04049
37 N763ef9fbef844c55bede5ae40114f72f schema:familyName Alonso
38 schema:givenName Gustavo
39 rdf:type schema:Person
40 N766c31bde614414abca10d1501f04049 rdf:first Ncb3d5dd934c94ae190b872920f280bf3
41 rdf:rest Nde7eac0f07c94c68a3d854d52dc31439
42 Na3ee0fb88d9344a09ce79e4fa0e22e3b schema:isbn 978-3-540-64264-0
43 978-3-540-69709-1
44 schema:name Advances in Database Technology — EDBT'98
45 rdf:type schema:Book
46 Nc0fe3ba028b44ac49853a9f07faf3b6e schema:name Springer Nature - SN SciGraph project
47 rdf:type schema:Organization
48 Ncb3d5dd934c94ae190b872920f280bf3 schema:familyName Saltor
49 schema:givenName Felix
50 rdf:type schema:Person
51 Nd0c30211cb164506a4f9c3ad2c4be1e2 rdf:first Nfcbb03bfff734aa8acd50132c420fa44
52 rdf:rest N72bebc4a5b384014a5bcc5d7422a89de
53 Nd235db291765438f9cd8191505f6e1c5 rdf:first sg:person.016436451607.36
54 rdf:rest rdf:nil
55 Nd260eadc21a046f9b5900d821799021a schema:location Berlin, Heidelberg
56 schema:name Springer Berlin Heidelberg
57 rdf:type schema:Organisation
58 Nde7eac0f07c94c68a3d854d52dc31439 rdf:first N54316ca5df2a4557935636a42d285cad
59 rdf:rest rdf:nil
60 Ndebb1330eb9745c5adf23adccb39a7d4 schema:name readcube_id
61 schema:value 6e6e0362b60eeedc7b6b05dc799e7477ef7dc9bf7f8587d6a293231cc08092e5
62 rdf:type schema:PropertyValue
63 Nf32eaba6a9ee4add914987373eee1efd schema:name dimensions_id
64 schema:value pub.1023178884
65 rdf:type schema:PropertyValue
66 Nfcbb03bfff734aa8acd50132c420fa44 schema:familyName Schek
67 schema:givenName Hans-Jörg
68 rdf:type schema:Person
69 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
70 schema:name Information and Computing Sciences
71 rdf:type schema:DefinedTerm
72 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
73 schema:name Information Systems
74 rdf:type schema:DefinedTerm
75 sg:person.010424707337.67 schema:affiliation https://www.grid.ac/institutes/grid.137628.9
76 schema:familyName Lin
77 schema:givenName Dao-I
78 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010424707337.67
79 rdf:type schema:Person
80 sg:person.016436451607.36 schema:affiliation https://www.grid.ac/institutes/grid.137628.9
81 schema:familyName Kedem
82 schema:givenName Zvi M.
83 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016436451607.36
84 rdf:type schema:Person
85 https://www.grid.ac/institutes/grid.137628.9 schema:alternateName New York University
86 schema:name Department of Computer Science Courant Institute of Mathematical Sciences, New York University, USA
87 rdf:type schema:Organization
 




Preview window. Press ESC to close (or click here)


...