Pincer-search: A new algorithm for discovering the maximum frequent set View Full Text


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

1998

AUTHORS

Dao-I Lin , Zvi M. Kedem

ABSTRACT

Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent item-sets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set. It is used to prune candidates in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicite examination of every frequent itemset. Therefore the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore specifies immediately all frequent itemsets. We evaluate the performance of the algorithm using a well-known benchmark database. The improvements can be up to several orders of magnitude, compared to the best current algorithms. More... »

PAGES

103-119

Book

TITLE

Advances in Database Technology — EDBT'98

ISBN

978-3-540-64264-0
978-3-540-69709-1

Author Affiliations

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/bfb0100980

DOI

http://dx.doi.org/10.1007/bfb0100980

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1023178884


Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
Incoming Citations Browse incoming citations for this publication using opencitations.net

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "New York University", 
          "id": "https://www.grid.ac/institutes/grid.137628.9", 
          "name": [
            "Department of Computer Science Courant Institute of Mathematical Sciences, New York University, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lin", 
        "givenName": "Dao-I", 
        "id": "sg:person.010424707337.67", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010424707337.67"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "New York University", 
          "id": "https://www.grid.ac/institutes/grid.137628.9", 
          "name": [
            "Department of Computer Science Courant Institute of Mathematical Sciences, New York University, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Kedem", 
        "givenName": "Zvi M.", 
        "id": "sg:person.016436451607.36", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016436451607.36"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "1998", 
    "datePublishedReg": "1998-01-01", 
    "description": "Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent item-sets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set. It is used to prune candidates in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicite examination of every frequent itemset. Therefore the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore specifies immediately all frequent itemsets. We evaluate the performance of the algorithm using a well-known benchmark database. The improvements can be up to several orders of magnitude, compared to the best current algorithms.", 
    "editor": [
      {
        "familyName": "Schek", 
        "givenName": "Hans-J\u00f6rg", 
        "type": "Person"
      }, 
      {
        "familyName": "Alonso", 
        "givenName": "Gustavo", 
        "type": "Person"
      }, 
      {
        "familyName": "Saltor", 
        "givenName": "Felix", 
        "type": "Person"
      }, 
      {
        "familyName": "Ramos", 
        "givenName": "Isidro", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/bfb0100980", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-540-64264-0", 
        "978-3-540-69709-1"
      ], 
      "name": "Advances in Database Technology \u2014 EDBT'98", 
      "type": "Book"
    }, 
    "name": "Pincer-search: A new algorithm for discovering the maximum frequent set", 
    "pagination": "103-119", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/bfb0100980"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "6e6e0362b60eeedc7b6b05dc799e7477ef7dc9bf7f8587d6a293231cc08092e5"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1023178884"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/bfb0100980", 
      "https://app.dimensions.ai/details/publication/pub.1023178884"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T20:48", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8690_00000039.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/BFb0100980"
  }
]
 

Download the RDF metadata as:  json-ld nt turtle xml License info

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/bfb0100980'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/bfb0100980'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/bfb0100980'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/bfb0100980'


 

This table displays all metadata directly associated to this object as RDF triples.

87 TRIPLES      22 PREDICATES      27 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/bfb0100980 schema:about anzsrc-for:08
2 anzsrc-for:0806
3 schema:author N5f78998c1d904e638a8bb0dc0f842275
4 schema:datePublished 1998
5 schema:datePublishedReg 1998-01-01
6 schema:description Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent item-sets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set. It is used to prune candidates in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicite examination of every frequent itemset. Therefore the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore specifies immediately all frequent itemsets. We evaluate the performance of the algorithm using a well-known benchmark database. The improvements can be up to several orders of magnitude, compared to the best current algorithms.
7 schema:editor N2ab97bd307d246aa8aa3cc7fa230cfcb
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree true
11 schema:isPartOf N47eea557dbf745d6af403d3fc3bebee6
12 schema:name Pincer-search: A new algorithm for discovering the maximum frequent set
13 schema:pagination 103-119
14 schema:productId N021557ffd67a44989ff1556d8c8ac6b7
15 N0d4999c700824853ba31adbb03843ef1
16 Nf44c2121d9b9494d8f2a8d69be2a65dc
17 schema:publisher Nd40c8403aa204b7d8d87fc7dcdbe9456
18 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023178884
19 https://doi.org/10.1007/bfb0100980
20 schema:sdDatePublished 2019-04-15T20:48
21 schema:sdLicense https://scigraph.springernature.com/explorer/license/
22 schema:sdPublisher N5bfe0e65ab8b4bd58a85a18fb1a14bcd
23 schema:url http://link.springer.com/10.1007/BFb0100980
24 sgo:license sg:explorer/license/
25 sgo:sdDataset chapters
26 rdf:type schema:Chapter
27 N021557ffd67a44989ff1556d8c8ac6b7 schema:name dimensions_id
28 schema:value pub.1023178884
29 rdf:type schema:PropertyValue
30 N0d4999c700824853ba31adbb03843ef1 schema:name readcube_id
31 schema:value 6e6e0362b60eeedc7b6b05dc799e7477ef7dc9bf7f8587d6a293231cc08092e5
32 rdf:type schema:PropertyValue
33 N1099e0ab5cd346bdb986be1ca96a6189 rdf:first Nf7dcde39b18f4348b2ec89657bcb067f
34 rdf:rest N854ff9320140428c969bb8140a4a6275
35 N1fbc5ee03ab04627a7ddfb6158c27a40 rdf:first N2a5443aeb1ba4cb98c263d1e3ac38fd0
36 rdf:rest N1099e0ab5cd346bdb986be1ca96a6189
37 N23fdbf224f9b4cfeb46037f928393ef2 schema:familyName Ramos
38 schema:givenName Isidro
39 rdf:type schema:Person
40 N2a5443aeb1ba4cb98c263d1e3ac38fd0 schema:familyName Alonso
41 schema:givenName Gustavo
42 rdf:type schema:Person
43 N2ab97bd307d246aa8aa3cc7fa230cfcb rdf:first Nd1a330056dd345a59b225ba89c9ae8e4
44 rdf:rest N1fbc5ee03ab04627a7ddfb6158c27a40
45 N47eea557dbf745d6af403d3fc3bebee6 schema:isbn 978-3-540-64264-0
46 978-3-540-69709-1
47 schema:name Advances in Database Technology — EDBT'98
48 rdf:type schema:Book
49 N5bfe0e65ab8b4bd58a85a18fb1a14bcd schema:name Springer Nature - SN SciGraph project
50 rdf:type schema:Organization
51 N5f78998c1d904e638a8bb0dc0f842275 rdf:first sg:person.010424707337.67
52 rdf:rest N638483f9b33941cd9aedbdc0a34a25d0
53 N638483f9b33941cd9aedbdc0a34a25d0 rdf:first sg:person.016436451607.36
54 rdf:rest rdf:nil
55 N854ff9320140428c969bb8140a4a6275 rdf:first N23fdbf224f9b4cfeb46037f928393ef2
56 rdf:rest rdf:nil
57 Nd1a330056dd345a59b225ba89c9ae8e4 schema:familyName Schek
58 schema:givenName Hans-Jörg
59 rdf:type schema:Person
60 Nd40c8403aa204b7d8d87fc7dcdbe9456 schema:location Berlin, Heidelberg
61 schema:name Springer Berlin Heidelberg
62 rdf:type schema:Organisation
63 Nf44c2121d9b9494d8f2a8d69be2a65dc schema:name doi
64 schema:value 10.1007/bfb0100980
65 rdf:type schema:PropertyValue
66 Nf7dcde39b18f4348b2ec89657bcb067f schema:familyName Saltor
67 schema:givenName Felix
68 rdf:type schema:Person
69 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
70 schema:name Information and Computing Sciences
71 rdf:type schema:DefinedTerm
72 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
73 schema:name Information Systems
74 rdf:type schema:DefinedTerm
75 sg:person.010424707337.67 schema:affiliation https://www.grid.ac/institutes/grid.137628.9
76 schema:familyName Lin
77 schema:givenName Dao-I
78 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010424707337.67
79 rdf:type schema:Person
80 sg:person.016436451607.36 schema:affiliation https://www.grid.ac/institutes/grid.137628.9
81 schema:familyName Kedem
82 schema:givenName Zvi M.
83 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016436451607.36
84 rdf:type schema:Person
85 https://www.grid.ac/institutes/grid.137628.9 schema:alternateName New York University
86 schema:name Department of Computer Science Courant Institute of Mathematical Sciences, New York University, USA
87 rdf:type schema:Organization
 




Preview window. Press ESC to close (or click here)


...