Mining Web Log Sequential Patterns with Position Coded Pre-Order Linked WAP-Tree


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2005-01

AUTHORS

C.I. Ezeife, Yi Lu

ABSTRACT

Sequential mining is the process of applying data mining techniques to a sequential database in order to discover the correlations that exist among an ordered list of events. An important application of sequential mining techniques is web usage mining, which mines web log accesses, where the sequences of web page accesses made by different web users over a period of time, through a server, are recorded. Web access pattern tree (WAP-tree) mining is a sequential pattern mining technique for web log access sequences, which first stores the original web access sequence database on a prefix tree, similar to the frequent pattern tree (FP-tree) for storing non-sequential data. The WAP-tree algorithm then mines the frequent sequences from the WAP-tree by recursively reconstructing intermediate trees, starting with suffix sequences and ending with prefix sequences. This paper proposes a more efficient approach for using the WAP-tree to mine frequent sequences, which totally eliminates the need for numerous reconstructions of intermediate WAP-trees during mining. The proposed algorithm builds the frequent header node links of the original WAP-tree in a pre-order fashion and uses the position code of each node to identify the ancestor/descendant relationships between nodes of the tree. It then finds each frequent sequential pattern through progressive prefix sequence search, starting with its first prefix subsequence event. Experiments show a huge performance gain over the WAP-tree technique.
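
The abstract does not say how position codes are assigned, so the following is only a minimal sketch under one assumed binary scheme: the root has an empty code, the leftmost child of a node gets its parent's code followed by "1", and every other child gets its nearest left sibling's code followed by "0". The names `Node`, `add_child` and `is_ancestor` are hypothetical and exist only for this illustration. Under that assumption, the ancestor/descendant test reduces to a string prefix check, with no tree traversal:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    code: str = ""  # binary position code; the root's code is empty
    children: List["Node"] = field(default_factory=list)

    def add_child(self, label: str) -> "Node":
        # Assumed rule: the leftmost child gets parent.code + "1";
        # every later child gets its nearest left sibling's code + "0".
        if not self.children:
            code = self.code + "1"
        else:
            code = self.children[-1].code + "0"
        child = Node(label, code)
        self.children.append(child)
        return child


def is_ancestor(a: Node, b: Node) -> bool:
    # Under the assumed rule, a is a proper ancestor of b exactly when
    # a.code followed by "1" is a prefix of b.code.
    return b.code.startswith(a.code + "1")


# Small example tree: root -> (a, b), a -> (c, d), b -> (e)
root = Node("root")
a = root.add_child("a")
b = root.add_child("b")
c = a.add_child("c")
d = a.add_child("d")
e = b.add_child("e")
```

Because sibling codes only ever append "0", a node's code followed by "1" can be matched only by nodes in its own subtree, which is what lets a miner compare two nodes from different header links without walking the tree.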

PAGES

5-38

References to SciGraph publications

  • 1996. Mining sequential patterns: Generalizations and performance improvements in ADVANCES IN DATABASE TECHNOLOGY — EDBT '96
  • 2004-01. Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach in DATA MINING AND KNOWLEDGE DISCOVERY
  • 2000-03. Analysis of navigation behaviour in web sites integrating multiple information systems in THE VLDB JOURNAL
  • 2000. Data Mining of User Navigation Patterns in WEB USAGE ANALYSIS AND USER PROFILING
  • 2000. Mining Access Patterns Efficiently from Web Logs in KNOWLEDGE DISCOVERY AND DATA MINING. CURRENT ISSUES AND NEW APPLICATIONS

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/s10618-005-0248-3

    DOI

    http://dx.doi.org/10.1007/s10618-005-0248-3

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1051889932



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information Systems", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "University of Windsor", 
              "id": "https://www.grid.ac/institutes/grid.267455.7", 
              "name": [
                "School of Computer Science, University of Windsor, N9B 3P4, Windsor, Ontario, Canada"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Ezeife", 
            "givenName": "C.I.", 
            "id": "sg:person.01200460536.41", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01200460536.41"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Windsor", 
              "id": "https://www.grid.ac/institutes/grid.267455.7", 
              "name": [
                "School of Computer Science, University of Windsor, N9B 3P4, Windsor, Ontario, Canada"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Lu", 
            "givenName": "Yi", 
            "id": "sg:person.015700476427.69", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015700476427.69"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1016/s0169-023x(01)00008-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1001497929"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/3-540-45571-x_47", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1007543846", 
              "https://doi.org/10.1007/3-540-45571-x_47"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/846183.846188", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1009189318"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s007780050083", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1011103093", 
              "https://doi.org/10.1007/s007780050083"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/3-540-44934-5_6", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017446078", 
              "https://doi.org/10.1007/3-540-44934-5_6"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1023/b:dami.0000005258.31418.83", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1019489501", 
              "https://doi.org/10.1023/b:dami.0000005258.31418.83"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/347090.347167", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1035246359"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/bfb0014140", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050497818", 
              "https://doi.org/10.1007/bfb0014140"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icde.1995.380415", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094007712"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icde.2001.914830", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1095566607"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2005-01", 
        "datePublishedReg": "2005-01-01", 
        "description": "Sequential mining is the process of applying data mining techniques to a sequential database for the purposes of discovering the correlation relationships that exist among an ordered list of events. An important application of sequential mining techniques is web usage mining, for mining web log accesses, where the sequences of web page accesses made by different web users over a period of time, through a server, are recorded. Web access pattern tree (WAP-tree) mining is a sequential pattern mining technique for web log access sequences, which first stores the original web access sequence database on a prefix tree, similar to the frequent pattern tree (FP-tree) for storing non-sequential data. WAP-tree algorithm then, mines the frequent sequences from the WAP-tree by recursively re-constructing intermediate trees, starting with suffix sequences and ending with prefix sequences. This paper proposes a more efficient approach for using the WAP-tree to mine frequent sequences, which totally eliminates the need to engage in numerous re-construction of intermediate WAP-trees during mining. The proposed algorithm builds the frequent header node links of the original WAP-tree in a pre-order fashion and uses the position code of each node to identify the ancestor/descendant relationships between nodes of the tree. It then, finds each frequent sequential pattern, through progressive prefix sequence search, starting with its first prefix subsequence event. Experiments show huge performance gain over the WAP-tree technique.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1007/s10618-005-0248-3", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": [
          {
            "id": "sg:journal.1041853", 
            "issn": [
              "1384-5810", 
              "1573-756X"
            ], 
            "name": "Data Mining and Knowledge Discovery", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "10"
          }
        ], 
        "name": "Mining Web Log Sequential Patterns with Position Coded Pre-Order Linked WAP-Tree", 
        "pagination": "5-38", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "f72c690d17b2bec9ec33e4303f34337db9d1c1255542470efb65b2632e62efda"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/s10618-005-0248-3"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1051889932"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1007/s10618-005-0248-3", 
          "https://app.dimensions.ai/details/publication/pub.1051889932"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-11T00:20", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8695_00000535.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "http://link.springer.com/10.1007%2Fs10618-005-0248-3"
      }
    ]
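
Because the record is plain JSON, its bibliographic fields can be read with a standard JSON parser. The sketch below works on an abbreviated copy of the record above (only the fields it reads are kept); the helper name `summarize` and the keys of the dictionary it returns are assumptions made for this illustration, not part of SciGraph.

```python
import json

# Abbreviated copy of the record above; only the fields read below are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Ezeife", "givenName": "C.I.", "type": "Person"},
      {"familyName": "Lu", "givenName": "Yi", "type": "Person"}
    ],
    "datePublished": "2005-01",
    "isPartOf": [
      {"name": "Data Mining and Knowledge Discovery", "type": "Periodical"},
      {"issueNumber": "1", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "10"}
    ],
    "name": "Mining Web Log Sequential Patterns with Position Coded Pre-Order Linked WAP-Tree",
    "pagination": "5-38"
  }
]
"""


def summarize(record: dict) -> dict:
    """Flatten the nested bibliographic fields of one article record."""
    # isPartOf mixes journal, issue and volume entries; index them by "type".
    parts = {part["type"]: part for part in record.get("isPartOf", [])}
    return {
        "title": record["name"],
        "authors": [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]],
        "journal": parts.get("Periodical", {}).get("name"),
        "volume": parts.get("PublicationVolume", {}).get("volumeNumber"),
        "issue": parts.get("PublicationIssue", {}).get("issueNumber"),
        "pages": record["pagination"],
        "date": record["datePublished"],
    }


# The top level of the record is a one-element list.
summary = summarize(json.loads(RECORD_JSON)[0])
print(summary["authors"], summary["journal"], summary["volume"], summary["issue"])
```

Indexing `isPartOf` by its `type` value avoids relying on the order of the journal, issue and volume entries, which the record does not guarantee.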
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s10618-005-0248-3'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s10618-005-0248-3'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s10618-005-0248-3'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s10618-005-0248-3'
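
The four curl commands differ only in their Accept header, so the same content negotiation can be sketched with Python's standard library. The URL and media types are copied from the commands above; the names `FORMATS`, `build_request` and `fetch` are hypothetical, and whether the endpoint still answers is not something this page confirms.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/pub.10.1007/s10618-005-0248-3"

# Media types taken from the curl commands above.
FORMATS = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = BASE_URL) -> urllib.request.Request:
    """Return a GET request asking the server for the given RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})


def fetch(fmt: str, url: str = BASE_URL, timeout: float = 10.0) -> str:
    """Send the request and return the body; needs network access and a live endpoint."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Keeping `build_request` separate from `fetch` means the header logic can be checked without touching the network; calling `fetch("turtle")` would then correspond to the third curl command.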


     

    This table displays all metadata directly associated to this object as RDF triples.

    103 TRIPLES      21 PREDICATES      37 URIs      19 LITERALS      7 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/s10618-005-0248-3 schema:about anzsrc-for:08
    2 anzsrc-for:0806
    3 schema:author N590fc251778d4967ad87965a07198e33
    4 schema:citation sg:pub.10.1007/3-540-44934-5_6
    5 sg:pub.10.1007/3-540-45571-x_47
    6 sg:pub.10.1007/bfb0014140
    7 sg:pub.10.1007/s007780050083
    8 sg:pub.10.1023/b:dami.0000005258.31418.83
    9 https://doi.org/10.1016/s0169-023x(01)00008-8
    10 https://doi.org/10.1109/icde.1995.380415
    11 https://doi.org/10.1109/icde.2001.914830
    12 https://doi.org/10.1145/347090.347167
    13 https://doi.org/10.1145/846183.846188
    14 schema:datePublished 2005-01
    15 schema:datePublishedReg 2005-01-01
    16 schema:description Sequential mining is the process of applying data mining techniques to a sequential database for the purposes of discovering the correlation relationships that exist among an ordered list of events. An important application of sequential mining techniques is web usage mining, for mining web log accesses, where the sequences of web page accesses made by different web users over a period of time, through a server, are recorded. Web access pattern tree (WAP-tree) mining is a sequential pattern mining technique for web log access sequences, which first stores the original web access sequence database on a prefix tree, similar to the frequent pattern tree (FP-tree) for storing non-sequential data. WAP-tree algorithm then, mines the frequent sequences from the WAP-tree by recursively re-constructing intermediate trees, starting with suffix sequences and ending with prefix sequences. This paper proposes a more efficient approach for using the WAP-tree to mine frequent sequences, which totally eliminates the need to engage in numerous re-construction of intermediate WAP-trees during mining. The proposed algorithm builds the frequent header node links of the original WAP-tree in a pre-order fashion and uses the position code of each node to identify the ancestor/descendant relationships between nodes of the tree. It then, finds each frequent sequential pattern, through progressive prefix sequence search, starting with its first prefix subsequence event. Experiments show huge performance gain over the WAP-tree technique.
    17 schema:genre research_article
    18 schema:inLanguage en
    19 schema:isAccessibleForFree true
    20 schema:isPartOf Na265ad82eebd4606830542c66bbaaa6e
    21 Nba03d63bcec64405900b2a95905490a1
    22 sg:journal.1041853
    23 schema:name Mining Web Log Sequential Patterns with Position Coded Pre-Order Linked WAP-Tree
    24 schema:pagination 5-38
    25 schema:productId N2316421671ee43d28ab8870f99a2a2cf
    26 N4ed5912aa4fc43d8b41c94d78e406196
    27 N9d3c827b2c4348c1914abefff2ba06df
    28 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051889932
    29 https://doi.org/10.1007/s10618-005-0248-3
    30 schema:sdDatePublished 2019-04-11T00:20
    31 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    32 schema:sdPublisher Ndb2e913a73194ed0a88e364ee8691677
    33 schema:url http://link.springer.com/10.1007%2Fs10618-005-0248-3
    34 sgo:license sg:explorer/license/
    35 sgo:sdDataset articles
    36 rdf:type schema:ScholarlyArticle
    37 N2316421671ee43d28ab8870f99a2a2cf schema:name readcube_id
    38 schema:value f72c690d17b2bec9ec33e4303f34337db9d1c1255542470efb65b2632e62efda
    39 rdf:type schema:PropertyValue
    40 N4ed5912aa4fc43d8b41c94d78e406196 schema:name doi
    41 schema:value 10.1007/s10618-005-0248-3
    42 rdf:type schema:PropertyValue
    43 N590fc251778d4967ad87965a07198e33 rdf:first sg:person.01200460536.41
    44 rdf:rest Nd2350c84503c478486d6f43233ddefa2
    45 N9d3c827b2c4348c1914abefff2ba06df schema:name dimensions_id
    46 schema:value pub.1051889932
    47 rdf:type schema:PropertyValue
    48 Na265ad82eebd4606830542c66bbaaa6e schema:volumeNumber 10
    49 rdf:type schema:PublicationVolume
    50 Nba03d63bcec64405900b2a95905490a1 schema:issueNumber 1
    51 rdf:type schema:PublicationIssue
    52 Nd2350c84503c478486d6f43233ddefa2 rdf:first sg:person.015700476427.69
    53 rdf:rest rdf:nil
    54 Ndb2e913a73194ed0a88e364ee8691677 schema:name Springer Nature - SN SciGraph project
    55 rdf:type schema:Organization
    56 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    57 schema:name Information and Computing Sciences
    58 rdf:type schema:DefinedTerm
    59 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
    60 schema:name Information Systems
    61 rdf:type schema:DefinedTerm
    62 sg:journal.1041853 schema:issn 1384-5810
    63 1573-756X
    64 schema:name Data Mining and Knowledge Discovery
    65 rdf:type schema:Periodical
    66 sg:person.01200460536.41 schema:affiliation https://www.grid.ac/institutes/grid.267455.7
    67 schema:familyName Ezeife
    68 schema:givenName C.I.
    69 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01200460536.41
    70 rdf:type schema:Person
    71 sg:person.015700476427.69 schema:affiliation https://www.grid.ac/institutes/grid.267455.7
    72 schema:familyName Lu
    73 schema:givenName Yi
    74 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015700476427.69
    75 rdf:type schema:Person
    76 sg:pub.10.1007/3-540-44934-5_6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017446078
    77 https://doi.org/10.1007/3-540-44934-5_6
    78 rdf:type schema:CreativeWork
    79 sg:pub.10.1007/3-540-45571-x_47 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007543846
    80 https://doi.org/10.1007/3-540-45571-x_47
    81 rdf:type schema:CreativeWork
    82 sg:pub.10.1007/bfb0014140 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050497818
    83 https://doi.org/10.1007/bfb0014140
    84 rdf:type schema:CreativeWork
    85 sg:pub.10.1007/s007780050083 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011103093
    86 https://doi.org/10.1007/s007780050083
    87 rdf:type schema:CreativeWork
    88 sg:pub.10.1023/b:dami.0000005258.31418.83 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019489501
    89 https://doi.org/10.1023/b:dami.0000005258.31418.83
    90 rdf:type schema:CreativeWork
    91 https://doi.org/10.1016/s0169-023x(01)00008-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001497929
    92 rdf:type schema:CreativeWork
    93 https://doi.org/10.1109/icde.1995.380415 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094007712
    94 rdf:type schema:CreativeWork
    95 https://doi.org/10.1109/icde.2001.914830 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095566607
    96 rdf:type schema:CreativeWork
    97 https://doi.org/10.1145/347090.347167 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035246359
    98 rdf:type schema:CreativeWork
    99 https://doi.org/10.1145/846183.846188 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009189318
    100 rdf:type schema:CreativeWork
    101 https://www.grid.ac/institutes/grid.267455.7 schema:alternateName University of Windsor
    102 schema:name School of Computer Science, University of Windsor, N9B 3P4, Windsor, Ontario, Canada
    103 rdf:type schema:Organization
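
In the table, the ordered author list is stored as an RDF collection: the blank node in row 3 points to its first member with rdf:first and to the remainder of the list with rdf:rest, ending at rdf:nil (rows 43, 44, 52 and 53). A minimal sketch of walking that chain follows, with those five triples written out in full as plain tuples; the tuple representation and the function name `rdf_list_items` are assumptions for illustration, not an RDF library API.

```python
# Rows 3, 43, 44, 52 and 53 of the table, with the omitted subjects filled in.
TRIPLES = [
    ("sg:pub.10.1007/s10618-005-0248-3", "schema:author", "N590fc251778d4967ad87965a07198e33"),
    ("N590fc251778d4967ad87965a07198e33", "rdf:first", "sg:person.01200460536.41"),
    ("N590fc251778d4967ad87965a07198e33", "rdf:rest", "Nd2350c84503c478486d6f43233ddefa2"),
    ("Nd2350c84503c478486d6f43233ddefa2", "rdf:first", "sg:person.015700476427.69"),
    ("Nd2350c84503c478486d6f43233ddefa2", "rdf:rest", "rdf:nil"),
]


def rdf_list_items(triples, head):
    """Follow rdf:first / rdf:rest links from head until rdf:nil is reached."""
    # Each (subject, predicate) pair occurs once here, so a dict lookup suffices.
    index = {(s, p): o for s, p, o in triples}
    items = []
    node = head
    while node != "rdf:nil":
        items.append(index[(node, "rdf:first")])
        node = index[(node, "rdf:rest")]
    return items


# Find the head of the author list, then read the members in order.
head = next(o for s, p, o in TRIPLES if p == "schema:author")
authors = rdf_list_items(TRIPLES, head)
print(authors)
```

The chain preserves author order, which a plain set of schema:author triples would not.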
     



