FGCH: a fast and grid based clustering algorithm for hybrid data stream


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2018-10-30

AUTHORS

Jinyin Chen, Xiang Lin, Qi Xuan, Yun Xiang

ABSTRACT

Streaming large volumes of data has a wide range of real-world applications, e.g., video flows, internet calls, and online games. Thus, fast and real-time data stream processing is important. Traditionally, data clustering algorithms are efficient and effective at mining information from large data. However, they are mostly not suitable for online data stream clustering. Therefore, in this work, we propose a novel fast and grid based clustering algorithm for hybrid data stream (FGCH). Specifically, we make the following main contributions: 1) we develop a non-uniform attenuation model to enhance the resistance to noise; 2) we propose a similarity calculation method for hybrid data, which can calculate the similarity more efficiently and accurately; and 3) we present a novel clustering center fast determination algorithm (CCFD), which can automatically determine the number, center, and radius of clusters. Our technique is compared with several state-of-the-art clustering algorithms. The experimental results show that our technique can achieve more than better clustering accuracy on average. Meanwhile, the running time is shorter compared with the closest algorithm.

PAGES

1228-1244

References to SciGraph publications

  • 2008-01-01. Data Streaming with Affinity Propagation in MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES
  • 2006-06-01. New clustering methods for interval data in COMPUTATIONAL STATISTICS
  • 2015. Big Data in Complex Systems, Challenges and Opportunities in NONE
  • 2014-04-24. Co-clustering over multiple dynamic data streams based on non-negative matrix factorization in APPLIED INTELLIGENCE
  • 2013-06-14. Clustering data streams using grid-based synopsis in KNOWLEDGE AND INFORMATION SYSTEMS
  • 2017-06-08. Mining top-k high-utility itemsets from a data stream under sliding window model in APPLIED INTELLIGENCE
  • 2015-06-06. Efficient mining of high-speed uncertain data streams in APPLIED INTELLIGENCE
Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/s10489-018-1324-x

    DOI

    http://dx.doi.org/10.1007/s10489-018-1324-x

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1107899741



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Artificial Intelligence and Image Processing", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0802", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Computation Theory and Mathematics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information Systems", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "The College of Information Engineering, Zhejiang University of Technology, Hangzhou, China", 
              "id": "http://www.grid.ac/institutes/grid.469325.f", 
              "name": [
                "The College of Information Engineering, Zhejiang University of Technology, Hangzhou, China"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Chen", 
            "givenName": "Jinyin", 
            "id": "sg:person.011005650160.25", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011005650160.25"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "The College of Information Engineering, Zhejiang University of Technology, Hangzhou, China", 
              "id": "http://www.grid.ac/institutes/grid.469325.f", 
              "name": [
                "The College of Information Engineering, Zhejiang University of Technology, Hangzhou, China"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Lin", 
            "givenName": "Xiang", 
            "id": "sg:person.013313755020.68", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013313755020.68"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "The College of Information Engineering, Zhejiang University of Technology, Hangzhou, China", 
              "id": "http://www.grid.ac/institutes/grid.469325.f", 
              "name": [
                "The College of Information Engineering, Zhejiang University of Technology, Hangzhou, China"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Xuan", 
            "givenName": "Qi", 
            "id": "sg:person.01177056556.39", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01177056556.39"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "The College of Information Engineering, Zhejiang University of Technology, Hangzhou, China", 
              "id": "http://www.grid.ac/institutes/grid.469325.f", 
              "name": [
                "The College of Information Engineering, Zhejiang University of Technology, Hangzhou, China"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Xiang", 
            "givenName": "Yun", 
            "id": "sg:person.014357027635.35", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014357027635.35"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/s10115-013-0659-1", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1030144595", 
              "https://doi.org/10.1007/s10115-013-0659-1"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-319-11056-1", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1009655103", 
              "https://doi.org/10.1007/978-3-319-11056-1"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-540-87481-2_41", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1046891092", 
              "https://doi.org/10.1007/978-3-540-87481-2_41"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10489-015-0675-9", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1028716342", 
              "https://doi.org/10.1007/s10489-015-0675-9"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10489-017-0939-7", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1085936281", 
              "https://doi.org/10.1007/s10489-017-0939-7"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10489-014-0526-0", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1008917987", 
              "https://doi.org/10.1007/s10489-014-0526-0"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s00180-006-0260-0", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1041584936", 
              "https://doi.org/10.1007/s00180-006-0260-0"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2018-10-30", 
        "datePublishedReg": "2018-10-30", 
        "description": "Streaming large volumes of data has a wide range of real-world applications, e.g., video flows, internet calls, and online games etc. Thus, fast and real-time data stream processing is important. Traditionally, data clustering algorithms are efficient and effective to mine information from large data. However, they are mostly not suitable for online data stream clustering. Therefore, in this work, we propose a novel fast and grid based clustering algorithm for hybrid data stream (FGCH). Specifically, we have made the following main contributions: 1), we develop a non-uniform attenuation model to enhance the resistance to noise; 2), we propose a similarity calculation method for hybrid data, which can calculate the similarity more efficiently and accurately; and 3), we present a novel clustering center fast determination algorithm (CCFD), which can automatically determine the number, center, and radius of clusters. Our technique is compared with several state-of-art clustering algorithms. The experimental results show that our technique can achieve more than better clustering accuracy on average. Meanwhile, the running time is shorter compared with the closest algorithm.", 
        "genre": "article", 
        "id": "sg:pub.10.1007/s10489-018-1324-x", 
        "inLanguage": "en", 
        "isAccessibleForFree": false, 
        "isPartOf": [
          {
            "id": "sg:journal.1136076", 
            "issn": [
              "0924-669X", 
              "1573-7497"
            ], 
            "name": "Applied Intelligence", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "4", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "49"
          }
        ], 
        "keywords": [
          "data streams", 
          "real-time data stream processing", 
          "real-world applications", 
          "data stream clustering", 
          "data stream processing", 
          "similarity calculation method", 
          "radius of clusters", 
          "stream clustering", 
          "close algorithm", 
          "stream processing", 
          "mine information", 
          "Internet calls", 
          "online games", 
          "large data", 
          "running time", 
          "Novel Fast", 
          "hybrid data", 
          "algorithm", 
          "determination algorithm", 
          "main contribution", 
          "experimental results", 
          "large volumes", 
          "grid", 
          "video", 
          "streams", 
          "clustering", 
          "technique", 
          "game", 
          "processing", 
          "information", 
          "accuracy", 
          "data", 
          "attenuation model", 
          "noise", 
          "applications", 
          "art", 
          "calculation method", 
          "wide range", 
          "fast", 
          "calls", 
          "work", 
          "clusters", 
          "method", 
          "model", 
          "similarity", 
          "number", 
          "time", 
          "results", 
          "state", 
          "contribution", 
          "center", 
          "volume", 
          "range", 
          "radius", 
          "resistance", 
          "online data stream clustering", 
          "hybrid data stream", 
          "non-uniform attenuation model", 
          "center fast determination algorithm", 
          "fast determination algorithm"
        ], 
        "name": "FGCH: a fast and grid based clustering algorithm for hybrid data stream", 
        "pagination": "1228-1244", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1107899741"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/s10489-018-1324-x"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1007/s10489-018-1324-x", 
          "https://app.dimensions.ai/details/publication/pub.1107899741"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2021-12-01T19:41", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20211201/entities/gbq_results/article/article_768.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1007/s10489-018-1324-x"
      }
    ]
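The record above is plain JSON, so it can be read with any JSON parser. The following is a minimal sketch, assuming an abbreviated copy of the record (only the fields it reads are kept) and a hypothetical helper named `summarize`; it is not part of SciGraph itself.

```python
import json

# Abbreviated copy of the record above; only the fields read below are kept.
RECORD = """
[
  {
    "author": [
      {"familyName": "Chen", "givenName": "Jinyin"},
      {"familyName": "Lin", "givenName": "Xiang"},
      {"familyName": "Xuan", "givenName": "Qi"},
      {"familyName": "Xiang", "givenName": "Yun"}
    ],
    "name": "FGCH: a fast and grid based clustering algorithm for hybrid data stream",
    "pagination": "1228-1244",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1107899741"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/s10489-018-1324-x"]}
    ]
  }
]
"""


def summarize(record_text):
    """Return title, authors, DOI, and pages from a SciGraph-style JSON-LD record."""
    # The record is a one-element JSON array holding the article object.
    article = json.loads(record_text)[0]
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in article["author"]]
    # Each productId entry maps an identifier name to a one-element value list.
    ids = {p["name"]: p["value"][0] for p in article["productId"]}
    return {
        "title": article["name"],
        "authors": authors,
        "doi": ids.get("doi"),
        "pages": article["pagination"],
    }


summary = summarize(RECORD)
print(summary["authors"])
print(summary["doi"])
```

The identifiers are looked up by their `name` field rather than by position, since nothing in the record suggests that the order of `productId` entries is guaranteed.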
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s10489-018-1324-x'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s10489-018-1324-x'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s10489-018-1324-x'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s10489-018-1324-x'
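The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The helper names `build_request` and `fetch` and the short format keys are assumptions made for illustration, and the sketch assumes the endpoint shown in the curl commands is still reachable.

```python
import urllib.request

# URL prefix and media types copied from the curl commands above.
BASE = "https://scigraph.springernature.com/pub."
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(doi, fmt="json-ld"):
    """Build (but do not send) a request asking for the record in one RDF format."""
    return urllib.request.Request(BASE + doi, headers={"Accept": ACCEPT[fmt]})


def fetch(doi, fmt="json-ld", timeout=10):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(doi, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request involves no network traffic, so it is safe to inspect.
req = build_request("10.1007/s10489-018-1324-x", "turtle")
print(req.full_url)
print(req.get_header("Accept"))
```

Calling `fetch("10.1007/s10489-018-1324-x")` would perform the same request as the first curl command; it is left out of the example run because it depends on the service being available.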


     

    This table displays all metadata directly associated to this object as RDF triples.

    175 TRIPLES      22 PREDICATES      94 URIs      77 LITERALS      6 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/s10489-018-1324-x schema:about anzsrc-for:08
    2 anzsrc-for:0801
    3 anzsrc-for:0802
    4 anzsrc-for:0806
    5 schema:author N1c450b26fe354690958f7e5379d2f2a7
    6 schema:citation sg:pub.10.1007/978-3-319-11056-1
    7 sg:pub.10.1007/978-3-540-87481-2_41
    8 sg:pub.10.1007/s00180-006-0260-0
    9 sg:pub.10.1007/s10115-013-0659-1
    10 sg:pub.10.1007/s10489-014-0526-0
    11 sg:pub.10.1007/s10489-015-0675-9
    12 sg:pub.10.1007/s10489-017-0939-7
    13 schema:datePublished 2018-10-30
    14 schema:datePublishedReg 2018-10-30
    15 schema:description Streaming large volumes of data has a wide range of real-world applications, e.g., video flows, internet calls, and online games etc. Thus, fast and real-time data stream processing is important. Traditionally, data clustering algorithms are efficient and effective to mine information from large data. However, they are mostly not suitable for online data stream clustering. Therefore, in this work, we propose a novel fast and grid based clustering algorithm for hybrid data stream (FGCH). Specifically, we have made the following main contributions: 1), we develop a non-uniform attenuation model to enhance the resistance to noise; 2), we propose a similarity calculation method for hybrid data, which can calculate the similarity more efficiently and accurately; and 3), we present a novel clustering center fast determination algorithm (CCFD), which can automatically determine the number, center, and radius of clusters. Our technique is compared with several state-of-art clustering algorithms. The experimental results show that our technique can achieve more than better clustering accuracy on average. Meanwhile, the running time is shorter compared with the closest algorithm.
    16 schema:genre article
    17 schema:inLanguage en
    18 schema:isAccessibleForFree false
    19 schema:isPartOf N71ef53b4a75b428abdafe8e2ee89654a
    20 Nff0b4bf4692c4e819ea1f7e896e25c90
    21 sg:journal.1136076
    22 schema:keywords Internet calls
    23 Novel Fast
    24 accuracy
    25 algorithm
    26 applications
    27 art
    28 attenuation model
    29 calculation method
    30 calls
    31 center
    32 center fast determination algorithm
    33 close algorithm
    34 clustering
    35 clusters
    36 contribution
    37 data
    38 data stream clustering
    39 data stream processing
    40 data streams
    41 determination algorithm
    42 experimental results
    43 fast
    44 fast determination algorithm
    45 game
    46 grid
    47 hybrid data
    48 hybrid data stream
    49 information
    50 large data
    51 large volumes
    52 main contribution
    53 method
    54 mine information
    55 model
    56 noise
    57 non-uniform attenuation model
    58 number
    59 online data stream clustering
    60 online games
    61 processing
    62 radius
    63 radius of clusters
    64 range
    65 real-time data stream processing
    66 real-world applications
    67 resistance
    68 results
    69 running time
    70 similarity
    71 similarity calculation method
    72 state
    73 stream clustering
    74 stream processing
    75 streams
    76 technique
    77 time
    78 video
    79 volume
    80 wide range
    81 work
    82 schema:name FGCH: a fast and grid based clustering algorithm for hybrid data stream
    83 schema:pagination 1228-1244
    84 schema:productId N0cd72f5cbbc54fa9b6fdc9a9f1d755fa
    85 Nedcff523770f4dd09cd35136b1e85209
    86 schema:sameAs https://app.dimensions.ai/details/publication/pub.1107899741
    87 https://doi.org/10.1007/s10489-018-1324-x
    88 schema:sdDatePublished 2021-12-01T19:41
    89 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    90 schema:sdPublisher Nae43ce9647294ac48dba19351f4be452
    91 schema:url https://doi.org/10.1007/s10489-018-1324-x
    92 sgo:license sg:explorer/license/
    93 sgo:sdDataset articles
    94 rdf:type schema:ScholarlyArticle
    95 N0cd72f5cbbc54fa9b6fdc9a9f1d755fa schema:name dimensions_id
    96 schema:value pub.1107899741
    97 rdf:type schema:PropertyValue
    98 N1c450b26fe354690958f7e5379d2f2a7 rdf:first sg:person.011005650160.25
    99 rdf:rest Ne635cb78ce294cf09a9af3cc274966b9
    100 N54b99e988cea44f9bff3a74ab829a252 rdf:first sg:person.014357027635.35
    101 rdf:rest rdf:nil
    102 N71ef53b4a75b428abdafe8e2ee89654a schema:issueNumber 4
    103 rdf:type schema:PublicationIssue
    104 N8c77ca48f5134a50b1d87289d232c46f rdf:first sg:person.01177056556.39
    105 rdf:rest N54b99e988cea44f9bff3a74ab829a252
    106 Nae43ce9647294ac48dba19351f4be452 schema:name Springer Nature - SN SciGraph project
    107 rdf:type schema:Organization
    108 Ne635cb78ce294cf09a9af3cc274966b9 rdf:first sg:person.013313755020.68
    109 rdf:rest N8c77ca48f5134a50b1d87289d232c46f
    110 Nedcff523770f4dd09cd35136b1e85209 schema:name doi
    111 schema:value 10.1007/s10489-018-1324-x
    112 rdf:type schema:PropertyValue
    113 Nff0b4bf4692c4e819ea1f7e896e25c90 schema:volumeNumber 49
    114 rdf:type schema:PublicationVolume
    115 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    116 schema:name Information and Computing Sciences
    117 rdf:type schema:DefinedTerm
    118 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
    119 schema:name Artificial Intelligence and Image Processing
    120 rdf:type schema:DefinedTerm
    121 anzsrc-for:0802 schema:inDefinedTermSet anzsrc-for:
    122 schema:name Computation Theory and Mathematics
    123 rdf:type schema:DefinedTerm
    124 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
    125 schema:name Information Systems
    126 rdf:type schema:DefinedTerm
    127 sg:journal.1136076 schema:issn 0924-669X
    128 1573-7497
    129 schema:name Applied Intelligence
    130 schema:publisher Springer Nature
    131 rdf:type schema:Periodical
    132 sg:person.011005650160.25 schema:affiliation grid-institutes:grid.469325.f
    133 schema:familyName Chen
    134 schema:givenName Jinyin
    135 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011005650160.25
    136 rdf:type schema:Person
    137 sg:person.01177056556.39 schema:affiliation grid-institutes:grid.469325.f
    138 schema:familyName Xuan
    139 schema:givenName Qi
    140 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01177056556.39
    141 rdf:type schema:Person
    142 sg:person.013313755020.68 schema:affiliation grid-institutes:grid.469325.f
    143 schema:familyName Lin
    144 schema:givenName Xiang
    145 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013313755020.68
    146 rdf:type schema:Person
    147 sg:person.014357027635.35 schema:affiliation grid-institutes:grid.469325.f
    148 schema:familyName Xiang
    149 schema:givenName Yun
    150 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014357027635.35
    151 rdf:type schema:Person
    152 sg:pub.10.1007/978-3-319-11056-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009655103
    153 https://doi.org/10.1007/978-3-319-11056-1
    154 rdf:type schema:CreativeWork
    155 sg:pub.10.1007/978-3-540-87481-2_41 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046891092
    156 https://doi.org/10.1007/978-3-540-87481-2_41
    157 rdf:type schema:CreativeWork
    158 sg:pub.10.1007/s00180-006-0260-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041584936
    159 https://doi.org/10.1007/s00180-006-0260-0
    160 rdf:type schema:CreativeWork
    161 sg:pub.10.1007/s10115-013-0659-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030144595
    162 https://doi.org/10.1007/s10115-013-0659-1
    163 rdf:type schema:CreativeWork
    164 sg:pub.10.1007/s10489-014-0526-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008917987
    165 https://doi.org/10.1007/s10489-014-0526-0
    166 rdf:type schema:CreativeWork
    167 sg:pub.10.1007/s10489-015-0675-9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028716342
    168 https://doi.org/10.1007/s10489-015-0675-9
    169 rdf:type schema:CreativeWork
    170 sg:pub.10.1007/s10489-017-0939-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1085936281
    171 https://doi.org/10.1007/s10489-017-0939-7
    172 rdf:type schema:CreativeWork
    173 grid-institutes:grid.469325.f schema:alternateName The College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
    174 schema:name The College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
    175 rdf:type schema:Organization
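In the table above, `schema:author` points at a blank node (row 5), and the author order is encoded as an RDF list through the `rdf:first` / `rdf:rest` pairs in rows 98 to 109. The sketch below walks that list to recover the order; the triples and family names are copied from the table, while the function name `walk_list` is an assumption for illustration.

```python
RDF_NIL = "rdf:nil"

# (subject, predicate, object) triples copied from rows 98-109 of the table.
TRIPLES = [
    ("N1c450b26fe354690958f7e5379d2f2a7", "rdf:first", "sg:person.011005650160.25"),
    ("N1c450b26fe354690958f7e5379d2f2a7", "rdf:rest", "Ne635cb78ce294cf09a9af3cc274966b9"),
    ("N54b99e988cea44f9bff3a74ab829a252", "rdf:first", "sg:person.014357027635.35"),
    ("N54b99e988cea44f9bff3a74ab829a252", "rdf:rest", "rdf:nil"),
    ("N8c77ca48f5134a50b1d87289d232c46f", "rdf:first", "sg:person.01177056556.39"),
    ("N8c77ca48f5134a50b1d87289d232c46f", "rdf:rest", "N54b99e988cea44f9bff3a74ab829a252"),
    ("Ne635cb78ce294cf09a9af3cc274966b9", "rdf:first", "sg:person.013313755020.68"),
    ("Ne635cb78ce294cf09a9af3cc274966b9", "rdf:rest", "N8c77ca48f5134a50b1d87289d232c46f"),
]

# schema:familyName values copied from rows 133, 138, 143, and 148.
FAMILY_NAME = {
    "sg:person.011005650160.25": "Chen",
    "sg:person.01177056556.39": "Xuan",
    "sg:person.013313755020.68": "Lin",
    "sg:person.014357027635.35": "Xiang",
}


def walk_list(head, triples):
    """Follow rdf:first / rdf:rest links from head until rdf:nil is reached."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


# Row 5 gives the list head: the object of schema:author.
ordered = walk_list("N1c450b26fe354690958f7e5379d2f2a7", TRIPLES)
print([FAMILY_NAME[person] for person in ordered])
```

The table sorts rows by subject identifier, so the row order alone does not give the author order; only following the `rdf:rest` links does.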
     



