Clustering categorical data: an approach based on dynamical systems


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2000-02

AUTHORS

David Gibson, Jon Kleinberg, Prabhakar Raghavan

ABSTRACT

We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By “categorical data,” we mean tables with fields that cannot be naturally ordered by a metric – e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of non-linear dynamical systems.
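The abstract gives only the outline of the iterative method, so the following is a minimal sketch of one way weights could be assigned to categorical values and propagated through co-occurrence, not the authors' algorithm. The function name `propagate_weights`, the summation rule, and the per-field normalization are all assumptions made for illustration. The summation rule in particular makes this iteration linear, whereas the abstract refers to non-linear dynamical systems, which this sketch does not attempt to reproduce.

```python
from collections import defaultdict
from math import sqrt


def propagate_weights(rows, iterations=20):
    """Iteratively propagate weights over categorical values (illustrative only).

    rows: list of tuples; position i holds the value of field i.
    Returns a dict mapping (field_index, value) -> weight.
    """
    # Every (field, value) pair is a node; start with equal weights.
    nodes = {(i, v) for row in rows for i, v in enumerate(row)}
    weights = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        updated = defaultdict(float)
        for row in rows:
            keyed = list(enumerate(row))
            total = sum(weights[n] for n in keyed)
            for n in keyed:
                # Assumed combining rule: a value receives the summed weight
                # of the other values it co-occurs with in this row.
                updated[n] += total - weights[n]

        # Normalize each field separately so the weights stay bounded.
        norms = defaultdict(float)
        for (field, _value), w in updated.items():
            norms[field] += w * w
        weights = {
            n: (w / sqrt(norms[n[0]]) if norms[n[0]] > 0 else 0.0)
            for n, w in updated.items()
        }
    return weights


# Hypothetical toy table: (producer, product).
rows = [("Honda", "Civic"), ("Honda", "Accord"), ("Toyota", "Corolla")]
weights = propagate_weights(rows)
```

In this toy table, values that co-occur with many other values accumulate more weight, which is the kind of co-occurrence signal the abstract describes as the basis for a similarity measure.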

PAGES

222-236

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s007780050005

DOI

http://dx.doi.org/10.1007/s007780050005

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1031702635



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0804", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Data Format", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0805", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Distributed Computing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Department of Computer Science UC Berkeley, Berkeley, CA 94720 USA; e-mail: dag@cs.berkeley.edu, US", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Department of Computer Science UC Berkeley, Berkeley, CA 94720 USA; e-mail: dag@cs.berkeley.edu, US"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Gibson", 
        "givenName": "David", 
        "id": "sg:person.011534145461.80", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011534145461.80"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Computer Science, Cornell University, Ithaca, NY 14853; e-mail: kleinber@cs.cornell.edu, US", 
          "id": "http://www.grid.ac/institutes/grid.5386.8", 
          "name": [
            "Department of Computer Science, Cornell University, Ithaca, NY 14853; e-mail: kleinber@cs.cornell.edu, US"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Kleinberg", 
        "givenName": "Jon", 
        "id": "sg:person.011522233557.04", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011522233557.04"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Almaden Research Center IBM, San Jose, CA 95120 USA; e-mail: pragh@almaden.ibm.com, US", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Almaden Research Center IBM, San Jose, CA 95120 USA; e-mail: pragh@almaden.ibm.com, US"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Raghavan", 
        "givenName": "Prabhakar", 
        "id": "sg:person.012437241622.81", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012437241622.81"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2000-02", 
    "datePublishedReg": "2000-02-01", 
    "description": "Abstract. We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By \u201ccategorical data,\u201d we mean tables with fields that cannot be naturally ordered by a metric \u2013 e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of non-linear dynamical systems.", 
    "genre": "article", 
    "id": "sg:pub.10.1007/s007780050005", 
    "inLanguage": "en", 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1044889", 
        "issn": [
          "1066-8888", 
          "0949-877X"
        ], 
        "name": "The VLDB Journal", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "3", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "8"
      }
    ], 
    "keywords": [
      "dynamical systems", 
      "non-linear dynamical systems", 
      "categorical data", 
      "iterative method", 
      "collection of sets", 
      "names of products", 
      "certain types", 
      "categorical values", 
      "approach", 
      "novel approach", 
      "system", 
      "similarity measure", 
      "set", 
      "applications", 
      "terms", 
      "table", 
      "technique", 
      "datasets", 
      "data", 
      "field", 
      "values", 
      "types", 
      "analysis", 
      "measures", 
      "mining", 
      "collection", 
      "automobiles", 
      "weight", 
      "products", 
      "manufacturers", 
      "name", 
      "producers", 
      "method", 
      "names of producers"
    ], 
    "name": "Clustering categorical data: an approach based on dynamical systems", 
    "pagination": "222-236", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1031702635"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s007780050005"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s007780050005", 
      "https://app.dimensions.ai/details/publication/pub.1031702635"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-01-01T18:11", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/article/article_337.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1007/s007780050005"
  }
]
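As a rough illustration of how the JSON-LD record above can be read with ordinary JSON tooling, the sketch below pulls out the title, authors, journal, volume, issue and pages. The function name `summarize_record` and the trimmed `SAMPLE` string are assumptions introduced here; the field names are the ones that appear in the record.

```python
import json

# Trimmed copy of the record above, kept only to make the sketch runnable.
SAMPLE = """
[
  {
    "name": "Clustering categorical data: an approach based on dynamical systems",
    "author": [
      {"familyName": "Gibson", "givenName": "David"},
      {"familyName": "Kleinberg", "givenName": "Jon"},
      {"familyName": "Raghavan", "givenName": "Prabhakar"}
    ],
    "isPartOf": [
      {"name": "The VLDB Journal", "type": "Periodical"},
      {"issueNumber": "3", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "8"}
    ],
    "pagination": "222-236"
  }
]
"""


def summarize_record(jsonld_text):
    """Return the main bibliographic fields of a one-record JSON-LD array."""
    record = json.loads(jsonld_text)[0]
    # isPartOf mixes the periodical, issue and volume; index them by type.
    by_type = {part.get("type"): part for part in record.get("isPartOf", [])}
    return {
        "title": record.get("name"),
        "authors": [
            f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])
        ],
        "journal": by_type.get("Periodical", {}).get("name"),
        "volume": by_type.get("PublicationVolume", {}).get("volumeNumber"),
        "issue": by_type.get("PublicationIssue", {}).get("issueNumber"),
        "pages": record.get("pagination"),
    }


summary = summarize_record(SAMPLE)
```

Because the array order of `isPartOf` is not guaranteed to be stable, the sketch selects each part by its `type` rather than by position.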
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s007780050005'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s007780050005'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s007780050005'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s007780050005'
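The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched with Python's standard library. The helper names `build_request` and `fetch`, and the short format keys in `ACCEPT_TYPES`, are assumptions introduced for this example; whether the endpoint still responds as described is not verified here.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s007780050005"

# Media types taken from the curl commands above; the keys are this sketch's own labels.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request asking for the record in the given RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network; only fetch() does.
turtle_request = build_request("turtle")
```

Separating `build_request` from `fetch` keeps the header logic checkable without a live connection.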


 

This table displays all metadata directly associated with this object as RDF triples.

119 TRIPLES      21 PREDICATES      62 URIs      52 LITERALS      6 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/s007780050005 schema:about anzsrc-for:08
2 anzsrc-for:0804
3 anzsrc-for:0805
4 anzsrc-for:0806
5 schema:author Nf1fa113a3ca54c58b4fa691dfe898df2
6 schema:datePublished 2000-02
7 schema:datePublishedReg 2000-02-01
8 schema:description Abstract. We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By “categorical data,” we mean tables with fields that cannot be naturally ordered by a metric – e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of non-linear dynamical systems.
9 schema:genre article
10 schema:inLanguage en
11 schema:isAccessibleForFree false
12 schema:isPartOf N687eecbfd398406888bf4e9a2a36e210
13 Ne8e1e8e8690b42b28d31ad08b7190d8d
14 sg:journal.1044889
15 schema:keywords analysis
16 applications
17 approach
18 automobiles
19 categorical data
20 categorical values
21 certain types
22 collection
23 collection of sets
24 data
25 datasets
26 dynamical systems
27 field
28 iterative method
29 manufacturers
30 measures
31 method
32 mining
33 name
34 names of producers
35 names of products
36 non-linear dynamical systems
37 novel approach
38 producers
39 products
40 set
41 similarity measure
42 system
43 table
44 technique
45 terms
46 types
47 values
48 weight
49 schema:name Clustering categorical data: an approach based on dynamical systems
50 schema:pagination 222-236
51 schema:productId N3f1710b4bfb842279313b71c33cbe6de
52 N85cc46302b0f4f5ca4275b1a6ba20f05
53 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031702635
54 https://doi.org/10.1007/s007780050005
55 schema:sdDatePublished 2022-01-01T18:11
56 schema:sdLicense https://scigraph.springernature.com/explorer/license/
57 schema:sdPublisher Nf94c5c1049a043f780fd465885a19733
58 schema:url https://doi.org/10.1007/s007780050005
59 sgo:license sg:explorer/license/
60 sgo:sdDataset articles
61 rdf:type schema:ScholarlyArticle
62 N3f1710b4bfb842279313b71c33cbe6de schema:name dimensions_id
63 schema:value pub.1031702635
64 rdf:type schema:PropertyValue
65 N50bd058b89c74e23a2b7ffa91393d6d1 rdf:first sg:person.012437241622.81
66 rdf:rest rdf:nil
67 N687eecbfd398406888bf4e9a2a36e210 schema:issueNumber 3
68 rdf:type schema:PublicationIssue
69 N85cc46302b0f4f5ca4275b1a6ba20f05 schema:name doi
70 schema:value 10.1007/s007780050005
71 rdf:type schema:PropertyValue
72 Ne8e1e8e8690b42b28d31ad08b7190d8d schema:volumeNumber 8
73 rdf:type schema:PublicationVolume
74 Nf1fa113a3ca54c58b4fa691dfe898df2 rdf:first sg:person.011534145461.80
75 rdf:rest Nf390e3662417403a81cb2b5cc40571d2
76 Nf390e3662417403a81cb2b5cc40571d2 rdf:first sg:person.011522233557.04
77 rdf:rest N50bd058b89c74e23a2b7ffa91393d6d1
78 Nf94c5c1049a043f780fd465885a19733 schema:name Springer Nature - SN SciGraph project
79 rdf:type schema:Organization
80 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
81 schema:name Information and Computing Sciences
82 rdf:type schema:DefinedTerm
83 anzsrc-for:0804 schema:inDefinedTermSet anzsrc-for:
84 schema:name Data Format
85 rdf:type schema:DefinedTerm
86 anzsrc-for:0805 schema:inDefinedTermSet anzsrc-for:
87 schema:name Distributed Computing
88 rdf:type schema:DefinedTerm
89 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
90 schema:name Information Systems
91 rdf:type schema:DefinedTerm
92 sg:journal.1044889 schema:issn 0949-877X
93 1066-8888
94 schema:name The VLDB Journal
95 schema:publisher Springer Nature
96 rdf:type schema:Periodical
97 sg:person.011522233557.04 schema:affiliation grid-institutes:grid.5386.8
98 schema:familyName Kleinberg
99 schema:givenName Jon
100 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011522233557.04
101 rdf:type schema:Person
102 sg:person.011534145461.80 schema:affiliation grid-institutes:None
103 schema:familyName Gibson
104 schema:givenName David
105 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011534145461.80
106 rdf:type schema:Person
107 sg:person.012437241622.81 schema:affiliation grid-institutes:None
108 schema:familyName Raghavan
109 schema:givenName Prabhakar
110 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012437241622.81
111 rdf:type schema:Person
112 grid-institutes:None schema:alternateName Almaden Research Center IBM, San Jose, CA 95120 USA; e-mail: pragh@almaden.ibm.com, US
113 Department of Computer Science UC Berkeley, Berkeley, CA 94720 USA; e-mail: dag@cs.berkeley.edu, US
114 schema:name Almaden Research Center IBM, San Jose, CA 95120 USA; e-mail: pragh@almaden.ibm.com, US
115 Department of Computer Science UC Berkeley, Berkeley, CA 94720 USA; e-mail: dag@cs.berkeley.edu, US
116 rdf:type schema:Organization
117 grid-institutes:grid.5386.8 schema:alternateName Department of Computer Science, Cornell University, Ithaca, NY 14853; e-mail: kleinber@cs.cornell.edu, US
118 schema:name Department of Computer Science, Cornell University, Ithaca, NY 14853; e-mail: kleinber@cs.cornell.edu, US
119 rdf:type schema:Organization
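In the table, `schema:author` points at a blank node rather than at the authors directly: the author order is encoded as an RDF list chained through `rdf:first` and `rdf:rest` (rows 65-66 and 74-77). The sketch below walks that chain with plain dictionaries; the names `LIST_NODES` and `walk_rdf_list` are assumptions made for illustration, and a real application would more likely use an RDF library.

```python
RDF_NIL = "rdf:nil"

# Blank node -> (rdf:first, rdf:rest), copied from rows 65-66 and 74-77 of the table.
LIST_NODES = {
    "Nf1fa113a3ca54c58b4fa691dfe898df2": (
        "sg:person.011534145461.80",
        "Nf390e3662417403a81cb2b5cc40571d2",
    ),
    "Nf390e3662417403a81cb2b5cc40571d2": (
        "sg:person.011522233557.04",
        "N50bd058b89c74e23a2b7ffa91393d6d1",
    ),
    "N50bd058b89c74e23a2b7ffa91393d6d1": (
        "sg:person.012437241622.81",
        RDF_NIL,
    ),
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest links from head until rdf:nil, collecting rdf:first values."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


# The head node is the object of schema:author in row 5 of the table.
authors_in_order = walk_rdf_list("Nf1fa113a3ca54c58b4fa691dfe898df2", LIST_NODES)
```

Walking the list this way recovers the author order (Gibson, Kleinberg, Raghavan) that the JSON-LD array expresses directly.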
 



