Clustering categorical data: an approach based on dynamical systems


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2000-02

AUTHORS

David Gibson, Jon Kleinberg, Prabhakar Raghavan

ABSTRACT

We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By “categorical data,” we mean tables with fields that cannot be naturally ordered by a metric – e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of non-linear dynamical systems.
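The weight-propagation idea in the abstract can be illustrated with a small sketch. It assumes a simplified variant: every distinct (column, value) pair is a node, each row groups the nodes it contains, a node's new weight is the sum, over the rows it appears in, of the weights of the other values in that row (a "sum" combining rule), and all weights are rescaled to unit length after each round. This illustrates the general technique and is not the authors' exact procedure; the function name `propagate_weights` and the toy rows are hypothetical.

```python
import math
from collections import defaultdict


def propagate_weights(rows, columns, iterations=20):
    """Sketch of iterative weight propagation over categorical values.

    Assumption: a sum combining rule plus L2 normalization after each round.
    This is a simplified illustration, not the paper's exact algorithm.
    """
    # Each distinct (column, value) pair is a node; start from uniform weights.
    nodes = {(col, row[i]) for row in rows for i, col in enumerate(columns)}
    weights = {node: 1.0 for node in nodes}

    for _ in range(iterations):
        updated = defaultdict(float)
        for row in rows:
            row_nodes = [(col, row[i]) for i, col in enumerate(columns)]
            total = sum(weights[n] for n in row_nodes)
            for n in row_nodes:
                # A value receives the combined weight of the other values in its row.
                updated[n] += total - weights[n]
        norm = math.sqrt(sum(v * v for v in updated.values()))
        if norm == 0.0:
            break  # no values co-occur, so there is nothing to propagate
        # Rescale to unit length so the iteration neither blows up nor vanishes.
        weights = {n: updated[n] / norm for n in nodes}
    return weights
```

With hypothetical rows such as `("Ford", "sedan")`, `("Ford", "truck")` and `("Toyota", "sedan")`, values that co-occur with many well-weighted values ("Ford" and "sedan" here) end up with larger weights than "Toyota" and "truck". Under the sum rule each round is a linear map followed by rescaling, so this variant behaves like power iteration; other combining rules would make the update itself non-linear.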

PAGES

222-236

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s007780050005

DOI

http://dx.doi.org/10.1007/s007780050005

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1031702635



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0804", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Data Format", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0805", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Distributed Computing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Department of Computer Science UC Berkeley, Berkeley, CA 94720 USA; e-mail: dag@cs.berkeley.edu, US", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Department of Computer Science UC Berkeley, Berkeley, CA 94720 USA; e-mail: dag@cs.berkeley.edu, US"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Gibson", 
        "givenName": "David", 
        "id": "sg:person.011534145461.80", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011534145461.80"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Computer Science, Cornell University, Ithaca, NY 14853; e-mail: kleinber@cs.cornell.edu, US", 
          "id": "http://www.grid.ac/institutes/grid.5386.8", 
          "name": [
            "Department of Computer Science, Cornell University, Ithaca, NY 14853; e-mail: kleinber@cs.cornell.edu, US"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Kleinberg", 
        "givenName": "Jon", 
        "id": "sg:person.011522233557.04", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011522233557.04"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Almaden Research Center IBM, San Jose, CA 95120 USA; e-mail: pragh@almaden.ibm.com, US", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Almaden Research Center IBM, San Jose, CA 95120 USA; e-mail: pragh@almaden.ibm.com, US"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Raghavan", 
        "givenName": "Prabhakar", 
        "id": "sg:person.012437241622.81", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012437241622.81"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2000-02", 
    "datePublishedReg": "2000-02-01", 
    "description": "Abstract. We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By \u201ccategorical data,\u201d we mean tables with fields that cannot be naturally ordered by a metric \u2013 e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of non-linear dynamical systems.", 
    "genre": "article", 
    "id": "sg:pub.10.1007/s007780050005", 
    "inLanguage": "en", 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1044889", 
        "issn": [
          "1066-8888", 
          "0949-877X"
        ], 
        "name": "The VLDB Journal", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "3", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "8"
      }
    ], 
    "keywords": [
      "dynamical systems", 
      "non-linear dynamical systems", 
      "categorical data", 
      "iterative method", 
      "collection of sets", 
      "certain types", 
      "approach", 
      "similarity measure", 
      "novel approach", 
      "system", 
      "categorical values", 
      "field", 
      "set", 
      "table", 
      "terms", 
      "applications", 
      "values", 
      "technique", 
      "data", 
      "names of products", 
      "dataset", 
      "types", 
      "analysis", 
      "measures", 
      "automobiles", 
      "collection", 
      "manufacturers", 
      "mining", 
      "weight", 
      "products", 
      "name", 
      "producers", 
      "method"
    ], 
    "name": "Clustering categorical data: an approach based on dynamical systems", 
    "pagination": "222-236", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1031702635"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s007780050005"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s007780050005", 
      "https://app.dimensions.ai/details/publication/pub.1031702635"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-05-10T09:47", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220509/entities/gbq_results/article/article_312.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1007/s007780050005"
  }
]
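Because the record above is plain JSON, the standard library is enough to turn it into a one-line citation. A minimal sketch, assuming the record's text is already available as a string (for example, saved from the JSON-LD request described further down); `format_citation` is a hypothetical helper, and the field names are the ones that appear in this record, not a guaranteed schema for every SciGraph entry.

```python
import json


def format_citation(jsonld_text):
    """Build a one-line citation from a SciGraph JSON-LD record (sketch)."""
    record = json.loads(jsonld_text)[0]  # the payload is a one-element array
    authors = ", ".join(f"{a['givenName']} {a['familyName']}" for a in record["author"])
    # isPartOf mixes the journal, issue and volume; pick each one by its "type".
    parts = {p["type"]: p for p in record["isPartOf"]}
    journal = parts["Periodical"]["name"]
    volume = parts["PublicationVolume"]["volumeNumber"]
    issue = parts["PublicationIssue"]["issueNumber"]
    year = record["datePublished"][:4]
    # productId holds several identifiers; keep the one named "doi".
    doi = next(p["value"][0] for p in record["productId"] if p["name"] == "doi")
    return (f"{authors} ({year}). {record['name']}. {journal} {volume}({issue}): "
            f"{record['pagination']}. doi:{doi}")
```

Applied to this record, the helper yields the three authors in order, the year 2000, the title, The VLDB Journal 8(3), pages 222-236 and the DOI 10.1007/s007780050005.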
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a linked-data format built on JSON; every JSON-LD document is also valid JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s007780050005'

N-Triples is a line-based linked-data format (one triple per line), well suited to batch processing.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s007780050005'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s007780050005'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s007780050005'
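The four curl commands differ only in the `Accept` header sent to the same URL (HTTP content negotiation). The same requests can be issued from Python's standard library. This is a sketch: the `FORMATS` table, `build_request` and `fetch` are hypothetical names, and the MIME types are copied from the commands above.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s007780050005"

# Accept headers taken from the curl commands above.
FORMATS = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Return a GET request asking the server for the given RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Download the record in the requested format and decode it as UTF-8 text."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

`fetch("turtle")` performs the same request as the Turtle curl command. It needs network access, and failures surface as `urllib.error.HTTPError` (an error status from the server) or `urllib.error.URLError` (a connection problem).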


 

This table lists all metadata directly associated with this object as RDF triples.

118 triples, 21 predicates, 61 URIs, 51 literals, 6 blank nodes

Subject Predicate Object
1 sg:pub.10.1007/s007780050005 schema:about anzsrc-for:08
2 anzsrc-for:0804
3 anzsrc-for:0805
4 anzsrc-for:0806
5 schema:author Nb53e388802f34a3da822db44eebe67e0
6 schema:datePublished 2000-02
7 schema:datePublishedReg 2000-02-01
8 schema:description Abstract. We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By “categorical data,” we mean tables with fields that cannot be naturally ordered by a metric – e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of non-linear dynamical systems.
9 schema:genre article
10 schema:inLanguage en
11 schema:isAccessibleForFree false
12 schema:isPartOf N606f616ed91c437198c10aae066f36f7
13 Na5f65208507d4879929002c3f019146e
14 sg:journal.1044889
15 schema:keywords analysis
16 applications
17 approach
18 automobiles
19 categorical data
20 categorical values
21 certain types
22 collection
23 collection of sets
24 data
25 dataset
26 dynamical systems
27 field
28 iterative method
29 manufacturers
30 measures
31 method
32 mining
33 name
34 names of products
35 non-linear dynamical systems
36 novel approach
37 producers
38 products
39 set
40 similarity measure
41 system
42 table
43 technique
44 terms
45 types
46 values
47 weight
48 schema:name Clustering categorical data: an approach based on dynamical systems
49 schema:pagination 222-236
50 schema:productId N1470b1c99f034bd88dddd9074f8b901e
51 Nd4a52c5ffd1445f9891c1919ac4c7920
52 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031702635
53 https://doi.org/10.1007/s007780050005
54 schema:sdDatePublished 2022-05-10T09:47
55 schema:sdLicense https://scigraph.springernature.com/explorer/license/
56 schema:sdPublisher N6ea55a2a922d40689f4b40e6cea20c99
57 schema:url https://doi.org/10.1007/s007780050005
58 sgo:license sg:explorer/license/
59 sgo:sdDataset articles
60 rdf:type schema:ScholarlyArticle
61 N1470b1c99f034bd88dddd9074f8b901e schema:name dimensions_id
62 schema:value pub.1031702635
63 rdf:type schema:PropertyValue
64 N606f616ed91c437198c10aae066f36f7 schema:issueNumber 3
65 rdf:type schema:PublicationIssue
66 N6ea55a2a922d40689f4b40e6cea20c99 schema:name Springer Nature - SN SciGraph project
67 rdf:type schema:Organization
68 N785ff1d7acfb49a98896b2b5defbf76a rdf:first sg:person.011522233557.04
69 rdf:rest Nb6bf070a54a6459ebc1d2dbf04e11cd9
70 Na5f65208507d4879929002c3f019146e schema:volumeNumber 8
71 rdf:type schema:PublicationVolume
72 Nb53e388802f34a3da822db44eebe67e0 rdf:first sg:person.011534145461.80
73 rdf:rest N785ff1d7acfb49a98896b2b5defbf76a
74 Nb6bf070a54a6459ebc1d2dbf04e11cd9 rdf:first sg:person.012437241622.81
75 rdf:rest rdf:nil
76 Nd4a52c5ffd1445f9891c1919ac4c7920 schema:name doi
77 schema:value 10.1007/s007780050005
78 rdf:type schema:PropertyValue
79 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
80 schema:name Information and Computing Sciences
81 rdf:type schema:DefinedTerm
82 anzsrc-for:0804 schema:inDefinedTermSet anzsrc-for:
83 schema:name Data Format
84 rdf:type schema:DefinedTerm
85 anzsrc-for:0805 schema:inDefinedTermSet anzsrc-for:
86 schema:name Distributed Computing
87 rdf:type schema:DefinedTerm
88 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
89 schema:name Information Systems
90 rdf:type schema:DefinedTerm
91 sg:journal.1044889 schema:issn 0949-877X
92 1066-8888
93 schema:name The VLDB Journal
94 schema:publisher Springer Nature
95 rdf:type schema:Periodical
96 sg:person.011522233557.04 schema:affiliation grid-institutes:grid.5386.8
97 schema:familyName Kleinberg
98 schema:givenName Jon
99 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011522233557.04
100 rdf:type schema:Person
101 sg:person.011534145461.80 schema:affiliation grid-institutes:None
102 schema:familyName Gibson
103 schema:givenName David
104 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011534145461.80
105 rdf:type schema:Person
106 sg:person.012437241622.81 schema:affiliation grid-institutes:None
107 schema:familyName Raghavan
108 schema:givenName Prabhakar
109 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012437241622.81
110 rdf:type schema:Person
111 grid-institutes:None schema:alternateName Almaden Research Center IBM, San Jose, CA 95120 USA; e-mail: pragh@almaden.ibm.com, US
112 Department of Computer Science UC Berkeley, Berkeley, CA 94720 USA; e-mail: dag@cs.berkeley.edu, US
113 schema:name Almaden Research Center IBM, San Jose, CA 95120 USA; e-mail: pragh@almaden.ibm.com, US
114 Department of Computer Science UC Berkeley, Berkeley, CA 94720 USA; e-mail: dag@cs.berkeley.edu, US
115 rdf:type schema:Organization
116 grid-institutes:grid.5386.8 schema:alternateName Department of Computer Science, Cornell University, Ithaca, NY 14853; e-mail: kleinber@cs.cornell.edu, US
117 schema:name Department of Computer Science, Cornell University, Ithaca, NY 14853; e-mail: kleinber@cs.cornell.edu, US
118 rdf:type schema:Organization
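In the triple table, `schema:author` does not point at the three people directly. It points at a blank node that starts an RDF collection, a linked list built from `rdf:first` (the current item) and `rdf:rest` (the next cell, ending in `rdf:nil`). Walking that list recovers the author order. A minimal sketch, assuming the triples are available as (subject, predicate, object) string tuples using the prefixed names shown in the table; `read_rdf_list` and `AUTHOR_TRIPLES` are hypothetical names, not part of any RDF library.

```python
def read_rdf_list(triples, head):
    """Follow rdf:first / rdf:rest links from `head` and return the items in order."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node, seen = [], head, set()
    while node != "rdf:nil":
        if node in seen:
            raise ValueError("cyclic RDF list")  # guard against malformed data
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


# The author-list triples copied from the table (rows 5 and 68-75).
AUTHOR_TRIPLES = [
    ("sg:pub.10.1007/s007780050005", "schema:author", "Nb53e388802f34a3da822db44eebe67e0"),
    ("N785ff1d7acfb49a98896b2b5defbf76a", "rdf:first", "sg:person.011522233557.04"),
    ("N785ff1d7acfb49a98896b2b5defbf76a", "rdf:rest", "Nb6bf070a54a6459ebc1d2dbf04e11cd9"),
    ("Nb53e388802f34a3da822db44eebe67e0", "rdf:first", "sg:person.011534145461.80"),
    ("Nb53e388802f34a3da822db44eebe67e0", "rdf:rest", "N785ff1d7acfb49a98896b2b5defbf76a"),
    ("Nb6bf070a54a6459ebc1d2dbf04e11cd9", "rdf:first", "sg:person.012437241622.81"),
    ("Nb6bf070a54a6459ebc1d2dbf04e11cd9", "rdf:rest", "rdf:nil"),
]

head = next(o for s, p, o in AUTHOR_TRIPLES if p == "schema:author")
authors = read_rdf_list(AUTHOR_TRIPLES, head)
```

The resulting person identifiers map, via rows 96-110, to Gibson, Kleinberg and Raghavan, the same order as in the JSON-LD `author` array.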
 






...