Non-parametric Mixture Models for Clustering


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2010

AUTHORS

Pavan Kumar Mallapragada , Rong Jin , Anil Jain

ABSTRACT

Mixture models have been widely used for data clustering. However, commonly used mixture models are generally of a parametric form (e.g., a mixture of Gaussian distributions, or GMM), which significantly limits their capacity to fit the diverse multidimensional data distributions encountered in practice. We propose a non-parametric mixture model (NMM) for data clustering in order to detect clusters generated from arbitrary unknown distributions, using non-parametric kernel density estimates. The proposed model is non-parametric since the generative distribution of each data point depends only on the rest of the data points and the chosen kernel. A leave-one-out likelihood maximization is performed to estimate the parameters of the model. The NMM approach, when applied to cluster high-dimensional text datasets, significantly outperforms state-of-the-art and classical approaches such as K-means, Gaussian mixture models, spectral clustering, and linkage methods.
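
The leave-one-out idea mentioned in the abstract can be illustrated with a plain kernel density estimate, where the density at each point is computed from all the other points only. The sketch below is not the authors' NMM: it assumes an isotropic Gaussian kernel, a single bandwidth, and no cluster structure, and the names `loo_log_likelihood`, `bandwidth`, and `grid` are hypothetical. It only shows why the held-out point must be excluded (otherwise the likelihood grows without bound as the bandwidth shrinks).

```python
import numpy as np

def loo_log_likelihood(X, bandwidth):
    """Leave-one-out log-likelihood of a Gaussian kernel density estimate."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Log of the Gaussian kernel, including its normalising constant.
    log_k = (-0.5 * sq_dist / bandwidth ** 2
             - 0.5 * d * np.log(2 * np.pi * bandwidth ** 2))
    # Exclude each point from its own density estimate.
    np.fill_diagonal(log_k, -np.inf)
    # log p(x_i) = log( 1/(n-1) * sum_{j != i} K(x_i, x_j) ), via log-sum-exp.
    m = log_k.max(axis=1, keepdims=True)
    log_p = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1)) - np.log(n - 1)
    return log_p.sum()

# Synthetic data (an assumption for illustration, not the paper's datasets).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))

# Pick the bandwidth from a small hypothetical grid by maximising the score.
grid = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
best = max(grid, key=lambda h: loo_log_likelihood(X, h))
print("selected bandwidth:", best)
```

Working in log space with a log-sum-exp keeps the computation stable for small bandwidths, where individual kernel values underflow.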

PAGES

334-343

Book

TITLE

Structural, Syntactic, and Statistical Pattern Recognition

ISBN

978-3-642-14979-5
978-3-642-14980-1

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-14980-1_32

DOI

http://dx.doi.org/10.1007/978-3-642-14980-1_32

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1039967277



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Mathematical Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0104", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Statistics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Department of Computer Science and Engineering, Michigan State University, 48824, East Lansing, MI", 
          "id": "http://www.grid.ac/institutes/grid.17088.36", 
          "name": [
            "Department of Computer Science and Engineering, Michigan State University, 48824, East Lansing, MI"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Mallapragada", 
        "givenName": "Pavan Kumar", 
        "id": "sg:person.012756332403.40", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012756332403.40"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Computer Science and Engineering, Michigan State University, 48824, East Lansing, MI", 
          "id": "http://www.grid.ac/institutes/grid.17088.36", 
          "name": [
            "Department of Computer Science and Engineering, Michigan State University, 48824, East Lansing, MI"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Jin", 
        "givenName": "Rong", 
        "id": "sg:person.01274430471.12", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01274430471.12"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Computer Science and Engineering, Michigan State University, 48824, East Lansing, MI", 
          "id": "http://www.grid.ac/institutes/grid.17088.36", 
          "name": [
            "Department of Computer Science and Engineering, Michigan State University, 48824, East Lansing, MI"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Jain", 
        "givenName": "Anil", 
        "id": "sg:person.01031110710.30", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01031110710.30"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2010", 
    "datePublishedReg": "2010-01-01", 
    "description": "Mixture models have been widely used for data clustering. However, commonly used mixture models are generally of a parametric form (e.g., mixture of Gaussian distributions or GMM), which significantly limits their capacity in fitting diverse multidimensional data distributions encountered in practice. We propose a non-parametric mixture model (NMM) for data clustering in order to detect clusters generated from arbitrary unknown distributions, using non-parametric kernel density estimates. The proposed model is non-parametric since the generative distribution of each data point depends only on the rest of the data points and the chosen kernel. A leave-one-out likelihood maximization is performed to estimate the parameters of the model. The NMM approach, when applied to cluster high dimensional text datasets significantly outperforms the state-of-the-art and classical approaches such as K-means, Gaussian Mixture Models, spectral clustering and linkage methods.", 
    "editor": [
      {
        "familyName": "Hancock", 
        "givenName": "Edwin R.", 
        "type": "Person"
      }, 
      {
        "familyName": "Wilson", 
        "givenName": "Richard C.", 
        "type": "Person"
      }, 
      {
        "familyName": "Windeatt", 
        "givenName": "Terry", 
        "type": "Person"
      }, 
      {
        "familyName": "Ulusoy", 
        "givenName": "Ilkay", 
        "type": "Person"
      }, 
      {
        "familyName": "Escolano", 
        "givenName": "Francisco", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-14980-1_32", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-642-14979-5", 
        "978-3-642-14980-1"
      ], 
      "name": "Structural, Syntactic, and Statistical Pattern Recognition", 
      "type": "Book"
    }, 
    "keywords": [
      "non-parametric mixture model", 
      "mixture model", 
      "multidimensional data distribution", 
      "arbitrary unknown distribution", 
      "unknown distribution", 
      "non-parametric kernel density estimates", 
      "likelihood maximization", 
      "parametric form", 
      "Gaussian mixture model", 
      "generative distribution", 
      "data points", 
      "kernel density estimates", 
      "spectral clustering", 
      "classical approach", 
      "data clustering", 
      "data distribution", 
      "density estimates", 
      "k-means", 
      "model", 
      "clustering", 
      "distribution", 
      "maximization", 
      "kernel", 
      "point", 
      "approach", 
      "parameters", 
      "estimates", 
      "linkage method", 
      "clusters", 
      "order", 
      "state", 
      "form", 
      "dataset", 
      "text datasets", 
      "data", 
      "art", 
      "rest", 
      "capacity", 
      "practice", 
      "method"
    ], 
    "name": "Non-parametric Mixture Models for Clustering", 
    "pagination": "334-343", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1039967277"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-14980-1_32"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-14980-1_32", 
      "https://app.dimensions.ai/details/publication/pub.1039967277"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-12-01T06:54", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221201/entities/gbq_results/chapter/chapter_469.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-642-14980-1_32"
  }
]
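
Because JSON-LD is also valid JSON, a record like the one above can be read with an ordinary JSON parser. The following is a minimal sketch, assuming only Python's standard `json` module; `record_json` is a hand-trimmed copy of the record, and `summarize` is a hypothetical helper, not part of any SciGraph tooling.

```python
import json

# Trimmed copy of the record above; only the fields used here are kept.
record_json = """
[
  {
    "name": "Non-parametric Mixture Models for Clustering",
    "datePublished": "2010",
    "pagination": "334-343",
    "author": [
      {"familyName": "Mallapragada", "givenName": "Pavan Kumar"},
      {"familyName": "Jin", "givenName": "Rong"},
      {"familyName": "Jain", "givenName": "Anil"}
    ],
    "productId": [
      {"name": "dimensions_id", "value": ["pub.1039967277"]},
      {"name": "doi", "value": ["10.1007/978-3-642-14980-1_32"]}
    ]
  }
]
"""

def summarize(record):
    """Collect title, year, author names, and DOI from one record dict."""
    authors = [f"{a['givenName']} {a['familyName']}" for a in record["author"]]
    # productId is a list of name/value pairs; each value is itself a list.
    ids = {p["name"]: p["value"][0] for p in record["productId"]}
    return {
        "title": record["name"],
        "year": record["datePublished"],
        "authors": authors,
        "doi": ids.get("doi"),
    }

# The top-level JSON value is a list containing a single record.
summary = summarize(json.loads(record_json)[0])
print(summary)
```

This treats the record as plain JSON and ignores the `@context`; a full JSON-LD processor would be needed to expand the keys into their RDF predicates.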
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-14980-1_32'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-14980-1_32'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-14980-1_32'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-14980-1_32'
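
The four curl commands differ only in their `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The names `ACCEPT_TYPES`, `build_request`, and `fetch` are hypothetical, and this sketch does not verify that the endpoint is still reachable; only the URL and media types come from the commands above.

```python
from urllib.request import Request, urlopen

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-14980-1_32"

# Media types taken from the curl commands above.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}

def build_request(fmt, url=RECORD_URL):
    """Build a GET request asking the server for the given RDF serialization."""
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})

def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network)."""
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")

# Building a request does not touch the network; calling fetch() does.
request = build_request("turtle")
print(request.full_url, request.get_header("Accept"))
```

Separating `build_request` from `fetch` lets the header logic be checked offline, with the network call made only when the data is actually needed.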


 

This table displays all metadata directly associated with this object as RDF triples.

133 TRIPLES      22 PREDICATES      65 URIs      58 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-14980-1_32 schema:about anzsrc-for:01
2 anzsrc-for:0104
3 schema:author N0e9975424a36447789a13e039736f050
4 schema:datePublished 2010
5 schema:datePublishedReg 2010-01-01
6 schema:description Mixture models have been widely used for data clustering. However, commonly used mixture models are generally of a parametric form (e.g., mixture of Gaussian distributions or GMM), which significantly limits their capacity in fitting diverse multidimensional data distributions encountered in practice. We propose a non-parametric mixture model (NMM) for data clustering in order to detect clusters generated from arbitrary unknown distributions, using non-parametric kernel density estimates. The proposed model is non-parametric since the generative distribution of each data point depends only on the rest of the data points and the chosen kernel. A leave-one-out likelihood maximization is performed to estimate the parameters of the model. The NMM approach, when applied to cluster high dimensional text datasets significantly outperforms the state-of-the-art and classical approaches such as K-means, Gaussian Mixture Models, spectral clustering and linkage methods.
7 schema:editor N8f85d54507ed49a0902672208974f112
8 schema:genre chapter
9 schema:isAccessibleForFree true
10 schema:isPartOf Nfcfa288a45c247eb9f9e72319f1ad883
11 schema:keywords Gaussian mixture model
12 approach
13 arbitrary unknown distribution
14 art
15 capacity
16 classical approach
17 clustering
18 clusters
19 data
20 data clustering
21 data distribution
22 data points
23 dataset
24 density estimates
25 distribution
26 estimates
27 form
28 generative distribution
29 k-means
30 kernel
31 kernel density estimates
32 likelihood maximization
33 linkage method
34 maximization
35 method
36 mixture model
37 model
38 multidimensional data distribution
39 non-parametric kernel density estimates
40 non-parametric mixture model
41 order
42 parameters
43 parametric form
44 point
45 practice
46 rest
47 spectral clustering
48 state
49 text datasets
50 unknown distribution
51 schema:name Non-parametric Mixture Models for Clustering
52 schema:pagination 334-343
53 schema:productId N3441b8699fa242fdb1603abd72068920
54 N3cc9129d7af44c1183c14a4f45f9d019
55 schema:publisher N88eba0daa4384b2eb2c308c7cb35d018
56 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039967277
57 https://doi.org/10.1007/978-3-642-14980-1_32
58 schema:sdDatePublished 2022-12-01T06:54
59 schema:sdLicense https://scigraph.springernature.com/explorer/license/
60 schema:sdPublisher Nf717110ebe4941838709e11b58ba8cef
61 schema:url https://doi.org/10.1007/978-3-642-14980-1_32
62 sgo:license sg:explorer/license/
63 sgo:sdDataset chapters
64 rdf:type schema:Chapter
65 N0e9975424a36447789a13e039736f050 rdf:first sg:person.012756332403.40
66 rdf:rest N6b0fba52bf6a483babfd7deab1554e39
67 N1a4a254c74a149ab8fdc6495050b4c94 rdf:first N49ae8e846a0a45a398c87a4aeaa22161
68 rdf:rest Nfff37844367147fda174b9b10afd0a90
69 N23cf1c4140864b529ac9e1b8717647e2 schema:familyName Hancock
70 schema:givenName Edwin R.
71 rdf:type schema:Person
72 N31af8f7d2b2e4d95a31a4d87420dfce0 rdf:first sg:person.01031110710.30
73 rdf:rest rdf:nil
74 N3441b8699fa242fdb1603abd72068920 schema:name dimensions_id
75 schema:value pub.1039967277
76 rdf:type schema:PropertyValue
77 N3a0eab08f7bd4127ba4691de1361e884 schema:familyName Wilson
78 schema:givenName Richard C.
79 rdf:type schema:Person
80 N3cc9129d7af44c1183c14a4f45f9d019 schema:name doi
81 schema:value 10.1007/978-3-642-14980-1_32
82 rdf:type schema:PropertyValue
83 N458b63f0e12d424488adf66408bdb716 schema:familyName Escolano
84 schema:givenName Francisco
85 rdf:type schema:Person
86 N49ae8e846a0a45a398c87a4aeaa22161 schema:familyName Windeatt
87 schema:givenName Terry
88 rdf:type schema:Person
89 N6b0fba52bf6a483babfd7deab1554e39 rdf:first sg:person.01274430471.12
90 rdf:rest N31af8f7d2b2e4d95a31a4d87420dfce0
91 N719cc8c18da447d78c118e0a5eb2a4a6 rdf:first N3a0eab08f7bd4127ba4691de1361e884
92 rdf:rest N1a4a254c74a149ab8fdc6495050b4c94
93 N88eba0daa4384b2eb2c308c7cb35d018 schema:name Springer Nature
94 rdf:type schema:Organisation
95 N8f85d54507ed49a0902672208974f112 rdf:first N23cf1c4140864b529ac9e1b8717647e2
96 rdf:rest N719cc8c18da447d78c118e0a5eb2a4a6
97 Na03f92ce967f4edf99a3460b896929dd rdf:first N458b63f0e12d424488adf66408bdb716
98 rdf:rest rdf:nil
99 Na5410e66de5f42da90d89268b7b50137 schema:familyName Ulusoy
100 schema:givenName Ilkay
101 rdf:type schema:Person
102 Nf717110ebe4941838709e11b58ba8cef schema:name Springer Nature - SN SciGraph project
103 rdf:type schema:Organization
104 Nfcfa288a45c247eb9f9e72319f1ad883 schema:isbn 978-3-642-14979-5
105 978-3-642-14980-1
106 schema:name Structural, Syntactic, and Statistical Pattern Recognition
107 rdf:type schema:Book
108 Nfff37844367147fda174b9b10afd0a90 rdf:first Na5410e66de5f42da90d89268b7b50137
109 rdf:rest Na03f92ce967f4edf99a3460b896929dd
110 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
111 schema:name Mathematical Sciences
112 rdf:type schema:DefinedTerm
113 anzsrc-for:0104 schema:inDefinedTermSet anzsrc-for:
114 schema:name Statistics
115 rdf:type schema:DefinedTerm
116 sg:person.01031110710.30 schema:affiliation grid-institutes:grid.17088.36
117 schema:familyName Jain
118 schema:givenName Anil
119 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01031110710.30
120 rdf:type schema:Person
121 sg:person.01274430471.12 schema:affiliation grid-institutes:grid.17088.36
122 schema:familyName Jin
123 schema:givenName Rong
124 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01274430471.12
125 rdf:type schema:Person
126 sg:person.012756332403.40 schema:affiliation grid-institutes:grid.17088.36
127 schema:familyName Mallapragada
128 schema:givenName Pavan Kumar
129 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012756332403.40
130 rdf:type schema:Person
131 grid-institutes:grid.17088.36 schema:alternateName Department of Computer Science and Engineering, Michigan State University, 48824, East Lansing, MI
132 schema:name Department of Computer Science and Engineering, Michigan State University, 48824, East Lansing, MI
133 rdf:type schema:Organization
 



