Topic Modeling over Short Texts by Incorporating Word Embeddings


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2017

AUTHORS

Jipeng Qiang , Ping Chen , Tong Wang , Xindong Wu

ABSTRACT

Inferring topics from the overwhelming amount of short texts has become a critical but challenging task for many content analysis tasks. Existing methods such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) cannot solve this problem very well, since only very limited word co-occurrence information is available in short texts. This paper studies how to incorporate external word correlation knowledge into short texts to improve the coherence of topic modeling. Based on recent results in word embeddings that learn semantic representations for words from a large corpus, we introduce a novel method, Embedding-based Topic Model (ETM), to learn latent topics from short texts. ETM not only solves the problem of very limited word co-occurrence information by aggregating short texts into long pseudo-texts, but also utilizes a Markov Random Field regularized model that gives correlated words a better chance to be put into the same topic. Experiments on real-world datasets validate the effectiveness of our model compared with state-of-the-art models.

PAGES

363-374

Book

TITLE

Advances in Knowledge Discovery and Data Mining

ISBN

978-3-319-57528-5
978-3-319-57529-2

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-319-57529-2_29

DOI

http://dx.doi.org/10.1007/978-3-319-57529-2_29

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1085123178



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1702", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Cognitive Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/17", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Psychology and Cognitive Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Massachusetts Boston", 
          "id": "https://www.grid.ac/institutes/grid.266685.9", 
          "name": [
            "Yangzhou University", 
            "Hefei University of Technology", 
            "University of Massachusetts Boston"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Qiang", 
        "givenName": "Jipeng", 
        "id": "sg:person.011323633123.50", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011323633123.50"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Massachusetts Boston", 
          "id": "https://www.grid.ac/institutes/grid.266685.9", 
          "name": [
            "University of Massachusetts Boston"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Chen", 
        "givenName": "Ping", 
        "id": "sg:person.012217262135.21", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012217262135.21"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Massachusetts Boston", 
          "id": "https://www.grid.ac/institutes/grid.266685.9", 
          "name": [
            "University of Massachusetts Boston"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Wang", 
        "givenName": "Tong", 
        "id": "sg:person.012365741547.02", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012365741547.02"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Louisiana at Lafayette", 
          "id": "https://www.grid.ac/institutes/grid.266621.7", 
          "name": [
            "Hefei University of Technology", 
            "University of Louisiana at Lafayette"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Wu", 
        "givenName": "Xindong", 
        "id": "sg:person.016521501713.27", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016521501713.27"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1007/978-3-642-20161-5_34", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012531775", 
          "https://doi.org/10.1007/978-3-642-20161-5_34"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/2484028.2484166", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1014502103"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/2623330.2623715", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1016982248"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/a:1007692713085", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1020997911", 
          "https://doi.org/10.1023/a:1007692713085"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.0307752101", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1026144033"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1718487.1718520", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037087000"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/312624.312649", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1044685375"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-1-4614-3223-4_4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1049258309", 
          "https://doi.org/10.1007/978-1-4614-3223-4_4"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tkde.2014.2313872", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061662879"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/ictai.2016.0039", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094188425"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/v1/n15-1074", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1099105864"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/v1/d14-1162", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1099110523"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2017", 
    "datePublishedReg": "2017-01-01", 
    "description": "Inferring topics from the overwhelming amount of short texts becomes a critical but challenging task for many content analysis tasks. Existing methods such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) cannot solve this problem very well since only very limited word co-occurrence information is available in short texts. This paper studies how to incorporate the external word correlation knowledge into short texts to improve the coherence of topic modeling. Based on recent results in word embeddings that learn semantically representations for words from a large corpus, we introduce a novel method, Embedding-based Topic Model (ETM), to learn latent topics from short texts. ETM not only solves the problem of very limited word co-occurrence information by aggregating short texts into long pseudo-texts, but also utilizes a Markov Random Field regularized model that gives correlated words a better chance to be put into the same topic. The experiments on real-world datasets validate the effectiveness of our model comparing with the state-of-the-art models.", 
    "editor": [
      {
        "familyName": "Kim", 
        "givenName": "Jinho", 
        "type": "Person"
      }, 
      {
        "familyName": "Shim", 
        "givenName": "Kyuseok", 
        "type": "Person"
      }, 
      {
        "familyName": "Cao", 
        "givenName": "Longbing", 
        "type": "Person"
      }, 
      {
        "familyName": "Lee", 
        "givenName": "Jae-Gil", 
        "type": "Person"
      }, 
      {
        "familyName": "Lin", 
        "givenName": "Xuemin", 
        "type": "Person"
      }, 
      {
        "familyName": "Moon", 
        "givenName": "Yang-Sae", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-319-57529-2_29", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-319-57528-5", 
        "978-3-319-57529-2"
      ], 
      "name": "Advances in Knowledge Discovery and Data Mining", 
      "type": "Book"
    }, 
    "name": "Topic Modeling over Short Texts by Incorporating Word Embeddings", 
    "pagination": "363-374", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-319-57529-2_29"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "b7286635ebd298671ac7aa92ac6b7c8108d5691ee807e5a21b2449d3dd21b6d9"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1085123178"
        ]
      }
    ], 
    "publisher": {
      "location": "Cham", 
      "name": "Springer International Publishing", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-319-57529-2_29", 
      "https://app.dimensions.ai/details/publication/pub.1085123178"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T19:48", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8684_00000600.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/978-3-319-57529-2_29"
  }
]
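
As a minimal sketch of how a record shaped like the one above might be read, the snippet below parses a trimmed copy of it and pulls out the title, the author names, and the identifiers. The `summarize` helper and the `RECORD_JSON` constant are hypothetical names introduced for illustration; only the field names (`author`, `name`, `productId`, `value`) come from the record itself.

```python
import json

# Trimmed copy of the record above: only the fields this sketch reads.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Qiang", "givenName": "Jipeng"},
      {"familyName": "Chen", "givenName": "Ping"},
      {"familyName": "Wang", "givenName": "Tong"},
      {"familyName": "Wu", "givenName": "Xindong"}
    ],
    "name": "Topic Modeling over Short Texts by Incorporating Word Embeddings",
    "pagination": "363-374",
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/978-3-319-57529-2_29"]},
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1085123178"]}
    ]
  }
]
"""


def summarize(record):
    """Return title, author names, and identifiers from one record (hypothetical helper)."""
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    # productId is a list of name/value pairs; each value is itself a list.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", []) if p.get("value")}
    return {"title": record.get("name"), "authors": authors, "ids": ids}


# The top level of the document is a JSON array holding a single record.
summary = summarize(json.loads(RECORD_JSON)[0])
print(summary["title"])
print(", ".join(summary["authors"]))
print(summary["ids"]["doi"])
```

Because JSON-LD is plain JSON, the standard `json` module is enough here; a full JSON-LD processor would only be needed to expand the `@context`.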
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-57529-2_29'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-57529-2_29'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-57529-2_29'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-57529-2_29'
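
The four curl commands differ only in their Accept header, so the same content negotiation can be sketched in Python with the standard library. This is an illustrative sketch, not an official client: `build_request`, `BASE`, and the short format names used as dictionary keys are assumptions made here, while the URL and the media types are copied from the commands above. The request is only built, not sent, so the example does not depend on the service being reachable.

```python
from urllib.request import Request

# Base URL taken from the curl commands above.
BASE = "https://scigraph.springernature.com/"

# Media types taken from the curl commands above; the keys are this sketch's own labels.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt="json-ld"):
    """Build (but do not send) a GET request for one record in the given format."""
    if fmt not in ACCEPT:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request(BASE + record_id, headers={"Accept": ACCEPT[fmt]})


req = build_request("pub.10.1007/978-3-319-57529-2_29", "turtle")
print(req.full_url)
print(req.get_header("Accept"))
# To actually fetch the data: urllib.request.urlopen(req, timeout=10).read()
```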


 

This table displays all metadata directly associated with this object as RDF triples.

156 TRIPLES      23 PREDICATES      39 URIs      20 LITERALS      8 BLANK NODES

Row Subject Predicate Object (a row that omits the Subject or Predicate repeats the one from the row above)
1 sg:pub.10.1007/978-3-319-57529-2_29 schema:about anzsrc-for:17
2 anzsrc-for:1702
3 schema:author Neec61014b18c4ca6a7e35c034cfb0302
4 schema:citation sg:pub.10.1007/978-1-4614-3223-4_4
5 sg:pub.10.1007/978-3-642-20161-5_34
6 sg:pub.10.1023/a:1007692713085
7 https://doi.org/10.1073/pnas.0307752101
8 https://doi.org/10.1109/ictai.2016.0039
9 https://doi.org/10.1109/tkde.2014.2313872
10 https://doi.org/10.1145/1718487.1718520
11 https://doi.org/10.1145/2484028.2484166
12 https://doi.org/10.1145/2623330.2623715
13 https://doi.org/10.1145/312624.312649
14 https://doi.org/10.3115/v1/d14-1162
15 https://doi.org/10.3115/v1/n15-1074
16 schema:datePublished 2017
17 schema:datePublishedReg 2017-01-01
18 schema:description Inferring topics from the overwhelming amount of short texts becomes a critical but challenging task for many content analysis tasks. Existing methods such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) cannot solve this problem very well since only very limited word co-occurrence information is available in short texts. This paper studies how to incorporate the external word correlation knowledge into short texts to improve the coherence of topic modeling. Based on recent results in word embeddings that learn semantically representations for words from a large corpus, we introduce a novel method, Embedding-based Topic Model (ETM), to learn latent topics from short texts. ETM not only solves the problem of very limited word co-occurrence information by aggregating short texts into long pseudo-texts, but also utilizes a Markov Random Field regularized model that gives correlated words a better chance to be put into the same topic. The experiments on real-world datasets validate the effectiveness of our model comparing with the state-of-the-art models.
19 schema:editor Ne65b7e71eeb847bca74034aae7ed1341
20 schema:genre chapter
21 schema:inLanguage en
22 schema:isAccessibleForFree true
23 schema:isPartOf Na0e6bd2cbab045129fa193cc4c34e4de
24 schema:name Topic Modeling over Short Texts by Incorporating Word Embeddings
25 schema:pagination 363-374
26 schema:productId N236e1057a7c24c168a9a88480c139292
27 Na9e34c86126a45399b01930052976839
28 Ncde3dd9c1df64bae986ac80fe7c763c0
29 schema:publisher Nc3884c93cbc3423095ed4c899adae0cf
30 schema:sameAs https://app.dimensions.ai/details/publication/pub.1085123178
31 https://doi.org/10.1007/978-3-319-57529-2_29
32 schema:sdDatePublished 2019-04-15T19:48
33 schema:sdLicense https://scigraph.springernature.com/explorer/license/
34 schema:sdPublisher N2477f58c6a2440eb87afae97ed71c849
35 schema:url http://link.springer.com/10.1007/978-3-319-57529-2_29
36 sgo:license sg:explorer/license/
37 sgo:sdDataset chapters
38 rdf:type schema:Chapter
39 N236e1057a7c24c168a9a88480c139292 schema:name dimensions_id
40 schema:value pub.1085123178
41 rdf:type schema:PropertyValue
42 N2477f58c6a2440eb87afae97ed71c849 schema:name Springer Nature - SN SciGraph project
43 rdf:type schema:Organization
44 N27a7f55faa064cf6b6cb43b3df12eac3 rdf:first sg:person.016521501713.27
45 rdf:rest rdf:nil
46 N36289012c38f4a5db33afac9ddbb99ce rdf:first Nc6f8042632b9487eb41307d3c74b7ba0
47 rdf:rest Ne883d24d6ff44089a0c7c91ce2341d28
48 N48cbafaa15ed4b9da3dd2a3e27437b58 schema:familyName Lin
49 schema:givenName Xuemin
50 rdf:type schema:Person
51 N4a11e228cfb045849b379e2e9c104641 schema:familyName Moon
52 schema:givenName Yang-Sae
53 rdf:type schema:Person
54 N7c6cbd8b55a84fcd8f5614126e204d9a schema:familyName Lee
55 schema:givenName Jae-Gil
56 rdf:type schema:Person
57 N7e314de1b84c4a1f86013c90d45adad1 rdf:first N7c6cbd8b55a84fcd8f5614126e204d9a
58 rdf:rest N9bd933c8145d484fbd0248920bfeb31d
59 N82f6b5b384cf443f839ce61acb7dc4d7 rdf:first N4a11e228cfb045849b379e2e9c104641
60 rdf:rest rdf:nil
61 N9bd933c8145d484fbd0248920bfeb31d rdf:first N48cbafaa15ed4b9da3dd2a3e27437b58
62 rdf:rest N82f6b5b384cf443f839ce61acb7dc4d7
63 Na0e6bd2cbab045129fa193cc4c34e4de schema:isbn 978-3-319-57528-5
64 978-3-319-57529-2
65 schema:name Advances in Knowledge Discovery and Data Mining
66 rdf:type schema:Book
67 Na9e34c86126a45399b01930052976839 schema:name readcube_id
68 schema:value b7286635ebd298671ac7aa92ac6b7c8108d5691ee807e5a21b2449d3dd21b6d9
69 rdf:type schema:PropertyValue
70 Nc3884c93cbc3423095ed4c899adae0cf schema:location Cham
71 schema:name Springer International Publishing
72 rdf:type schema:Organisation
73 Nc6f8042632b9487eb41307d3c74b7ba0 schema:familyName Shim
74 schema:givenName Kyuseok
75 rdf:type schema:Person
76 Ncdce56aa50394c25a4ee1f9ca01d8750 rdf:first sg:person.012217262135.21
77 rdf:rest Ndbb226e0a9674504b0c5350dd00e542d
78 Ncde3dd9c1df64bae986ac80fe7c763c0 schema:name doi
79 schema:value 10.1007/978-3-319-57529-2_29
80 rdf:type schema:PropertyValue
81 Ndbb226e0a9674504b0c5350dd00e542d rdf:first sg:person.012365741547.02
82 rdf:rest N27a7f55faa064cf6b6cb43b3df12eac3
83 Nddd525067a1c4289b07f57437e232efb schema:familyName Cao
84 schema:givenName Longbing
85 rdf:type schema:Person
86 Ne269ba87ee964403be35a440bb8a79e8 schema:familyName Kim
87 schema:givenName Jinho
88 rdf:type schema:Person
89 Ne65b7e71eeb847bca74034aae7ed1341 rdf:first Ne269ba87ee964403be35a440bb8a79e8
90 rdf:rest N36289012c38f4a5db33afac9ddbb99ce
91 Ne883d24d6ff44089a0c7c91ce2341d28 rdf:first Nddd525067a1c4289b07f57437e232efb
92 rdf:rest N7e314de1b84c4a1f86013c90d45adad1
93 Neec61014b18c4ca6a7e35c034cfb0302 rdf:first sg:person.011323633123.50
94 rdf:rest Ncdce56aa50394c25a4ee1f9ca01d8750
95 anzsrc-for:17 schema:inDefinedTermSet anzsrc-for:
96 schema:name Psychology and Cognitive Sciences
97 rdf:type schema:DefinedTerm
98 anzsrc-for:1702 schema:inDefinedTermSet anzsrc-for:
99 schema:name Cognitive Sciences
100 rdf:type schema:DefinedTerm
101 sg:person.011323633123.50 schema:affiliation https://www.grid.ac/institutes/grid.266685.9
102 schema:familyName Qiang
103 schema:givenName Jipeng
104 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011323633123.50
105 rdf:type schema:Person
106 sg:person.012217262135.21 schema:affiliation https://www.grid.ac/institutes/grid.266685.9
107 schema:familyName Chen
108 schema:givenName Ping
109 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012217262135.21
110 rdf:type schema:Person
111 sg:person.012365741547.02 schema:affiliation https://www.grid.ac/institutes/grid.266685.9
112 schema:familyName Wang
113 schema:givenName Tong
114 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012365741547.02
115 rdf:type schema:Person
116 sg:person.016521501713.27 schema:affiliation https://www.grid.ac/institutes/grid.266621.7
117 schema:familyName Wu
118 schema:givenName Xindong
119 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016521501713.27
120 rdf:type schema:Person
121 sg:pub.10.1007/978-1-4614-3223-4_4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049258309
122 https://doi.org/10.1007/978-1-4614-3223-4_4
123 rdf:type schema:CreativeWork
124 sg:pub.10.1007/978-3-642-20161-5_34 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012531775
125 https://doi.org/10.1007/978-3-642-20161-5_34
126 rdf:type schema:CreativeWork
127 sg:pub.10.1023/a:1007692713085 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020997911
128 https://doi.org/10.1023/a:1007692713085
129 rdf:type schema:CreativeWork
130 https://doi.org/10.1073/pnas.0307752101 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026144033
131 rdf:type schema:CreativeWork
132 https://doi.org/10.1109/ictai.2016.0039 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094188425
133 rdf:type schema:CreativeWork
134 https://doi.org/10.1109/tkde.2014.2313872 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061662879
135 rdf:type schema:CreativeWork
136 https://doi.org/10.1145/1718487.1718520 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037087000
137 rdf:type schema:CreativeWork
138 https://doi.org/10.1145/2484028.2484166 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014502103
139 rdf:type schema:CreativeWork
140 https://doi.org/10.1145/2623330.2623715 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016982248
141 rdf:type schema:CreativeWork
142 https://doi.org/10.1145/312624.312649 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044685375
143 rdf:type schema:CreativeWork
144 https://doi.org/10.3115/v1/d14-1162 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099110523
145 rdf:type schema:CreativeWork
146 https://doi.org/10.3115/v1/n15-1074 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099105864
147 rdf:type schema:CreativeWork
148 https://www.grid.ac/institutes/grid.266621.7 schema:alternateName University of Louisiana at Lafayette
149 schema:name Hefei University of Technology
150 University of Louisiana at Lafayette
151 rdf:type schema:Organization
152 https://www.grid.ac/institutes/grid.266685.9 schema:alternateName University of Massachusetts Boston
153 schema:name Hefei University of Technology
154 University of Massachusetts Boston
155 Yangzhou University
156 rdf:type schema:Organization
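
In the table above, the ordered author list is stored as an RDF collection: `schema:author` points at a blank node, and each node carries an `rdf:first` (the item) and an `rdf:rest` (the next node, or `rdf:nil` at the end). The sketch below walks that chain using the values from rows 44-45, 76-77, 81-82, and 93-94. The `walk_list` function and the `TRIPLES` dictionary layout are hypothetical, chosen only to keep the example self-contained; an RDF library would normally do this for you.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# (subject, predicate) -> object, copied from the author-list rows of the table above.
TRIPLES = {
    ("Neec61014b18c4ca6a7e35c034cfb0302", RDF_FIRST): "sg:person.011323633123.50",
    ("Neec61014b18c4ca6a7e35c034cfb0302", RDF_REST): "Ncdce56aa50394c25a4ee1f9ca01d8750",
    ("Ncdce56aa50394c25a4ee1f9ca01d8750", RDF_FIRST): "sg:person.012217262135.21",
    ("Ncdce56aa50394c25a4ee1f9ca01d8750", RDF_REST): "Ndbb226e0a9674504b0c5350dd00e542d",
    ("Ndbb226e0a9674504b0c5350dd00e542d", RDF_FIRST): "sg:person.012365741547.02",
    ("Ndbb226e0a9674504b0c5350dd00e542d", RDF_REST): "N27a7f55faa064cf6b6cb43b3df12eac3",
    ("N27a7f55faa064cf6b6cb43b3df12eac3", RDF_FIRST): "sg:person.016521501713.27",
    ("N27a7f55faa064cf6b6cb43b3df12eac3", RDF_REST): RDF_NIL,
}


def walk_list(triples, head):
    """Follow rdf:first / rdf:rest links from head and return the items in order."""
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(triples[(node, RDF_FIRST)])
        node = triples[(node, RDF_REST)]
    return items


# Row 3 of the table gives the head node of the schema:author list.
authors = walk_list(TRIPLES, "Neec61014b18c4ca6a7e35c034cfb0302")
print(authors)
```

The resulting order (Qiang, Chen, Wang, Wu) matches the author order in the JSON-LD record; the editor list in the table is encoded the same way.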
 



