Concept embedding-based weighting scheme for biomedical text clustering and visualization


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2018-11-01

AUTHORS

Xiao Luo, Setu Shah

ABSTRACT

Biomedical text clustering is a text mining technique used to provide better document search, browsing, and retrieval in biomedical and clinical text collections. In this research, the document representation based on the concept embedding along with the proposed weighting scheme is explored. The concept embedding is learned through the neural networks to capture the associations between the concepts. The proposed weighting scheme makes use of the concept associations to build document vectors for clustering. We evaluate two types of concept embedding and new weighting scheme for text clustering and visualization on two different biomedical text collections. The returned results demonstrate that the concept embedding along with the new weighting scheme performs better than the baseline tf–idf for clustering and visualization. Based on the internal clustering evaluation metric-Davies–Bouldin index and the visualization, the concept embedding generated from aggregated word embedding can form well-separated clusters, whereas the intact concept embedding can better identify more clusters of specific diseases and gain better F-measure.

PAGES

8

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/s40535-018-0055-8

DOI

http://dx.doi.org/10.1186/s40535-018-0055-8

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1107951795



JSON-LD is the canonical representation for SciGraph data.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Department of Computer Information Technology, IUPUI, Indianapolis, USA", 
          "id": "http://www.grid.ac/institutes/grid.257413.6", 
          "name": [
            "Department of Computer Information Technology, IUPUI, Indianapolis, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Luo", 
        "givenName": "Xiao", 
        "id": "sg:person.013273233436.84", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013273233436.84"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Electrical and Computer Engineering, IUPUI, Indianapolis, USA", 
          "id": "http://www.grid.ac/institutes/grid.257413.6", 
          "name": [
            "Department of Electrical and Computer Engineering, IUPUI, Indianapolis, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Shah", 
        "givenName": "Setu", 
        "id": "sg:person.010375243714.54", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010375243714.54"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1007/978-3-540-71703-4_12", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012685859", 
          "https://doi.org/10.1007/978-3-540-71703-4_12"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-15384-6_45", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001564532", 
          "https://doi.org/10.1007/978-3-642-15384-6_45"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s12911-017-0498-1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1090274713", 
          "https://doi.org/10.1186/s12911-017-0498-1"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2018-11-01", 
    "datePublishedReg": "2018-11-01", 
    "description": "Biomedical text clustering is a text mining technique used to provide better document search, browsing, and retrieval in biomedical and clinical text collections. In this research, the document representation based on the concept embedding along with the proposed weighting scheme is explored. The concept embedding is learned through the neural networks to capture the associations between the concepts. The proposed weighting scheme makes use of the concept associations to build document vectors for clustering. We evaluate two types of concept embedding and new weighting scheme for text clustering and visualization on two different biomedical text collections. The returned results demonstrate that the concept embedding along with the new weighting scheme performs better than the baseline tf\u2013idf for clustering and visualization. Based on the internal clustering evaluation metric-Davies\u2013Bouldin index and the visualization, the concept embedding generated from aggregated word embedding can form well-separated clusters, whereas the intact concept embedding can better identify more clusters of specific diseases and gain better F-measure.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/s40535-018-0055-8", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1053269", 
        "issn": [
          "2196-0089"
        ], 
        "name": "Applied Informatics", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "5"
      }
    ], 
    "keywords": [
      "concept embeddings", 
      "text clustering", 
      "text collections", 
      "new weighting scheme", 
      "weighting scheme", 
      "Davies-Bouldin index", 
      "text mining techniques", 
      "TF-IDF", 
      "mining techniques", 
      "returned results", 
      "document representation", 
      "neural network", 
      "document search", 
      "document vectors", 
      "word embeddings", 
      "concept associations", 
      "Clustering Evaluation", 
      "clustering", 
      "embedding", 
      "visualization", 
      "scheme", 
      "more clusters", 
      "browsing", 
      "retrieval", 
      "network", 
      "concept", 
      "collection", 
      "representation", 
      "clusters", 
      "search", 
      "technique", 
      "vector", 
      "research", 
      "evaluation", 
      "use", 
      "results", 
      "specific diseases", 
      "measures", 
      "types", 
      "index", 
      "association", 
      "disease", 
      "Biomedical text clustering", 
      "better document search", 
      "clinical text collections", 
      "different biomedical text collections", 
      "biomedical text collections", 
      "baseline tf\u2013idf", 
      "internal clustering evaluation", 
      "intact concept embedding", 
      "Concept embedding-based weighting scheme", 
      "embedding-based weighting scheme"
    ], 
    "name": "Concept embedding-based weighting scheme for biomedical text clustering and visualization", 
    "pagination": "8", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1107951795"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/s40535-018-0055-8"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/s40535-018-0055-8", 
      "https://app.dimensions.ai/details/publication/pub.1107951795"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2021-12-01T19:43", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20211201/entities/gbq_results/article/article_792.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/s40535-018-0055-8"
  }
]
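
Because the record is plain JSON, it can be read with any JSON parser. The following is a minimal sketch, assuming a trimmed copy of the record above held in a string; the names RECORD and summarize are illustrative and not part of SciGraph.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD = """
[
  {
    "author": [
      {"familyName": "Luo", "givenName": "Xiao", "type": "Person"},
      {"familyName": "Shah", "givenName": "Setu", "type": "Person"}
    ],
    "datePublished": "2018-11-01",
    "name": "Concept embedding-based weighting scheme for biomedical text clustering and visualization",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1107951795"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s40535-018-0055-8"]}
    ]
  }
]
"""


def summarize(record_json):
    """Pull title, authors, date, and DOI out of a SciGraph-style JSON-LD array."""
    article = json.loads(record_json)[0]  # the record is a one-element array
    authors = [f"{a['givenName']} {a['familyName']}" for a in article.get("author", [])]
    # productId is a list of name/value pairs; each value is itself a list.
    ids = {p["name"]: p["value"][0] for p in article.get("productId", [])}
    return {
        "title": article["name"],
        "authors": authors,
        "date": article.get("datePublished"),
        "doi": ids.get("doi"),
    }


summary = summarize(RECORD)
```

This treats the record as ordinary JSON and ignores the "@context"; a full JSON-LD processor would be needed to expand the keys into their schema.org URIs.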
 

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s40535-018-0055-8'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s40535-018-0055-8'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s40535-018-0055-8'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s40535-018-0055-8'
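
The four curl commands above differ only in the Accept header. A rough Python equivalent, using only the standard library, might look like the sketch below. The names RECORD_URL, MEDIA_TYPES, build_request, and fetch are illustrative, and whether the endpoint still answers these requests is an assumption.

```python
from urllib.request import Request, urlopen

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s40535-018-0055-8"

# Media types copied from the curl commands above; the short keys are our own labels.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Return a GET request whose Accept header asks for the given format."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch("turtle") would then correspond to the third curl command. Building the request separately from sending it keeps the content negotiation step easy to inspect without touching the network.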


 

This table displays all metadata directly associated with this object as RDF triples.

130 TRIPLES      22 PREDICATES      80 URIs      69 LITERALS      6 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/s40535-018-0055-8 schema:about anzsrc-for:08
2 anzsrc-for:0806
3 schema:author Nd95e42026f194ae59cac5ed86d2d68fd
4 schema:citation sg:pub.10.1007/978-3-540-71703-4_12
5 sg:pub.10.1007/978-3-642-15384-6_45
6 sg:pub.10.1186/s12911-017-0498-1
7 schema:datePublished 2018-11-01
8 schema:datePublishedReg 2018-11-01
9 schema:description Biomedical text clustering is a text mining technique used to provide better document search, browsing, and retrieval in biomedical and clinical text collections. In this research, the document representation based on the concept embedding along with the proposed weighting scheme is explored. The concept embedding is learned through the neural networks to capture the associations between the concepts. The proposed weighting scheme makes use of the concept associations to build document vectors for clustering. We evaluate two types of concept embedding and new weighting scheme for text clustering and visualization on two different biomedical text collections. The returned results demonstrate that the concept embedding along with the new weighting scheme performs better than the baseline tf–idf for clustering and visualization. Based on the internal clustering evaluation metric-Davies–Bouldin index and the visualization, the concept embedding generated from aggregated word embedding can form well-separated clusters, whereas the intact concept embedding can better identify more clusters of specific diseases and gain better F-measure.
10 schema:genre article
11 schema:inLanguage en
12 schema:isAccessibleForFree true
13 schema:isPartOf N575234eab2364d35932a1eb3f4c4fa18
14 Ndd0cbbb0db3949698e760329ede01f4f
15 sg:journal.1053269
16 schema:keywords Biomedical text clustering
17 Clustering Evaluation
18 Concept embedding-based weighting scheme
19 Davies-Bouldin index
20 TF-IDF
21 association
22 baseline tf–idf
23 better document search
24 biomedical text collections
25 browsing
26 clinical text collections
27 clustering
28 clusters
29 collection
30 concept
31 concept associations
32 concept embeddings
33 different biomedical text collections
34 disease
35 document representation
36 document search
37 document vectors
38 embedding
39 embedding-based weighting scheme
40 evaluation
41 index
42 intact concept embedding
43 internal clustering evaluation
44 measures
45 mining techniques
46 more clusters
47 network
48 neural network
49 new weighting scheme
50 representation
51 research
52 results
53 retrieval
54 returned results
55 scheme
56 search
57 specific diseases
58 technique
59 text clustering
60 text collections
61 text mining techniques
62 types
63 use
64 vector
65 visualization
66 weighting scheme
67 word embeddings
68 schema:name Concept embedding-based weighting scheme for biomedical text clustering and visualization
69 schema:pagination 8
70 schema:productId N5076276099e34a09a61350ffaf65cea6
71 Na9293624748e4678b25aa19dc5b5d407
72 schema:sameAs https://app.dimensions.ai/details/publication/pub.1107951795
73 https://doi.org/10.1186/s40535-018-0055-8
74 schema:sdDatePublished 2021-12-01T19:43
75 schema:sdLicense https://scigraph.springernature.com/explorer/license/
76 schema:sdPublisher Ndbefa2e56c4a446588b3ab85b0114f26
77 schema:url https://doi.org/10.1186/s40535-018-0055-8
78 sgo:license sg:explorer/license/
79 sgo:sdDataset articles
80 rdf:type schema:ScholarlyArticle
81 N4d870a286c734a619f00f06ebf950468 rdf:first sg:person.010375243714.54
82 rdf:rest rdf:nil
83 N5076276099e34a09a61350ffaf65cea6 schema:name dimensions_id
84 schema:value pub.1107951795
85 rdf:type schema:PropertyValue
86 N575234eab2364d35932a1eb3f4c4fa18 schema:issueNumber 1
87 rdf:type schema:PublicationIssue
88 Na9293624748e4678b25aa19dc5b5d407 schema:name doi
89 schema:value 10.1186/s40535-018-0055-8
90 rdf:type schema:PropertyValue
91 Nd95e42026f194ae59cac5ed86d2d68fd rdf:first sg:person.013273233436.84
92 rdf:rest N4d870a286c734a619f00f06ebf950468
93 Ndbefa2e56c4a446588b3ab85b0114f26 schema:name Springer Nature - SN SciGraph project
94 rdf:type schema:Organization
95 Ndd0cbbb0db3949698e760329ede01f4f schema:volumeNumber 5
96 rdf:type schema:PublicationVolume
97 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
98 schema:name Information and Computing Sciences
99 rdf:type schema:DefinedTerm
100 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
101 schema:name Information Systems
102 rdf:type schema:DefinedTerm
103 sg:journal.1053269 schema:issn 2196-0089
104 schema:name Applied Informatics
105 schema:publisher Springer Nature
106 rdf:type schema:Periodical
107 sg:person.010375243714.54 schema:affiliation grid-institutes:grid.257413.6
108 schema:familyName Shah
109 schema:givenName Setu
110 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010375243714.54
111 rdf:type schema:Person
112 sg:person.013273233436.84 schema:affiliation grid-institutes:grid.257413.6
113 schema:familyName Luo
114 schema:givenName Xiao
115 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013273233436.84
116 rdf:type schema:Person
117 sg:pub.10.1007/978-3-540-71703-4_12 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012685859
118 https://doi.org/10.1007/978-3-540-71703-4_12
119 rdf:type schema:CreativeWork
120 sg:pub.10.1007/978-3-642-15384-6_45 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001564532
121 https://doi.org/10.1007/978-3-642-15384-6_45
122 rdf:type schema:CreativeWork
123 sg:pub.10.1186/s12911-017-0498-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1090274713
124 https://doi.org/10.1186/s12911-017-0498-1
125 rdf:type schema:CreativeWork
126 grid-institutes:grid.257413.6 schema:alternateName Department of Computer Information Technology, IUPUI, Indianapolis, USA
127 Department of Electrical and Computer Engineering, IUPUI, Indianapolis, USA
128 schema:name Department of Computer Information Technology, IUPUI, Indianapolis, USA
129 Department of Electrical and Computer Engineering, IUPUI, Indianapolis, USA
130 rdf:type schema:Organization
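
In the table above, the author order is carried by an RDF list: schema:author points to a blank node, and each node holds one author in rdf:first and the next node in rdf:rest, ending at rdf:nil. The sketch below walks that list over a hand-copied subset of the rows. The tuple layout and the helper names objects_of and walk_rdf_list are assumptions made for illustration, not a SciGraph API.

```python
RDF_NIL = "rdf:nil"

# Hand-copied subset of the rows above as (subject, predicate, object) tuples.
TRIPLES = [
    ("sg:pub.10.1186/s40535-018-0055-8", "schema:author", "Nd95e42026f194ae59cac5ed86d2d68fd"),
    ("Nd95e42026f194ae59cac5ed86d2d68fd", "rdf:first", "sg:person.013273233436.84"),
    ("Nd95e42026f194ae59cac5ed86d2d68fd", "rdf:rest", "N4d870a286c734a619f00f06ebf950468"),
    ("N4d870a286c734a619f00f06ebf950468", "rdf:first", "sg:person.010375243714.54"),
    ("N4d870a286c734a619f00f06ebf950468", "rdf:rest", "rdf:nil"),
    ("sg:person.013273233436.84", "schema:givenName", "Xiao"),
    ("sg:person.013273233436.84", "schema:familyName", "Luo"),
    ("sg:person.010375243714.54", "schema:givenName", "Setu"),
    ("sg:person.010375243714.54", "schema:familyName", "Shah"),
]


def objects_of(subject, predicate, triples=TRIPLES):
    """Return every object stored for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def walk_rdf_list(head, triples=TRIPLES):
    """Follow rdf:first / rdf:rest from head until rdf:nil, keeping item order."""
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        items.extend(objects_of(node, "rdf:first", triples))
        rest = objects_of(node, "rdf:rest", triples)
        if len(rest) != 1:
            raise ValueError(f"malformed RDF list node {node}")
        node = rest[0]
    return items


head = objects_of("sg:pub.10.1186/s40535-018-0055-8", "schema:author")[0]
author_ids = walk_rdf_list(head)
author_names = [
    " ".join(objects_of(a, "schema:givenName") + objects_of(a, "schema:familyName"))
    for a in author_ids
]
```

For real data, an RDF library would normally parse the N-Triples or Turtle serialization and handle lists for you; the manual walk is only meant to show what the blank-node rows encode.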
 



