Concept embedding-based weighting scheme for biomedical text clustering and visualization


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2018-12

AUTHORS

Xiao Luo, Setu Shah

ABSTRACT

Biomedical text clustering is a text mining technique used to provide better document search, browsing, and retrieval in biomedical and clinical text collections. In this research, the document representation based on the concept embedding along with the proposed weighting scheme is explored. The concept embedding is learned through the neural networks to capture the associations between the concepts. The proposed weighting scheme makes use of the concept associations to build document vectors for clustering. We evaluate two types of concept embedding and new weighting scheme for text clustering and visualization on two different biomedical text collections. The returned results demonstrate that the concept embedding along with the new weighting scheme performs better than the baseline tf–idf for clustering and visualization. Based on the internal clustering evaluation metric-Davies–Bouldin index and the visualization, the concept embedding generated from aggregated word embedding can form well-separated clusters, whereas the intact concept embedding can better identify more clusters of specific diseases and gain better F-measure.
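
The abstract names the Davies–Bouldin index as its internal clustering metric. As a rough illustration of what that metric measures, the sketch below computes it from scratch for clusters given as lists of vectors. It is not the authors' code: the function names `centroid` and `davies_bouldin` and the toy points are made up for this example, and Euclidean distance is assumed. Lower values indicate tighter, better-separated clusters.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of vectors).

    Assumes at least two clusters with distinct centroids; coinciding
    centroids would cause a division by zero.
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centers = [centroid(c) for c in clusters]
    # Scatter of a cluster: mean distance of its members to its centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    total = 0.0
    for i in range(len(clusters)):
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(len(clusters))
            if j != i
        )
    return total / len(clusters)


# Hypothetical toy data: two tight, distant clusters vs. two loose, close ones.
tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
loose = [[(0, 0), (0, 4)], [(3, 0), (3, 4)]]
print(davies_bouldin(tight), davies_bouldin(loose))
```

In this toy case the tight, well-separated pair scores lower than the loose pair, which is the sense in which a lower index corresponds to "well-separated clusters".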

PAGES

8

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/s40535-018-0055-8

DOI

http://dx.doi.org/10.1186/s40535-018-0055-8

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1107951795



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Indiana University \u2013 Purdue University Indianapolis", 
          "id": "https://www.grid.ac/institutes/grid.257413.6", 
          "name": [
            "Department of Computer Information Technology, IUPUI, Indianapolis, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Luo", 
        "givenName": "Xiao", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Indiana University \u2013 Purdue University Indianapolis", 
          "id": "https://www.grid.ac/institutes/grid.257413.6", 
          "name": [
            "Department of Electrical and Computer Engineering, IUPUI, Indianapolis, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Shah", 
        "givenName": "Setu", 
        "id": "sg:person.010375243714.54", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010375243714.54"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1007/978-3-642-15384-6_45", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001564532", 
          "https://doi.org/10.1007/978-3-642-15384-6_45"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btp338", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1002646819"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-540-71703-4_12", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012685859", 
          "https://doi.org/10.1007/978-3-540-71703-4_12"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1080/14786440109462720", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1050450827"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tpami.1979.4766909", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061741629"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tsmcb.2012.2227998", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061797600"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.neucom.2017.05.046", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1085608695"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s12911-017-0498-1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1090274713", 
          "https://doi.org/10.1186/s12911-017-0498-1"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/iccci.2013.6466273", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094165097"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.18653/v1/w16-2910", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1098653387"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.2307/2346830", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1101982469"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/bhi.2018.8333440", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1103276166"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2018-12", 
    "datePublishedReg": "2018-12-01", 
    "description": "Biomedical text clustering is a text mining technique used to provide better document search, browsing, and retrieval in biomedical and clinical text collections. In this research, the document representation based on the concept embedding along with the proposed weighting scheme is explored. The concept embedding is learned through the neural networks to capture the associations between the concepts. The proposed weighting scheme makes use of the concept associations to build document vectors for clustering. We evaluate two types of concept embedding and new weighting scheme for text clustering and visualization on two different biomedical text collections. The returned results demonstrate that the concept embedding along with the new weighting scheme performs better than the baseline tf\u2013idf for clustering and visualization. Based on the internal clustering evaluation metric-Davies\u2013Bouldin index and the visualization, the concept embedding generated from aggregated word embedding can form well-separated clusters, whereas the intact concept embedding can better identify more clusters of specific diseases and gain better F-measure.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/s40535-018-0055-8", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1053269", 
        "issn": [
          "2196-0089"
        ], 
        "name": "Applied Informatics", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "5"
      }
    ], 
    "name": "Concept embedding-based weighting scheme for biomedical text clustering and visualization", 
    "pagination": "8", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "ee5fd06cc0fd0d1be361d10ea4d2f8d49a783df0c9763a184a0770a8e220660c"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/s40535-018-0055-8"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1107951795"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/s40535-018-0055-8", 
      "https://app.dimensions.ai/details/publication/pub.1107951795"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-10T16:01", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8664_00000574.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://link.springer.com/10.1186%2Fs40535-018-0055-8"
  }
]
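
Because the record is plain JSON, the standard library is enough to pull out the basic bibliographic fields. The sketch below works on a hand-trimmed excerpt of the record above (only a few keys are kept, and one citation entry is repeated on purpose to show de-duplication by `id`). The helper name `summarize` and the shape of its return value are choices made for this example, not part of any SciGraph API.

```python
import json

# Hand-trimmed excerpt of the record; field names and values are copied
# from the JSON-LD above, everything else is omitted for brevity.
RECORD_EXCERPT = """
[
  {
    "id": "sg:pub.10.1186/s40535-018-0055-8",
    "name": "Concept embedding-based weighting scheme for biomedical text clustering and visualization",
    "author": [
      {"familyName": "Luo", "givenName": "Xiao", "type": "Person"},
      {"familyName": "Shah", "givenName": "Setu", "type": "Person"}
    ],
    "citation": [
      {"id": "sg:pub.10.1007/978-3-642-15384-6_45", "type": "CreativeWork"},
      {"id": "sg:pub.10.1007/978-3-642-15384-6_45", "type": "CreativeWork"},
      {"id": "https://doi.org/10.1093/bioinformatics/btp338", "type": "CreativeWork"}
    ],
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s40535-018-0055-8"]}
    ]
  }
]
"""


def summarize(record_text):
    """Return title, author names, DOI, and unique citation ids."""
    record = json.loads(record_text)[0]  # the payload is a one-element list
    authors = [
        f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])
    ]
    doi = next(
        (p["value"][0] for p in record.get("productId", []) if p.get("name") == "doi"),
        None,
    )
    # dict.fromkeys keeps first-seen order while dropping repeated ids.
    citation_ids = list(dict.fromkeys(c["id"] for c in record.get("citation", [])))
    return {
        "title": record.get("name"),
        "authors": authors,
        "doi": doi,
        "citation_ids": citation_ids,
    }


summary = summarize(RECORD_EXCERPT)
print(summary["authors"], summary["doi"], len(summary["citation_ids"]))
```

This treats the JSON-LD as ordinary JSON and ignores the `@context`; a full JSON-LD processor would be needed to expand keys into their complete IRIs.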
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s40535-018-0055-8'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s40535-018-0055-8'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s40535-018-0055-8'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s40535-018-0055-8'
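
The four curl calls differ only in their Accept header, so the same content negotiation can be expressed once in a script. The following is a minimal sketch using Python's standard `urllib.request`. The short format labels (`json-ld`, `nt`, `turtle`, `xml`) and the helper name `build_request` are chosen for this example, and the sketch only builds the request: whether the endpoint still responds is not something it checks.

```python
from urllib.request import Request

# Base URL and media types are taken from the curl examples above.
BASE = "https://scigraph.springernature.com/"
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt):
    """Build a GET request for one record in the given RDF serialization."""
    return Request(BASE + record_id, headers={"Accept": MEDIA_TYPES[fmt]})


req = build_request("pub.10.1186/s40535-018-0055-8", "turtle")
print(req.full_url)
print(req.get_header("Accept"))

# To actually fetch the data (requires network access):
#     from urllib.request import urlopen
#     body = urlopen(req).read().decode("utf-8")
```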


 

This table displays all metadata directly associated with this object as RDF triples.

106 TRIPLES      21 PREDICATES      39 URIs      19 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/s40535-018-0055-8 schema:about anzsrc-for:08
2 anzsrc-for:0806
3 schema:author N031942ee977b45b7a9e10f75a2901067
4 schema:citation sg:pub.10.1007/978-3-540-71703-4_12
5 sg:pub.10.1007/978-3-642-15384-6_45
6 sg:pub.10.1186/s12911-017-0498-1
7 https://doi.org/10.1016/j.neucom.2017.05.046
8 https://doi.org/10.1080/14786440109462720
9 https://doi.org/10.1093/bioinformatics/btp338
10 https://doi.org/10.1109/bhi.2018.8333440
11 https://doi.org/10.1109/iccci.2013.6466273
12 https://doi.org/10.1109/tpami.1979.4766909
13 https://doi.org/10.1109/tsmcb.2012.2227998
14 https://doi.org/10.18653/v1/w16-2910
15 https://doi.org/10.2307/2346830
16 schema:datePublished 2018-12
17 schema:datePublishedReg 2018-12-01
18 schema:description Biomedical text clustering is a text mining technique used to provide better document search, browsing, and retrieval in biomedical and clinical text collections. In this research, the document representation based on the concept embedding along with the proposed weighting scheme is explored. The concept embedding is learned through the neural networks to capture the associations between the concepts. The proposed weighting scheme makes use of the concept associations to build document vectors for clustering. We evaluate two types of concept embedding and new weighting scheme for text clustering and visualization on two different biomedical text collections. The returned results demonstrate that the concept embedding along with the new weighting scheme performs better than the baseline tf–idf for clustering and visualization. Based on the internal clustering evaluation metric-Davies–Bouldin index and the visualization, the concept embedding generated from aggregated word embedding can form well-separated clusters, whereas the intact concept embedding can better identify more clusters of specific diseases and gain better F-measure.
19 schema:genre research_article
20 schema:inLanguage en
21 schema:isAccessibleForFree false
22 schema:isPartOf N3fe720e307da471ba5f82a47332a61a1
23 N7649327d68004447a0d94a537ff10108
24 sg:journal.1053269
25 schema:name Concept embedding-based weighting scheme for biomedical text clustering and visualization
26 schema:pagination 8
27 schema:productId N04cedaf7a945479eae018b6567215322
28 Nd25efd598a754f15aa4ce841e10070f4
29 Ndb8ff320206c4e1e8ece275b3866e463
30 schema:sameAs https://app.dimensions.ai/details/publication/pub.1107951795
31 https://doi.org/10.1186/s40535-018-0055-8
32 schema:sdDatePublished 2019-04-10T16:01
33 schema:sdLicense https://scigraph.springernature.com/explorer/license/
34 schema:sdPublisher N6f245ba440764d71b2dce8963d78b86e
35 schema:url https://link.springer.com/10.1186%2Fs40535-018-0055-8
36 sgo:license sg:explorer/license/
37 sgo:sdDataset articles
38 rdf:type schema:ScholarlyArticle
39 N031942ee977b45b7a9e10f75a2901067 rdf:first N15a8fd2ccfa94c3cade1872319b7fa2d
40 rdf:rest N2bdab30055a046f7bb15b1631426521a
41 N04cedaf7a945479eae018b6567215322 schema:name doi
42 schema:value 10.1186/s40535-018-0055-8
43 rdf:type schema:PropertyValue
44 N15a8fd2ccfa94c3cade1872319b7fa2d schema:affiliation https://www.grid.ac/institutes/grid.257413.6
45 schema:familyName Luo
46 schema:givenName Xiao
47 rdf:type schema:Person
48 N2bdab30055a046f7bb15b1631426521a rdf:first sg:person.010375243714.54
49 rdf:rest rdf:nil
50 N3fe720e307da471ba5f82a47332a61a1 schema:issueNumber 1
51 rdf:type schema:PublicationIssue
52 N6f245ba440764d71b2dce8963d78b86e schema:name Springer Nature - SN SciGraph project
53 rdf:type schema:Organization
54 N7649327d68004447a0d94a537ff10108 schema:volumeNumber 5
55 rdf:type schema:PublicationVolume
56 Nd25efd598a754f15aa4ce841e10070f4 schema:name dimensions_id
57 schema:value pub.1107951795
58 rdf:type schema:PropertyValue
59 Ndb8ff320206c4e1e8ece275b3866e463 schema:name readcube_id
60 schema:value ee5fd06cc0fd0d1be361d10ea4d2f8d49a783df0c9763a184a0770a8e220660c
61 rdf:type schema:PropertyValue
62 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
63 schema:name Information and Computing Sciences
64 rdf:type schema:DefinedTerm
65 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
66 schema:name Information Systems
67 rdf:type schema:DefinedTerm
68 sg:journal.1053269 schema:issn 2196-0089
69 schema:name Applied Informatics
70 rdf:type schema:Periodical
71 sg:person.010375243714.54 schema:affiliation https://www.grid.ac/institutes/grid.257413.6
72 schema:familyName Shah
73 schema:givenName Setu
74 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010375243714.54
75 rdf:type schema:Person
76 sg:pub.10.1007/978-3-540-71703-4_12 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012685859
77 https://doi.org/10.1007/978-3-540-71703-4_12
78 rdf:type schema:CreativeWork
79 sg:pub.10.1007/978-3-642-15384-6_45 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001564532
80 https://doi.org/10.1007/978-3-642-15384-6_45
81 rdf:type schema:CreativeWork
82 sg:pub.10.1186/s12911-017-0498-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1090274713
83 https://doi.org/10.1186/s12911-017-0498-1
84 rdf:type schema:CreativeWork
85 https://doi.org/10.1016/j.neucom.2017.05.046 schema:sameAs https://app.dimensions.ai/details/publication/pub.1085608695
86 rdf:type schema:CreativeWork
87 https://doi.org/10.1080/14786440109462720 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050450827
88 rdf:type schema:CreativeWork
89 https://doi.org/10.1093/bioinformatics/btp338 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002646819
90 rdf:type schema:CreativeWork
91 https://doi.org/10.1109/bhi.2018.8333440 schema:sameAs https://app.dimensions.ai/details/publication/pub.1103276166
92 rdf:type schema:CreativeWork
93 https://doi.org/10.1109/iccci.2013.6466273 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094165097
94 rdf:type schema:CreativeWork
95 https://doi.org/10.1109/tpami.1979.4766909 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061741629
96 rdf:type schema:CreativeWork
97 https://doi.org/10.1109/tsmcb.2012.2227998 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061797600
98 rdf:type schema:CreativeWork
99 https://doi.org/10.18653/v1/w16-2910 schema:sameAs https://app.dimensions.ai/details/publication/pub.1098653387
100 rdf:type schema:CreativeWork
101 https://doi.org/10.2307/2346830 schema:sameAs https://app.dimensions.ai/details/publication/pub.1101982469
102 rdf:type schema:CreativeWork
103 https://www.grid.ac/institutes/grid.257413.6 schema:alternateName Indiana University – Purdue University Indianapolis
104 schema:name Department of Computer Information Technology, IUPUI, Indianapolis, USA
105 Department of Electrical and Computer Engineering, IUPUI, Indianapolis, USA
106 rdf:type schema:Organization
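
Summary counts like those shown above the table can be recomputed from the N-Triples serialization, since each line holds exactly one triple. The sketch below is a simplified reader rather than a complete N-Triples parser: it assumes one well-formed triple per line and does not handle every escape rule. The sample lines are hand-written for illustration (not actual service output), with the `sg:`, `schema:` and `rdf:` prefixes expanded under the usual assumptions, and the counting rules (distinct predicates, distinct URIs in subject or object position, literal occurrences, distinct blank nodes) are a guess at what the page reports.

```python
import re

# subject (IRI or blank node), predicate (IRI), object (anything), final dot
TRIPLE = re.compile(r'^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$')

# Hand-written sample in N-Triples syntax, modelled on rows of the table.
SAMPLE = """\
<http://scigraph.springernature.com/pub.10.1186/s40535-018-0055-8> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/ScholarlyArticle> .
<http://scigraph.springernature.com/pub.10.1186/s40535-018-0055-8> <http://schema.org/pagination> "8" .
<http://scigraph.springernature.com/pub.10.1186/s40535-018-0055-8> <http://schema.org/isPartOf> _:b0 .
_:b0 <http://schema.org/volumeNumber> "5" .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/PublicationVolume> .
"""


def summarize_ntriples(text):
    """Count triples, predicates, URIs, literals, and blank nodes."""
    triples = 0
    literals = 0
    predicates, uris, blanks = set(), set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"not a simple N-Triples line: {line!r}")
        subject, predicate, obj = match.groups()
        triples += 1
        predicates.add(predicate)
        for term in (subject, obj):
            if term.startswith("<"):
                uris.add(term)
            elif term.startswith("_:"):
                blanks.add(term)
            else:
                literals += 1  # anything else in object position is a literal
    return {
        "triples": triples,
        "predicates": len(predicates),
        "uris": len(uris),
        "literals": literals,
        "blank_nodes": len(blanks),
    }


counts = summarize_ntriples(SAMPLE)
print(counts)
```

For real data, a dedicated RDF library would be the safer choice; the regular expression here is only meant to make the line-per-triple structure visible.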
 



