Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2017-12

AUTHORS

Wei Hu, Amrapali Zaveri, Honglei Qiu, Michel Dumontier

ABSTRACT

BACKGROUND: The ability to efficiently search and filter datasets depends on access to high-quality metadata. While most biomedical repositories require data submitters to provide a minimal set of metadata, some, such as the Gene Expression Omnibus (GEO), allow users to specify additional metadata in the form of textual key-value pairs (e.g. sex: female). However, because there is no structured vocabulary to guide submitters in choosing metadata terms, the more than 44,000,000 key-value pairs in GEO suffer from numerous quality issues, including redundancy, heterogeneity, inconsistency, and incompleteness. Such issues hinder scientists' ability to home in on datasets that meet their requirements and point to a need for accurate, structured and complete descriptions of the data.

METHODS: In this study, we propose a clustering-based approach to address data quality issues in biomedical metadata, specifically gene expression metadata. First, we present three kinds of similarity measures for comparing metadata keys. Second, we design a scalable agglomerative clustering algorithm that groups similar keys together.

RESULTS: Our agglomerative clustering algorithm identified metadata keys that were similar to each other, based on (i) name, (ii) core concept and (iii) value similarity, and grouped them together. We evaluated our method against a manually created gold standard in which 359 keys were grouped into 27 clusters based on six types of characteristics: (i) age, (ii) cell line, (iii) disease, (iv) strain, (v) tissue and (vi) treatment. The algorithm generated 18 clusters containing 355 keys (four clusters with only one key were excluded). Within these 18 clusters, 13 keys were assigned to a cluster to which they were not related; the remaining keys were correctly identified as belonging to their cluster. We compared our approach with four other published methods: it significantly outperformed them for most metadata keys and achieved the best average F-score (0.63).

CONCLUSION: Our algorithm identified keys that were similar to each other and grouped them together. The intuition underpinning cleaning by clustering is that dividing keys into clusters resolves the scalability issues of data observation and cleaning, and that duplicates and errors among keys in the same cluster can easily be found. Our algorithm can also be applied to other biomedical data types.
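
The METHODS and RESULTS paragraphs describe comparing keys with similarity measures, merging similar keys agglomeratively, and scoring the result against a gold standard with an F-score. The sketch below illustrates that pipeline under stated assumptions; it is not the authors' algorithm. Name similarity is assumed to be Jaccard similarity over character trigrams, value similarity is assumed to be Jaccard similarity over the sets of observed values, the two are combined by taking the maximum, and clusters are merged by average linkage until no pair reaches a hypothetical threshold of 0.5. Core-concept similarity is omitted, and this naive loop is cubic in the number of keys, so it does not reflect the scalable design the paper describes. The F-score helper uses one common definition (best-matching predicted cluster per gold cluster, averaged); the paper's exact evaluation formula may differ.

```python
from itertools import combinations

# Hypothetical similarity choices for illustration; not the paper's exact measures.

def trigrams(key: str) -> set:
    """Character trigrams of a normalized key (lower-cased, '_' treated as a space)."""
    s = "  " + key.lower().replace("_", " ").strip() + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def key_similarity(k1: str, k2: str, values: dict) -> float:
    """Assumed combination: the larger of name similarity and value similarity."""
    name_sim = jaccard(trigrams(k1), trigrams(k2))
    value_sim = jaccard(values.get(k1, set()), values.get(k2, set()))
    return max(name_sim, value_sim)

def agglomerative(keys: list, values: dict, threshold: float = 0.5) -> list:
    """Average-linkage merging: start from singleton clusters, repeatedly merge the
    most similar pair, and stop when no pair reaches the threshold."""
    clusters = [{k} for k in keys]
    while True:
        best_sim, best_pair = threshold, None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [key_similarity(a, b, values) for a in clusters[i] for b in clusters[j]]
            avg = sum(sims) / len(sims)
            if avg >= best_sim:
                best_sim, best_pair = avg, (i, j)
        if best_pair is None:
            return clusters
        i, j = best_pair
        merged = clusters.pop(j)  # i < j, so index i is unaffected by the pop
        clusters[i] |= merged

def average_f_score(predicted: list, gold: list) -> float:
    """For each gold cluster, take the best F1 against any predicted cluster, then average."""
    scores = []
    for g in gold:
        best = 0.0
        for p in predicted:
            overlap = len(g & p)
            if overlap:
                precision, recall = overlap / len(p), overlap / len(g)
                best = max(best, 2 * precision * recall / (precision + recall))
        scores.append(best)
    return sum(scores) / len(scores)
```

For example, with the keys "cell line", "Cell_Line", "age", "Age", "sex", "gender" and "tissue", and both "sex" and "gender" observed with the values {male, female}, the sketch groups "cell line" with "Cell_Line" and "age" with "Age" through name similarity, and "sex" with "gender" only through value similarity, leaving "tissue" on its own.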

PAGES

415


Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/s12859-017-1832-4

DOI

http://dx.doi.org/10.1186/s12859-017-1832-4

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1091834589

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/28923003



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Cluster Analysis", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Data Accuracy", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Metadata", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Nanjing University", 
          "id": "https://www.grid.ac/institutes/grid.41156.37", 
          "name": [
            "State Key Laboratory for Novel Software Technology, Nanjing University, 163 Xianlin Avenue, 210023, Nanjing, Jiangsu, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Hu", 
        "givenName": "Wei", 
        "id": "sg:person.012610231116.60", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012610231116.60"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Maastricht University", 
          "id": "https://www.grid.ac/institutes/grid.5012.6", 
          "name": [
            "Institute of Data Science, Maastricht University, 6200, Maastricht, MD, The Netherlands"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Zaveri", 
        "givenName": "Amrapali", 
        "id": "sg:person.01236003355.43", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01236003355.43"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Nanjing University", 
          "id": "https://www.grid.ac/institutes/grid.41156.37", 
          "name": [
            "State Key Laboratory for Novel Software Technology, Nanjing University, 163 Xianlin Avenue, 210023, Nanjing, Jiangsu, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Qiu", 
        "givenName": "Honglei", 
        "id": "sg:person.014015352303.46", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014015352303.46"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Maastricht University", 
          "id": "https://www.grid.ac/institutes/grid.5012.6", 
          "name": [
            "Institute of Data Science, Maastricht University, 6200, Maastricht, MD, The Netherlands"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Dumontier", 
        "givenName": "Michel", 
        "id": "sg:person.01324655201.14", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01324655201.14"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/ng1201-365", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1003793347", 
          "https://doi.org/10.1038/ng1201-365"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.datak.2008.06.003", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1007527724"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-41338-4_19", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009601566", 
          "https://doi.org/10.1007/978-3-642-41338-4_19"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1002/asi.22634", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012497413"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.95.25.14863", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1020882317"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btr406", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028570070"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-12-436", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033062809", 
          "https://doi.org/10.1186/1471-2105-12-436"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-540-74987-5_1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033740376", 
          "https://doi.org/10.1007/978-3-540-74987-5_1"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gks1193", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035551539"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1126/scitranslmed.aaa5993", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035837481"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-10-234", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037547351", 
          "https://doi.org/10.1186/1471-2105-10-234"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gkq848", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045535755"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gkq1184", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045648206"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icde.1999.754967", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1095425568"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/v1/d14-1082", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1099110754"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2017-12", 
    "datePublishedReg": "2017-12-01", 
    "description": "BACKGROUND: The ability to efficiently search and filter datasets depends on access to high quality metadata. While most biomedical repositories require data submitters to provide a minimal set of metadata, some such as the Gene Expression Omnibus (GEO) allows users to specify additional metadata in the form of textual key-value pairs (e.g. sex: female). However, since there is no structured vocabulary to guide the submitter regarding the metadata terms to use, consequently, the 44,000,000+ key-value pairs in GEO suffer from numerous quality issues including redundancy, heterogeneity, inconsistency, and incompleteness. Such issues hinder the ability of scientists to hone in on datasets that meet their requirements and point to a need for accurate, structured and complete description of the data.\nMETHODS: In this study, we propose a clustering-based approach to address data quality issues in biomedical, specifically gene expression, metadata. First, we present three different kinds of similarity measures to compare metadata keys. Second, we design a scalable agglomerative clustering algorithm to cluster similar keys together.\nRESULTS: Our agglomerative cluster algorithm identified metadata keys that were similar, based on (i) name, (ii) core concept and (iii) value similarities, to each other and grouped them together. We evaluated our method using a manually created gold standard in which 359 keys were grouped into 27 clusters based on six types of characteristics: (i) age, (ii) cell line, (iii) disease, (iv) strain, (v) tissue and (vi) treatment. As a result, the algorithm generated 18 clusters containing 355 keys (four clusters with only one key were excluded). In the 18 clusters, there were keys that were identified correctly to be related to that cluster, but there were 13 keys which were not related to that cluster. We compared our approach with four other published methods. Our approach significantly outperformed them for most metadata keys and achieved the best average F-Score (0.63).\nCONCLUSION: Our algorithm identified keys that were similar to each other and grouped them together. Our intuition that underpins cleaning by clustering is that, dividing keys into different clusters resolves the scalability issues for data observation and cleaning, and keys in the same cluster with duplicates and errors can easily be found. Our algorithm can also be applied to other biomedical data types.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/s12859-017-1832-4", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "18"
      }
    ], 
    "name": "Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata", 
    "pagination": "415", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "34c548b18b758df87ccb6e79dfe3bc4a0143351155fdfc8d80edc684e21c9310"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "28923003"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "100965194"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/s12859-017-1832-4"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1091834589"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/s12859-017-1832-4", 
      "https://app.dimensions.ai/details/publication/pub.1091834589"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-10T18:23", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8675_00000528.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1186%2Fs12859-017-1832-4"
  }
]
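
Because JSON-LD is ordinary JSON, a record like the one above can be read with a standard JSON parser. The sketch below uses Python's `json` module on a trimmed, hand-copied excerpt of the record; the `RECORD` string and the `summarize` helper are illustrative names, not part of any SciGraph API. It collects author names, deduplicates citation entries by `id` (guarding against repeated entries), and maps `productId` names to their values. It treats the document as plain JSON and does not interpret `@context`; expanding terms to full IRIs would need a JSON-LD processor.

```python
import json

# Trimmed excerpt of the record above; most fields are omitted for brevity.
RECORD = """
[
  {
    "author": [
      {"familyName": "Hu", "givenName": "Wei", "id": "sg:person.012610231116.60"},
      {"familyName": "Zaveri", "givenName": "Amrapali", "id": "sg:person.01236003355.43"}
    ],
    "citation": [
      {"id": "sg:pub.10.1038/ng1201-365", "type": "CreativeWork"},
      {"id": "sg:pub.10.1038/ng1201-365", "type": "CreativeWork"},
      {"id": "https://doi.org/10.3115/v1/d14-1082", "type": "CreativeWork"}
    ],
    "productId": [
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["28923003"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s12859-017-1832-4"]}
    ]
  }
]
"""

def summarize(record_text: str):
    """Return (author names, unique citation ids, productId name -> value)."""
    article = json.loads(record_text)[0]  # the record is a one-element array
    authors = [f"{a['givenName']} {a['familyName']}" for a in article["author"]]
    # dict.fromkeys keeps first-seen order while dropping repeated ids
    citations = list(dict.fromkeys(c["id"] for c in article["citation"]))
    ids = {p["name"]: p["value"][0] for p in article["productId"]}
    return authors, citations, ids
```

The same function should work on the full record, since it only reads the `author`, `citation` and `productId` keys shown above.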
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s12859-017-1832-4'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s12859-017-1832-4'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s12859-017-1832-4'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s12859-017-1832-4'
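
The same content negotiation can be done from a script. This sketch uses Python's standard `urllib.request` and sends the media types from the curl commands above in the `Accept` header. The `FORMATS` table and the `build_request`/`fetch` helpers are hypothetical names, and the sketch assumes the endpoint is still reachable and answers with UTF-8 text.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s12859-017-1832-4"

# Media types taken from the curl examples above.
FORMATS = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}

def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build a GET request whose Accept header selects the RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})

def fetch(fmt: str, timeout: float = 30.0) -> str:
    """Send the request and decode the body (assumes a UTF-8 response)."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch("n-triples").splitlines()` should yield one triple per line, which suits the batch processing mentioned above.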


 

This table displays all metadata directly associated with this object as RDF triples.

158 TRIPLES      21 PREDICATES      48 URIs      25 LITERALS      13 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/s12859-017-1832-4 schema:about N09fe7b3befd14a8183d258e43837f156
2 N3696e880bcfc4a74ab19540e4b90302e
3 Na500dc6512c9440f82f2bc56e7e30037
4 Nba30e0b873b14f5f96167e7b430f9284
5 anzsrc-for:08
6 anzsrc-for:0806
7 schema:author Nbe9b71282b104bacbe70cbee027db8af
8 schema:citation sg:pub.10.1007/978-3-540-74987-5_1
9 sg:pub.10.1007/978-3-642-41338-4_19
10 sg:pub.10.1038/ng1201-365
11 sg:pub.10.1186/1471-2105-10-234
12 sg:pub.10.1186/1471-2105-12-436
13 https://doi.org/10.1002/asi.22634
14 https://doi.org/10.1016/j.datak.2008.06.003
15 https://doi.org/10.1073/pnas.95.25.14863
16 https://doi.org/10.1093/bioinformatics/btr406
17 https://doi.org/10.1093/nar/gkq1184
18 https://doi.org/10.1093/nar/gkq848
19 https://doi.org/10.1093/nar/gks1193
20 https://doi.org/10.1109/icde.1999.754967
21 https://doi.org/10.1126/scitranslmed.aaa5993
22 https://doi.org/10.3115/v1/d14-1082
23 schema:datePublished 2017-12
24 schema:datePublishedReg 2017-12-01
25 schema:description BACKGROUND: The ability to efficiently search and filter datasets depends on access to high quality metadata. While most biomedical repositories require data submitters to provide a minimal set of metadata, some such as the Gene Expression Omnibus (GEO) allows users to specify additional metadata in the form of textual key-value pairs (e.g. sex: female). However, since there is no structured vocabulary to guide the submitter regarding the metadata terms to use, consequently, the 44,000,000+ key-value pairs in GEO suffer from numerous quality issues including redundancy, heterogeneity, inconsistency, and incompleteness. Such issues hinder the ability of scientists to hone in on datasets that meet their requirements and point to a need for accurate, structured and complete description of the data. METHODS: In this study, we propose a clustering-based approach to address data quality issues in biomedical, specifically gene expression, metadata. First, we present three different kinds of similarity measures to compare metadata keys. Second, we design a scalable agglomerative clustering algorithm to cluster similar keys together. RESULTS: Our agglomerative cluster algorithm identified metadata keys that were similar, based on (i) name, (ii) core concept and (iii) value similarities, to each other and grouped them together. We evaluated our method using a manually created gold standard in which 359 keys were grouped into 27 clusters based on six types of characteristics: (i) age, (ii) cell line, (iii) disease, (iv) strain, (v) tissue and (vi) treatment. As a result, the algorithm generated 18 clusters containing 355 keys (four clusters with only one key were excluded). In the 18 clusters, there were keys that were identified correctly to be related to that cluster, but there were 13 keys which were not related to that cluster. We compared our approach with four other published methods. Our approach significantly outperformed them for most metadata keys and achieved the best average F-Score (0.63). CONCLUSION: Our algorithm identified keys that were similar to each other and grouped them together. Our intuition that underpins cleaning by clustering is that, dividing keys into different clusters resolves the scalability issues for data observation and cleaning, and keys in the same cluster with duplicates and errors can easily be found. Our algorithm can also be applied to other biomedical data types.
26 schema:genre research_article
27 schema:inLanguage en
28 schema:isAccessibleForFree true
29 schema:isPartOf N0d6621cfd00149d4b14ffff4f075e698
30 Nf1ac4dce711e4dee874cb52e1ec723bd
31 sg:journal.1023786
32 schema:name Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata
33 schema:pagination 415
34 schema:productId N23f8f303105b4f0b9ef6e00013db31ac
35 N266cd2177c624c6496a074bd9b7566dd
36 N6aef960cc43247be891e1d6ebd1f516b
37 Nd1b0c7bd49c744dd999a6e4b086e99de
38 Ndf243086d61e4ff2abb2752798c91a32
39 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091834589
40 https://doi.org/10.1186/s12859-017-1832-4
41 schema:sdDatePublished 2019-04-10T18:23
42 schema:sdLicense https://scigraph.springernature.com/explorer/license/
43 schema:sdPublisher Nfe35e00a2ad94e3fb49bc56fb983f074
44 schema:url http://link.springer.com/10.1186%2Fs12859-017-1832-4
45 sgo:license sg:explorer/license/
46 sgo:sdDataset articles
47 rdf:type schema:ScholarlyArticle
48 N09fe7b3befd14a8183d258e43837f156 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
49 schema:name Algorithms
50 rdf:type schema:DefinedTerm
51 N0d6621cfd00149d4b14ffff4f075e698 schema:volumeNumber 18
52 rdf:type schema:PublicationVolume
53 N2143aeda94f74a0e8056abb04c788c55 rdf:first sg:person.01324655201.14
54 rdf:rest rdf:nil
55 N23f8f303105b4f0b9ef6e00013db31ac schema:name nlm_unique_id
56 schema:value 100965194
57 rdf:type schema:PropertyValue
58 N266cd2177c624c6496a074bd9b7566dd schema:name readcube_id
59 schema:value 34c548b18b758df87ccb6e79dfe3bc4a0143351155fdfc8d80edc684e21c9310
60 rdf:type schema:PropertyValue
61 N3696e880bcfc4a74ab19540e4b90302e schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
62 schema:name Data Accuracy
63 rdf:type schema:DefinedTerm
64 N4de17936f73e4c77a79d149c3be67a3a rdf:first sg:person.014015352303.46
65 rdf:rest N2143aeda94f74a0e8056abb04c788c55
66 N619c767bb6b047b181e7d0e8cc6fd66f rdf:first sg:person.01236003355.43
67 rdf:rest N4de17936f73e4c77a79d149c3be67a3a
68 N6aef960cc43247be891e1d6ebd1f516b schema:name dimensions_id
69 schema:value pub.1091834589
70 rdf:type schema:PropertyValue
71 Na500dc6512c9440f82f2bc56e7e30037 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
72 schema:name Metadata
73 rdf:type schema:DefinedTerm
74 Nba30e0b873b14f5f96167e7b430f9284 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
75 schema:name Cluster Analysis
76 rdf:type schema:DefinedTerm
77 Nbe9b71282b104bacbe70cbee027db8af rdf:first sg:person.012610231116.60
78 rdf:rest N619c767bb6b047b181e7d0e8cc6fd66f
79 Nd1b0c7bd49c744dd999a6e4b086e99de schema:name doi
80 schema:value 10.1186/s12859-017-1832-4
81 rdf:type schema:PropertyValue
82 Ndf243086d61e4ff2abb2752798c91a32 schema:name pubmed_id
83 schema:value 28923003
84 rdf:type schema:PropertyValue
85 Nf1ac4dce711e4dee874cb52e1ec723bd schema:issueNumber 1
86 rdf:type schema:PublicationIssue
87 Nfe35e00a2ad94e3fb49bc56fb983f074 schema:name Springer Nature - SN SciGraph project
88 rdf:type schema:Organization
89 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
90 schema:name Information and Computing Sciences
91 rdf:type schema:DefinedTerm
92 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
93 schema:name Information Systems
94 rdf:type schema:DefinedTerm
95 sg:journal.1023786 schema:issn 1471-2105
96 schema:name BMC Bioinformatics
97 rdf:type schema:Periodical
98 sg:person.01236003355.43 schema:affiliation https://www.grid.ac/institutes/grid.5012.6
99 schema:familyName Zaveri
100 schema:givenName Amrapali
101 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01236003355.43
102 rdf:type schema:Person
103 sg:person.012610231116.60 schema:affiliation https://www.grid.ac/institutes/grid.41156.37
104 schema:familyName Hu
105 schema:givenName Wei
106 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012610231116.60
107 rdf:type schema:Person
108 sg:person.01324655201.14 schema:affiliation https://www.grid.ac/institutes/grid.5012.6
109 schema:familyName Dumontier
110 schema:givenName Michel
111 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01324655201.14
112 rdf:type schema:Person
113 sg:person.014015352303.46 schema:affiliation https://www.grid.ac/institutes/grid.41156.37
114 schema:familyName Qiu
115 schema:givenName Honglei
116 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014015352303.46
117 rdf:type schema:Person
118 sg:pub.10.1007/978-3-540-74987-5_1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033740376
119 https://doi.org/10.1007/978-3-540-74987-5_1
120 rdf:type schema:CreativeWork
121 sg:pub.10.1007/978-3-642-41338-4_19 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009601566
122 https://doi.org/10.1007/978-3-642-41338-4_19
123 rdf:type schema:CreativeWork
124 sg:pub.10.1038/ng1201-365 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003793347
125 https://doi.org/10.1038/ng1201-365
126 rdf:type schema:CreativeWork
127 sg:pub.10.1186/1471-2105-10-234 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037547351
128 https://doi.org/10.1186/1471-2105-10-234
129 rdf:type schema:CreativeWork
130 sg:pub.10.1186/1471-2105-12-436 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033062809
131 https://doi.org/10.1186/1471-2105-12-436
132 rdf:type schema:CreativeWork
133 https://doi.org/10.1002/asi.22634 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012497413
134 rdf:type schema:CreativeWork
135 https://doi.org/10.1016/j.datak.2008.06.003 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007527724
136 rdf:type schema:CreativeWork
137 https://doi.org/10.1073/pnas.95.25.14863 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020882317
138 rdf:type schema:CreativeWork
139 https://doi.org/10.1093/bioinformatics/btr406 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028570070
140 rdf:type schema:CreativeWork
141 https://doi.org/10.1093/nar/gkq1184 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045648206
142 rdf:type schema:CreativeWork
143 https://doi.org/10.1093/nar/gkq848 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045535755
144 rdf:type schema:CreativeWork
145 https://doi.org/10.1093/nar/gks1193 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035551539
146 rdf:type schema:CreativeWork
147 https://doi.org/10.1109/icde.1999.754967 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095425568
148 rdf:type schema:CreativeWork
149 https://doi.org/10.1126/scitranslmed.aaa5993 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035837481
150 rdf:type schema:CreativeWork
151 https://doi.org/10.3115/v1/d14-1082 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099110754
152 rdf:type schema:CreativeWork
153 https://www.grid.ac/institutes/grid.41156.37 schema:alternateName Nanjing University
154 schema:name State Key Laboratory for Novel Software Technology, Nanjing University, 163 Xianlin Avenue, 210023, Nanjing, Jiangsu, China
155 rdf:type schema:Organization
156 https://www.grid.ac/institutes/grid.5012.6 schema:alternateName Maastricht University
157 schema:name Institute of Data Science, Maastricht University, 6200, Maastricht, MD, The Netherlands
158 rdf:type schema:Organization
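
The author order in this table is encoded as an RDF collection: `schema:author` (row 7) points to a blank node whose `rdf:first` is an author and whose `rdf:rest` is the next node, ending at `rdf:nil`. The sketch below walks that chain over rows copied from the table (rows 53-54, 64-67 and 77-78), represented as plain string tuples rather than through an RDF library; the `read_rdf_list` helper is a hypothetical name. It recovers the authors in the same order as the JSON-LD record: Hu, Zaveri, Qiu, Dumontier.

```python
# (subject, predicate, object) rows copied from the triples table above.
TRIPLES = [
    ("N2143aeda94f74a0e8056abb04c788c55", "rdf:first", "sg:person.01324655201.14"),
    ("N2143aeda94f74a0e8056abb04c788c55", "rdf:rest", "rdf:nil"),
    ("N4de17936f73e4c77a79d149c3be67a3a", "rdf:first", "sg:person.014015352303.46"),
    ("N4de17936f73e4c77a79d149c3be67a3a", "rdf:rest", "N2143aeda94f74a0e8056abb04c788c55"),
    ("N619c767bb6b047b181e7d0e8cc6fd66f", "rdf:first", "sg:person.01236003355.43"),
    ("N619c767bb6b047b181e7d0e8cc6fd66f", "rdf:rest", "N4de17936f73e4c77a79d149c3be67a3a"),
    ("Nbe9b71282b104bacbe70cbee027db8af", "rdf:first", "sg:person.012610231116.60"),
    ("Nbe9b71282b104bacbe70cbee027db8af", "rdf:rest", "N619c767bb6b047b181e7d0e8cc6fd66f"),
]
HEAD = "Nbe9b71282b104bacbe70cbee027db8af"  # object of schema:author (row 7)

def read_rdf_list(triples, head):
    """Follow rdf:first / rdf:rest links from head until rdf:nil."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node, seen = [], head, set()
    while node != "rdf:nil":
        if node in seen:  # a malformed list could loop forever
            raise ValueError(f"cycle at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items
```

Walking the chain, rather than collecting `rdf:first` objects in arbitrary order, is what preserves the author sequence.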
 






...