Introducing a New Scalable Data-as-a-Service Cloud Platform for Enriching Traditional Text Mining Techniques by Integrating Ontology Modelling and Natural Language Processing


Ontology type: schema:Chapter     


Chapter Info

DATE

2014

AUTHORS

Alexey Cheptsov , Axel Tenschert , Paul Schmidt , Birte Glimm , Mauricio Matthesius , Thorsten Liebig

ABSTRACT

A good deal of digital data produced in academia, commerce and industry is made up of a raw, unstructured text, such as Word documents, Excel tables, emails, web pages, etc., which are also often represented in a natural language. An important analytical task in a number of scientific and technological domains is to retrieve information from text data, aiming to get a deeper insight into the content represented by the data in order to obtain some useful, often not explicitly stated knowledge and facts, related to a particular domain of interest. The major challenge is the size, structural complexity, and frequency of the analysed text sets’ updates (i.e., the ‘big data’ aspect), which makes the use of traditional analysis techniques and tools impossible. We introduce an innovative approach to analyse unstructured text data. This allows for improving traditional data mining techniques by adopting algorithms from ontological domain modelling, natural language processing, and machine learning. The technique is inherently designed with parallelism in mind, which allows for high performance on large-scale Cloud computing infrastructures.

PAGES

62-74

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-54370-8_6

DOI

http://dx.doi.org/10.1007/978-3-642-54370-8_6

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1001886785



JSON-LD is the canonical representation for SciGraph data.

TIP: The record can also be inspected with an external JSON-LD tool, such as the JSON-LD Playground or Google's Structured Data Testing Tool (SDTT).

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "High-Performance Computing Center Stuttgart, Nobelstr. 19, 70569, Stuttgart, Germany", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "High-Performance Computing Center Stuttgart, Nobelstr. 19, 70569, Stuttgart, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Cheptsov", 
        "givenName": "Alexey", 
        "id": "sg:person.010137572622.04", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010137572622.04"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "High-Performance Computing Center Stuttgart, Nobelstr. 19, 70569, Stuttgart, Germany", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "High-Performance Computing Center Stuttgart, Nobelstr. 19, 70569, Stuttgart, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Tenschert", 
        "givenName": "Axel", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Institute of the Society for the Promotion of Applied Information Sciences, Saarland University, Martin-Luther-Str. 14, 66111, Saarbr\u00fccken, Germany", 
          "id": "http://www.grid.ac/institutes/grid.11749.3a", 
          "name": [
            "Institute of the Society for the Promotion of Applied Information Sciences, Saarland University, Martin-Luther-Str. 14, 66111, Saarbr\u00fccken, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Schmidt", 
        "givenName": "Paul", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Institute of Artificial Intelligence, University of Ulm, 89069, Ulm, Germany", 
          "id": "http://www.grid.ac/institutes/grid.6582.9", 
          "name": [
            "Institute of Artificial Intelligence, University of Ulm, 89069, Ulm, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Glimm", 
        "givenName": "Birte", 
        "id": "sg:person.015234565343.35", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015234565343.35"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Objectivity, Inc., 3099 North First Street, Suite 200, 95134, San Jose, CA, USA", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Objectivity, Inc., 3099 North First Street, Suite 200, 95134, San Jose, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Matthesius", 
        "givenName": "Mauricio", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "derivo GmbH, James-Franck-Ring, 89081, Ulm, Germany", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "derivo GmbH, James-Franck-Ring, 89081, Ulm, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Liebig", 
        "givenName": "Thorsten", 
        "id": "sg:person.014437204743.19", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014437204743.19"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2014", 
    "datePublishedReg": "2014-01-01", 
    "description": "A good deal of digital data produced in academia, commerce and industry is made up of a raw, unstructured text, such as Word documents, Excel tables, emails, web pages, etc., which are also often represented in a natural language. An important analytical task in a number of scientific and technological domains is to retrieve information from text data, aiming to get a deeper insight into the content represented by the data in order to obtain some useful, often not explicitly stated knowledge and facts, related to a particular domain of interest. The major challenge is the size, structural complexity, and frequency of the analysed text sets\u2019 updates (i.e., the \u2018big data\u2019 aspect), which makes the use of traditional analysis techniques and tools impossible. We introduce an innovative approach to analyse unstructured text data. This allows for improving traditional data mining techniques by adopting algorithms from ontological domain modelling, natural language processing, and machine learning. The technique is inherently designed with parallelism in mind, which allows for high performance on large-scale Cloud computing infrastructures.", 
    "editor": [
      {
        "familyName": "Huang", 
        "givenName": "Zhisheng", 
        "type": "Person"
      }, 
      {
        "familyName": "Liu", 
        "givenName": "Chengfei", 
        "type": "Person"
      }, 
      {
        "familyName": "He", 
        "givenName": "Jing", 
        "type": "Person"
      }, 
      {
        "familyName": "Huang", 
        "givenName": "Guangyan", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-54370-8_6", 
    "inLanguage": "en", 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-642-54369-2", 
        "978-3-642-54370-8"
      ], 
      "name": "Web Information Systems Engineering \u2013 WISE 2013 Workshops", 
      "type": "Book"
    }, 
    "keywords": [
      "natural language processing", 
      "mining techniques", 
      "text data", 
      "language processing", 
      "traditional data mining techniques", 
      "traditional text mining techniques", 
      "cloud computing infrastructures", 
      "service cloud platform", 
      "unstructured text data", 
      "data mining techniques", 
      "text mining techniques", 
      "computing infrastructures", 
      "cloud platform", 
      "unstructured text", 
      "ontology modelling", 
      "web pages", 
      "scalable data", 
      "machine learning", 
      "traditional analysis techniques", 
      "natural language", 
      "Word documents", 
      "digital data", 
      "particular domain", 
      "text sets", 
      "domain modelling", 
      "analytical tasks", 
      "Excel tables", 
      "high performance", 
      "technological domains", 
      "important analytical task", 
      "analysis techniques", 
      "major challenge", 
      "processing", 
      "parallelism", 
      "commerce", 
      "algorithm", 
      "infrastructure", 
      "innovative approach", 
      "pages", 
      "technique", 
      "platform", 
      "email", 
      "task", 
      "learning", 
      "complexity", 
      "documents", 
      "language", 
      "domain", 
      "data", 
      "modelling", 
      "academia", 
      "information", 
      "set", 
      "update", 
      "text", 
      "tool", 
      "deeper insight", 
      "performance", 
      "table", 
      "structural complexity", 
      "challenges", 
      "knowledge", 
      "industry", 
      "good deal", 
      "deal", 
      "order", 
      "interest", 
      "number", 
      "use", 
      "fact", 
      "mind", 
      "content", 
      "insights", 
      "size", 
      "frequency", 
      "approach"
    ], 
    "name": "Introducing a New Scalable Data-as-a-Service Cloud Platform for Enriching Traditional Text Mining Techniques by Integrating Ontology Modelling and Natural Language Processing", 
    "pagination": "62-74", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1001886785"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-54370-8_6"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-54370-8_6", 
      "https://app.dimensions.ai/details/publication/pub.1001886785"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-06-01T22:31", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220601/entities/gbq_results/chapter/chapter_292.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-642-54370-8_6"
  }
]
 

The RDF metadata can be downloaded as JSON-LD, N-Triples, Turtle, or RDF/XML.

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data; every JSON-LD document is also valid JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-54370-8_6'
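Because JSON-LD is valid JSON, the response can be read with Python's standard json module without any RDF tooling. The sketch below is a minimal illustration: the helper name summarize_record is hypothetical, the @context is not expanded, and the field names (name, pagination, author, productId) are taken from the record shown above.

```python
import json


def summarize_record(jsonld_text: str) -> dict:
    """Extract title, DOI, pages and author names from a SciGraph JSON-LD record.

    The document is treated as plain JSON; no JSON-LD @context processing
    is performed, so the compact keys from the record are used directly.
    """
    data = json.loads(jsonld_text)
    # The SciGraph response is a list containing a single record object.
    record = data[0] if isinstance(data, list) else data

    # Author objects carry givenName and familyName; list order is preserved.
    authors = [
        f"{a.get('givenName', '')} {a.get('familyName', '')}".strip()
        for a in record.get("author", [])
    ]

    # productId is a list of {"name": ..., "value": [...]} entries (doi, dimensions_id).
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}

    return {
        "title": record.get("name"),
        "doi": ids.get("doi"),
        "pages": record.get("pagination"),
        "authors": authors,
    }
```

For example, the output of the curl command above can be saved to a file and passed to summarize_record(open(path).read()); for this record the authors list should start with "Alexey Cheptsov" and the doi entry should be "10.1007/978-3-642-54370-8_6".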

N-Triples is a line-based linked data format (one triple per line), which makes it well suited to batch processing.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-54370-8_6'
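Since each N-Triples line is one complete statement, a file can be processed line by line. The sketch below assumes a simplified subset of the grammar (IRI or blank-node subjects, IRI predicates, and IRI, blank-node or literal objects with an optional language tag or datatype; escapes are matched but not decoded). It is not a full N-Triples parser, and the function names are hypothetical. Note that N-Triples output uses full IRIs in angle brackets; the prefixed forms such as schema: in the triples table on this page are abbreviations.

```python
import re
from collections import Counter

# Simplified pattern for one N-Triples statement per line:
#   subject   = <IRI> or _:blank
#   predicate = <IRI>
#   object    = <IRI>, _:blank, or "literal" with optional @lang or ^^<datatype>
TRIPLE = re.compile(
    r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+'
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
    r'\s*\.\s*$'
)


def parse_ntriples(text):
    """Yield (subject, predicate, object) strings, skipping blank and comment lines."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        m = TRIPLE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: not a simple N-Triples statement")
        yield m.groups()


def predicate_counts(text):
    """Count how often each predicate IRI occurs in the document."""
    return Counter(p for _, p, _ in parse_ntriples(text))
```

Counting distinct predicates this way is one way to cross-check summary figures such as the predicate count reported above the triples table.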

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-54370-8_6'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-54370-8_6'
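The four curl commands differ only in the Accept header, so the server selects the serialization by content negotiation. A minimal Python equivalent using the standard urllib.request module is sketched below; the short format keys (json-ld, nt, turtle, rdf-xml) and helper names are hypothetical, and the sketch assumes the endpoint still honours these media types.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-54370-8_6"

# Media types from the curl commands above, keyed by hypothetical short names.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Return a GET request asking the server for the given RDF serialization."""
    try:
        media_type = ACCEPT[fmt]
    except KeyError:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(ACCEPT)}") from None
    return urllib.request.Request(url, headers={"Accept": media_type})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 30.0) -> str:
    """Download the record in the requested format (performs network I/O)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Keeping request construction separate from fetch makes the header logic testable without network access; fetch("turtle") corresponds to the Turtle curl command.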


 

This table displays all metadata directly associated with this object as RDF triples.

197 triples, 23 predicates, 103 URIs, 95 literals, 7 blank nodes

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-54370-8_6 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 anzsrc-for:0806
4 schema:author Nda4db0c9e36c4d05976dad932eedeb4e
5 schema:datePublished 2014
6 schema:datePublishedReg 2014-01-01
7 schema:description A good deal of digital data produced in academia, commerce and industry is made up of a raw, unstructured text, such as Word documents, Excel tables, emails, web pages, etc., which are also often represented in a natural language. An important analytical task in a number of scientific and technological domains is to retrieve information from text data, aiming to get a deeper insight into the content represented by the data in order to obtain some useful, often not explicitly stated knowledge and facts, related to a particular domain of interest. The major challenge is the size, structural complexity, and frequency of the analysed text sets’ updates (i.e., the ‘big data’ aspect), which makes the use of traditional analysis techniques and tools impossible. We introduce an innovative approach to analyse unstructured text data. This allows for improving traditional data mining techniques by adopting algorithms from ontological domain modelling, natural language processing, and machine learning. The technique is inherently designed with parallelism in mind, which allows for high performance on large-scale Cloud computing infrastructures.
8 schema:editor N7bf4bdc11ef14466a2fda3334d88f051
9 schema:genre chapter
10 schema:inLanguage en
11 schema:isAccessibleForFree false
12 schema:isPartOf N481f2ac6604c430788e3c201662121d6
13 schema:keywords Excel tables
14 Word documents
15 academia
16 algorithm
17 analysis techniques
18 analytical tasks
19 approach
20 challenges
21 cloud computing infrastructures
22 cloud platform
23 commerce
24 complexity
25 computing infrastructures
26 content
27 data
28 data mining techniques
29 deal
30 deeper insight
31 digital data
32 documents
33 domain
34 domain modelling
35 email
36 fact
37 frequency
38 good deal
39 high performance
40 important analytical task
41 industry
42 information
43 infrastructure
44 innovative approach
45 insights
46 interest
47 knowledge
48 language
49 language processing
50 learning
51 machine learning
52 major challenge
53 mind
54 mining techniques
55 modelling
56 natural language
57 natural language processing
58 number
59 ontology modelling
60 order
61 pages
62 parallelism
63 particular domain
64 performance
65 platform
66 processing
67 scalable data
68 service cloud platform
69 set
70 size
71 structural complexity
72 table
73 task
74 technique
75 technological domains
76 text
77 text data
78 text mining techniques
79 text sets
80 tool
81 traditional analysis techniques
82 traditional data mining techniques
83 traditional text mining techniques
84 unstructured text
85 unstructured text data
86 update
87 use
88 web pages
89 schema:name Introducing a New Scalable Data-as-a-Service Cloud Platform for Enriching Traditional Text Mining Techniques by Integrating Ontology Modelling and Natural Language Processing
90 schema:pagination 62-74
91 schema:productId N2bdc267580c742bda741a345224552c4
92 Nea4ec086184649dab48fe30a6b42db4a
93 schema:publisher N467b2f6f18244617869a6db27aa18189
94 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001886785
95 https://doi.org/10.1007/978-3-642-54370-8_6
96 schema:sdDatePublished 2022-06-01T22:31
97 schema:sdLicense https://scigraph.springernature.com/explorer/license/
98 schema:sdPublisher N2cf6167d1ed24d72b636c0e240cbc5cc
99 schema:url https://doi.org/10.1007/978-3-642-54370-8_6
100 sgo:license sg:explorer/license/
101 sgo:sdDataset chapters
102 rdf:type schema:Chapter
103 N07c83a5a06a9432588809f7dafbefc28 schema:familyName Huang
104 schema:givenName Guangyan
105 rdf:type schema:Person
106 N16161394677c4536bddaf5d084a7fa03 schema:familyName Liu
107 schema:givenName Chengfei
108 rdf:type schema:Person
109 N22e6b55d1a8d458f9adc3ce51a364e7e schema:familyName He
110 schema:givenName Jing
111 rdf:type schema:Person
112 N2a0ceecdc4b24b6f9dd984dbe9167167 rdf:first sg:person.015234565343.35
113 rdf:rest N91dc5495e0bb4aa288413102a548815b
114 N2bdc267580c742bda741a345224552c4 schema:name doi
115 schema:value 10.1007/978-3-642-54370-8_6
116 rdf:type schema:PropertyValue
117 N2c05e72d50554fa0977f03320e2a3e23 schema:affiliation grid-institutes:None
118 schema:familyName Tenschert
119 schema:givenName Axel
120 rdf:type schema:Person
121 N2cf6167d1ed24d72b636c0e240cbc5cc schema:name Springer Nature - SN SciGraph project
122 rdf:type schema:Organization
123 N3a853fa9cacd4856b3645356f4f5567c rdf:first N22e6b55d1a8d458f9adc3ce51a364e7e
124 rdf:rest Ne48f93e732f24037a6e3d32507f59bfe
125 N467b2f6f18244617869a6db27aa18189 schema:name Springer Nature
126 rdf:type schema:Organisation
127 N481f2ac6604c430788e3c201662121d6 schema:isbn 978-3-642-54369-2
128 978-3-642-54370-8
129 schema:name Web Information Systems Engineering – WISE 2013 Workshops
130 rdf:type schema:Book
131 N7bf4bdc11ef14466a2fda3334d88f051 rdf:first N92fa07f2af254279b6aadf78cf22e363
132 rdf:rest Nce26721eb99e4a6097f590b15df4b59b
133 N91dc5495e0bb4aa288413102a548815b rdf:first Nc4518be2e2d54b21bbafd8673b55aa26
134 rdf:rest Ncbbfe2a67a4d44ad832c5905c316ce91
135 N92fa07f2af254279b6aadf78cf22e363 schema:familyName Huang
136 schema:givenName Zhisheng
137 rdf:type schema:Person
138 Na890d94d076641c38ab135ae4b78f2ca rdf:first N2c05e72d50554fa0977f03320e2a3e23
139 rdf:rest Nd9e98d2887cb4ba39f346ca7430db64b
140 Nbf13a8a89cc742b193db1e8310992451 schema:affiliation grid-institutes:grid.11749.3a
141 schema:familyName Schmidt
142 schema:givenName Paul
143 rdf:type schema:Person
144 Nc4518be2e2d54b21bbafd8673b55aa26 schema:affiliation grid-institutes:None
145 schema:familyName Matthesius
146 schema:givenName Mauricio
147 rdf:type schema:Person
148 Ncbbfe2a67a4d44ad832c5905c316ce91 rdf:first sg:person.014437204743.19
149 rdf:rest rdf:nil
150 Nce26721eb99e4a6097f590b15df4b59b rdf:first N16161394677c4536bddaf5d084a7fa03
151 rdf:rest N3a853fa9cacd4856b3645356f4f5567c
152 Nd9e98d2887cb4ba39f346ca7430db64b rdf:first Nbf13a8a89cc742b193db1e8310992451
153 rdf:rest N2a0ceecdc4b24b6f9dd984dbe9167167
154 Nda4db0c9e36c4d05976dad932eedeb4e rdf:first sg:person.010137572622.04
155 rdf:rest Na890d94d076641c38ab135ae4b78f2ca
156 Ne48f93e732f24037a6e3d32507f59bfe rdf:first N07c83a5a06a9432588809f7dafbefc28
157 rdf:rest rdf:nil
158 Nea4ec086184649dab48fe30a6b42db4a schema:name dimensions_id
159 schema:value pub.1001886785
160 rdf:type schema:PropertyValue
161 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
162 schema:name Information and Computing Sciences
163 rdf:type schema:DefinedTerm
164 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
165 schema:name Artificial Intelligence and Image Processing
166 rdf:type schema:DefinedTerm
167 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
168 schema:name Information Systems
169 rdf:type schema:DefinedTerm
170 sg:person.010137572622.04 schema:affiliation grid-institutes:None
171 schema:familyName Cheptsov
172 schema:givenName Alexey
173 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010137572622.04
174 rdf:type schema:Person
175 sg:person.014437204743.19 schema:affiliation grid-institutes:None
176 schema:familyName Liebig
177 schema:givenName Thorsten
178 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014437204743.19
179 rdf:type schema:Person
180 sg:person.015234565343.35 schema:affiliation grid-institutes:grid.6582.9
181 schema:familyName Glimm
182 schema:givenName Birte
183 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015234565343.35
184 rdf:type schema:Person
185 grid-institutes:None schema:alternateName High-Performance Computing Center Stuttgart, Nobelstr. 19, 70569, Stuttgart, Germany
186 Objectivity, Inc., 3099 North First Street, Suite 200, 95134, San Jose, CA, USA
187 derivo GmbH, James-Franck-Ring, 89081, Ulm, Germany
188 schema:name High-Performance Computing Center Stuttgart, Nobelstr. 19, 70569, Stuttgart, Germany
189 Objectivity, Inc., 3099 North First Street, Suite 200, 95134, San Jose, CA, USA
190 derivo GmbH, James-Franck-Ring, 89081, Ulm, Germany
191 rdf:type schema:Organization
192 grid-institutes:grid.11749.3a schema:alternateName Institute of the Society for the Promotion of Applied Information Sciences, Saarland University, Martin-Luther-Str. 14, 66111, Saarbrücken, Germany
193 schema:name Institute of the Society for the Promotion of Applied Information Sciences, Saarland University, Martin-Luther-Str. 14, 66111, Saarbrücken, Germany
194 rdf:type schema:Organization
195 grid-institutes:grid.6582.9 schema:alternateName Institute of Artificial Intelligence, University of Ulm, 89069, Ulm, Germany
196 schema:name Institute of Artificial Intelligence, University of Ulm, 89069, Ulm, Germany
197 rdf:type schema:Organization
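In the triples, the ordered author and editor lists are RDF collections: schema:author and schema:editor point to a blank node whose rdf:first is the first member and whose rdf:rest points to the next node, ending at rdf:nil. The sketch below (the function name is hypothetical) walks such a list to recover the order, using the editor triples copied from rows 131-132, 150-151, 123-124 and 156-157. Blank-node labels such as N7bf4bdc11ef14466a2fda3334d88f051 are only meaningful within this one serialization.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def walk_rdf_list(triples, head):
    """Return the members of the RDF collection starting at `head`, in order."""
    first, rest = {}, {}
    for s, p, o in triples:
        if p == RDF_FIRST:
            first[s] = o
        elif p == RDF_REST:
            rest[s] = o
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:  # guard against malformed, cyclic lists
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        try:
            items.append(first[node])
            node = rest[node]
        except KeyError:
            raise ValueError(f"list node {node} lacks rdf:first or rdf:rest") from None
    return items


# Editor list as it appears in the table (schema:editor -> head node).
EDITOR_HEAD = "N7bf4bdc11ef14466a2fda3334d88f051"
EDITOR_TRIPLES = [
    ("N7bf4bdc11ef14466a2fda3334d88f051", "rdf:first", "N92fa07f2af254279b6aadf78cf22e363"),
    ("N7bf4bdc11ef14466a2fda3334d88f051", "rdf:rest", "Nce26721eb99e4a6097f590b15df4b59b"),
    ("Nce26721eb99e4a6097f590b15df4b59b", "rdf:first", "N16161394677c4536bddaf5d084a7fa03"),
    ("Nce26721eb99e4a6097f590b15df4b59b", "rdf:rest", "N3a853fa9cacd4856b3645356f4f5567c"),
    ("N3a853fa9cacd4856b3645356f4f5567c", "rdf:first", "N22e6b55d1a8d458f9adc3ce51a364e7e"),
    ("N3a853fa9cacd4856b3645356f4f5567c", "rdf:rest", "Ne48f93e732f24037a6e3d32507f59bfe"),
    ("Ne48f93e732f24037a6e3d32507f59bfe", "rdf:first", "N07c83a5a06a9432588809f7dafbefc28"),
    ("Ne48f93e732f24037a6e3d32507f59bfe", "rdf:rest", "rdf:nil"),
]
# Person nodes resolved via their schema:givenName / schema:familyName rows.
EDITOR_NAMES = {
    "N92fa07f2af254279b6aadf78cf22e363": "Zhisheng Huang",
    "N16161394677c4536bddaf5d084a7fa03": "Chengfei Liu",
    "N22e6b55d1a8d458f9adc3ce51a364e7e": "Jing He",
    "N07c83a5a06a9432588809f7dafbefc28": "Guangyan Huang",
}

print([EDITOR_NAMES[n] for n in walk_rdf_list(EDITOR_TRIPLES, EDITOR_HEAD)])
# prints ['Zhisheng Huang', 'Chengfei Liu', 'Jing He', 'Guangyan Huang']
```

The recovered order matches the editor array in the JSON-LD record; the same walk starting from Nda4db0c9e36c4d05976dad932eedeb4e yields the six authors in their listed order.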
 



