Ontology type: schema:Chapter
Open Access: True
2013
AUTHORS: Justin Christopher Martineau, Doreen Cheng, Tim Finin
ABSTRACT: Textual analysis using machine learning is in high demand for a wide range of applications including recommender systems, business intelligence tools, and electronic personal assistants. Some of these applications need to operate over a wide and unpredictable array of topic areas, but current in-domain, domain adaptation, and multi-domain approaches cannot adequately support this need, due to their low accuracy on topic areas that they are not trained for, slow adaptation speed, or high implementation and maintenance costs. To create a true domain-independent solution, we introduce the Topic Independence Scoring Algorithm (TISA) and demonstrate how to build a domain-independent bag-of-words model for sentiment analysis. This model is the best-performing sentiment model published on the popular 25-category Amazon product reviews dataset. The model is on average 89.6% accurate as measured on 20 held-out test topic areas. This compares very favorably with the 82.28% average accuracy of the 20 baseline in-domain models. Moreover, the TISA model is highly uniformly accurate, with a variance of 5 percentage points, which provides strong assurance that the model will be just as accurate on new topic areas. Consequently, TISA's models are truly domain independent. In other words, they require no changes or human intervention to accurately classify documents in never-before-seen topic areas.
PAGES: 555-570
Machine Learning and Data Mining in Pattern Recognition
ISBN
978-3-642-39711-0
978-3-642-39712-7
http://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43
DOI: http://dx.doi.org/10.1007/978-3-642-39712-7_43
DIMENSIONS: https://app.dimensions.ai/details/publication/pub.1011572208
JSON-LD is the canonical representation for SciGraph data.
[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
"about": [
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information and Computing Sciences",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Artificial Intelligence and Image Processing",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "Samsung Information Systems North America, USA",
"id": "http://www.grid.ac/institutes/None",
"name": [
"Samsung Information Systems North America, USA"
],
"type": "Organization"
},
"familyName": "Martineau",
"givenName": "Justin Christopher",
"id": "sg:person.013017730117.89",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013017730117.89"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Samsung Information Systems North America, USA",
"id": "http://www.grid.ac/institutes/None",
"name": [
"Samsung Information Systems North America, USA"
],
"type": "Organization"
},
"familyName": "Cheng",
"givenName": "Doreen",
"id": "sg:person.010560124331.39",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010560124331.39"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "University of Maryland Baltimore County, USA",
"id": "http://www.grid.ac/institutes/grid.266673.0",
"name": [
"University of Maryland Baltimore County, USA"
],
"type": "Organization"
},
"familyName": "Finin",
"givenName": "Tim",
"id": "sg:person.016274302751.69",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016274302751.69"
],
"type": "Person"
}
],
"datePublished": "2013",
"datePublishedReg": "2013-01-01",
"description": "Textual analysis using machine learning is in high demand for a wide range of applications including recommender systems, business intelligence tools, and electronic personal assistants. Some of these applications need to operate over a wide and unpredictable array of topic areas, but current in-domain, domain adaptation, and multi-domain approaches cannot adequately support this need, due to their low accuracy on topic areas that they are not trained for, slow adaptation speed, or high implementation and maintenance costs.To create a true domain-independent solution, we introduce the Topic Independence Scoring Algorithm (TISA) and demonstrate how to build a domain-independent bag-of-words model for sentiment analysis. This model is the best preforming sentiment model published on the popular 25 category Amazon product reviews dataset. The model is on average 89.6% accurate as measured on 20 held-out test topic areas. This compares very favorably with the 82.28% average accuracy of the 20 baseline in-domain models. Moreover, the TISA model is highly uniformly accurate, with a variance of 5 percentage points, which provides strong assurance that the model will be just as accurate on new topic areas. Consequently, TISAs models are truly domain independent. In other words, they require no changes or human intervention to accurately classify documents in never before seen topic areas.",
"editor": [
{
"familyName": "Perner",
"givenName": "Petra",
"type": "Person"
}
],
"genre": "chapter",
"id": "sg:pub.10.1007/978-3-642-39712-7_43",
"inLanguage": "en",
"isAccessibleForFree": true,
"isPartOf": {
"isbn": [
"978-3-642-39711-0",
"978-3-642-39712-7"
],
"name": "Machine Learning and Data Mining in Pattern Recognition",
"type": "Book"
},
"keywords": [
"business intelligence tools",
"domain-independent solutions",
"recommender systems",
"machine learning",
"intelligence tools",
"domain adaptation",
"personal assistants",
"Amazon products",
"strong assurance",
"words model",
"sentiment model",
"sentiment analysis",
"human intervention",
"domain model",
"average accuracy",
"scoring algorithm",
"low accuracy",
"adaptation speed",
"maintenance costs",
"algorithm",
"multi-domain approach",
"new topic areas",
"topic areas",
"unpredictable array",
"high implementation",
"high demand",
"accuracy",
"dataset",
"applications",
"learning",
"implementation",
"documents",
"assistants",
"model",
"assurance",
"tool",
"bags",
"cost",
"wide range",
"system",
"domain",
"speed",
"demand",
"words",
"solution",
"need",
"area",
"textual analysis",
"adaptation",
"point",
"analysis",
"array",
"percentage points",
"products",
"range",
"variance",
"baseline",
"changes",
"intervention",
"approach"
],
"name": "TISA: Topic Independence Scoring Algorithm",
"pagination": "555-570",
"productId": [
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1011572208"
]
},
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1007/978-3-642-39712-7_43"
]
}
],
"publisher": {
"name": "Springer Nature",
"type": "Organisation"
},
"sameAs": [
"https://doi.org/10.1007/978-3-642-39712-7_43",
"https://app.dimensions.ai/details/publication/pub.1011572208"
],
"sdDataset": "chapters",
"sdDatePublished": "2022-05-10T10:54",
"sdLicense": "https://scigraph.springernature.com/explorer/license/",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-springernature-scigraph/baseset/20220509/entities/gbq_results/chapter/chapter_462.jsonl",
"type": "Chapter",
"url": "https://doi.org/10.1007/978-3-642-39712-7_43"
}
]
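As a minimal sketch of how a record like the one above might be consumed, the following Python reads the title, author names, and DOI from the JSON-LD array. The embedded `RECORD` is a trimmed copy of the record above (only the fields the sketch reads), and the helper name `summarize` is hypothetical, not part of any SciGraph tooling.

```python
import json

# Trimmed copy of the JSON-LD record above; only the fields read below are kept.
RECORD = """
[
  {
    "author": [
      {"familyName": "Martineau", "givenName": "Justin Christopher", "type": "Person"},
      {"familyName": "Cheng", "givenName": "Doreen", "type": "Person"},
      {"familyName": "Finin", "givenName": "Tim", "type": "Person"}
    ],
    "name": "TISA: Topic Independence Scoring Algorithm",
    "pagination": "555-570",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1011572208"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-642-39712-7_43"]}
    ]
  }
]
"""


def summarize(record_text):
    """Return title, author names, and DOI from a SciGraph-style JSON-LD array.

    Hypothetical helper: it assumes the one-element array layout shown above.
    """
    chapter = json.loads(record_text)[0]
    authors = [f"{a['givenName']} {a['familyName']}" for a in chapter["author"]]
    # productId is a list of name/value pairs; the DOI is the entry named "doi".
    doi = next(p["value"][0] for p in chapter["productId"] if p["name"] == "doi")
    return {"title": chapter["name"], "authors": authors, "doi": doi}


summary = summarize(RECORD)
print(summary)
```

The DOI lookup filters on the `name` field rather than relying on list position, since the order of `productId` entries is not something the record guarantees.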
JSON-LD is a popular format for linked data which is fully compatible with JSON.
curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43'
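The four curl commands above differ only in their `Accept` header. As a rough Python equivalent, the sketch below builds the same requests with the standard library's `urllib.request.Request`. The short format keys (`json-ld`, `nt`, `turtle`, `xml`) and the helper `build_request` are illustrative names, and the requests are only constructed, not sent, because whether the endpoint is still reachable is not something this sketch assumes.

```python
from urllib.request import Request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43"

# MIME types taken from the curl commands above; the dictionary keys are
# hypothetical short labels chosen for this sketch.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt):
    """Build (but do not send) a GET request asking for one RDF serialization."""
    return Request(RECORD_URL, headers={"Accept": ACCEPT_TYPES[fmt]})


requests_by_format = {fmt: build_request(fmt) for fmt in ACCEPT_TYPES}
for fmt, req in requests_by_format.items():
    print(fmt, req.get_method(), req.get_header("Accept"))
```

To actually fetch a serialization, a request built this way could be passed to `urllib.request.urlopen`, assuming the service is available.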
This table displays all metadata directly associated with this object as RDF triples.
137 TRIPLES
23 PREDICATES
86 URIs
79 LITERALS
7 BLANK NODES
# | Subject | Predicate | Object |
---|---|---|---|
1 | sg:pub.10.1007/978-3-642-39712-7_43 | schema:about | anzsrc-for:08 |
2 | ″ | ″ | anzsrc-for:0801 |
3 | ″ | schema:author | Nde746e3809e94ea0ae04f3b945fc8e3f |
4 | ″ | schema:datePublished | 2013 |
5 | ″ | schema:datePublishedReg | 2013-01-01 |
6 | ″ | schema:description | Textual analysis using machine learning is in high demand for a wide range of applications including recommender systems, business intelligence tools, and electronic personal assistants. Some of these applications need to operate over a wide and unpredictable array of topic areas, but current in-domain, domain adaptation, and multi-domain approaches cannot adequately support this need, due to their low accuracy on topic areas that they are not trained for, slow adaptation speed, or high implementation and maintenance costs.To create a true domain-independent solution, we introduce the Topic Independence Scoring Algorithm (TISA) and demonstrate how to build a domain-independent bag-of-words model for sentiment analysis. This model is the best preforming sentiment model published on the popular 25 category Amazon product reviews dataset. The model is on average 89.6% accurate as measured on 20 held-out test topic areas. This compares very favorably with the 82.28% average accuracy of the 20 baseline in-domain models. Moreover, the TISA model is highly uniformly accurate, with a variance of 5 percentage points, which provides strong assurance that the model will be just as accurate on new topic areas. Consequently, TISAs models are truly domain independent. In other words, they require no changes or human intervention to accurately classify documents in never before seen topic areas. |
7 | ″ | schema:editor | N59ab6fbe79214ed386c751f69085a452 |
8 | ″ | schema:genre | chapter |
9 | ″ | schema:inLanguage | en |
10 | ″ | schema:isAccessibleForFree | true |
11 | ″ | schema:isPartOf | N9bef8f4c973e4ade8490ffeb1dc90c6a |
12 | ″ | schema:keywords | Amazon products |
13 | ″ | ″ | accuracy |
14 | ″ | ″ | adaptation |
15 | ″ | ″ | adaptation speed |
16 | ″ | ″ | algorithm |
17 | ″ | ″ | analysis |
18 | ″ | ″ | applications |
19 | ″ | ″ | approach |
20 | ″ | ″ | area |
21 | ″ | ″ | array |
22 | ″ | ″ | assistants |
23 | ″ | ″ | assurance |
24 | ″ | ″ | average accuracy |
25 | ″ | ″ | bags |
26 | ″ | ″ | baseline |
27 | ″ | ″ | business intelligence tools |
28 | ″ | ″ | changes |
29 | ″ | ″ | cost |
30 | ″ | ″ | dataset |
31 | ″ | ″ | demand |
32 | ″ | ″ | documents |
33 | ″ | ″ | domain |
34 | ″ | ″ | domain adaptation |
35 | ″ | ″ | domain model |
36 | ″ | ″ | domain-independent solutions |
37 | ″ | ″ | high demand |
38 | ″ | ″ | high implementation |
39 | ″ | ″ | human intervention |
40 | ″ | ″ | implementation |
41 | ″ | ″ | intelligence tools |
42 | ″ | ″ | intervention |
43 | ″ | ″ | learning |
44 | ″ | ″ | low accuracy |
45 | ″ | ″ | machine learning |
46 | ″ | ″ | maintenance costs |
47 | ″ | ″ | model |
48 | ″ | ″ | multi-domain approach |
49 | ″ | ″ | need |
50 | ″ | ″ | new topic areas |
51 | ″ | ″ | percentage points |
52 | ″ | ″ | personal assistants |
53 | ″ | ″ | point |
54 | ″ | ″ | products |
55 | ″ | ″ | range |
56 | ″ | ″ | recommender systems |
57 | ″ | ″ | scoring algorithm |
58 | ″ | ″ | sentiment analysis |
59 | ″ | ″ | sentiment model |
60 | ″ | ″ | solution |
61 | ″ | ″ | speed |
62 | ″ | ″ | strong assurance |
63 | ″ | ″ | system |
64 | ″ | ″ | textual analysis |
65 | ″ | ″ | tool |
66 | ″ | ″ | topic areas |
67 | ″ | ″ | unpredictable array |
68 | ″ | ″ | variance |
69 | ″ | ″ | wide range |
70 | ″ | ″ | words |
71 | ″ | ″ | words model |
72 | ″ | schema:name | TISA: Topic Independence Scoring Algorithm |
73 | ″ | schema:pagination | 555-570 |
74 | ″ | schema:productId | N3d0a8f2489d445469ee20d4617d337c3 |
75 | ″ | ″ | N51a5ed1f86bf415fa0e6d542ed1e5e43 |
76 | ″ | schema:publisher | N49e925c899644d9c8489d2aece86ee26 |
77 | ″ | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1011572208 |
78 | ″ | ″ | https://doi.org/10.1007/978-3-642-39712-7_43 |
79 | ″ | schema:sdDatePublished | 2022-05-10T10:54 |
80 | ″ | schema:sdLicense | https://scigraph.springernature.com/explorer/license/ |
81 | ″ | schema:sdPublisher | N7da71807815644f19e17b135af5ff42b |
82 | ″ | schema:url | https://doi.org/10.1007/978-3-642-39712-7_43 |
83 | ″ | sgo:license | sg:explorer/license/ |
84 | ″ | sgo:sdDataset | chapters |
85 | ″ | rdf:type | schema:Chapter |
86 | N3d0a8f2489d445469ee20d4617d337c3 | schema:name | doi |
87 | ″ | schema:value | 10.1007/978-3-642-39712-7_43 |
88 | ″ | rdf:type | schema:PropertyValue |
89 | N49e925c899644d9c8489d2aece86ee26 | schema:name | Springer Nature |
90 | ″ | rdf:type | schema:Organisation |
91 | N51a5ed1f86bf415fa0e6d542ed1e5e43 | schema:name | dimensions_id |
92 | ″ | schema:value | pub.1011572208 |
93 | ″ | rdf:type | schema:PropertyValue |
94 | N59ab6fbe79214ed386c751f69085a452 | rdf:first | Naf7dcd670d4c410d802b5d46294169d6 |
95 | ″ | rdf:rest | rdf:nil |
96 | N6ea1aa83eb1c4f4994ef9b5b37fc878d | rdf:first | sg:person.016274302751.69 |
97 | ″ | rdf:rest | rdf:nil |
98 | N7da71807815644f19e17b135af5ff42b | schema:name | Springer Nature - SN SciGraph project |
99 | ″ | rdf:type | schema:Organization |
100 | N9bef8f4c973e4ade8490ffeb1dc90c6a | schema:isbn | 978-3-642-39711-0 |
101 | ″ | ″ | 978-3-642-39712-7 |
102 | ″ | schema:name | Machine Learning and Data Mining in Pattern Recognition |
103 | ″ | rdf:type | schema:Book |
104 | Naf7dcd670d4c410d802b5d46294169d6 | schema:familyName | Perner |
105 | ″ | schema:givenName | Petra |
106 | ″ | rdf:type | schema:Person |
107 | Nde746e3809e94ea0ae04f3b945fc8e3f | rdf:first | sg:person.013017730117.89 |
108 | ″ | rdf:rest | Nf3cd36494f244a7bb1b2a3f183581a51 |
109 | Nf3cd36494f244a7bb1b2a3f183581a51 | rdf:first | sg:person.010560124331.39 |
110 | ″ | rdf:rest | N6ea1aa83eb1c4f4994ef9b5b37fc878d |
111 | anzsrc-for:08 | schema:inDefinedTermSet | anzsrc-for: |
112 | ″ | schema:name | Information and Computing Sciences |
113 | ″ | rdf:type | schema:DefinedTerm |
114 | anzsrc-for:0801 | schema:inDefinedTermSet | anzsrc-for: |
115 | ″ | schema:name | Artificial Intelligence and Image Processing |
116 | ″ | rdf:type | schema:DefinedTerm |
117 | sg:person.010560124331.39 | schema:affiliation | grid-institutes:None |
118 | ″ | schema:familyName | Cheng |
119 | ″ | schema:givenName | Doreen |
120 | ″ | schema:sameAs | https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010560124331.39 |
121 | ″ | rdf:type | schema:Person |
122 | sg:person.013017730117.89 | schema:affiliation | grid-institutes:None |
123 | ″ | schema:familyName | Martineau |
124 | ″ | schema:givenName | Justin Christopher |
125 | ″ | schema:sameAs | https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013017730117.89 |
126 | ″ | rdf:type | schema:Person |
127 | sg:person.016274302751.69 | schema:affiliation | grid-institutes:grid.266673.0 |
128 | ″ | schema:familyName | Finin |
129 | ″ | schema:givenName | Tim |
130 | ″ | schema:sameAs | https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016274302751.69 |
131 | ″ | rdf:type | schema:Person |
132 | grid-institutes:None | schema:alternateName | Samsung Information Systems North America, USA |
133 | ″ | schema:name | Samsung Information Systems North America, USA |
134 | ″ | rdf:type | schema:Organization |
135 | grid-institutes:grid.266673.0 | schema:alternateName | University of Maryland Baltimore County, USA |
136 | ″ | schema:name | University of Maryland Baltimore County, USA |
137 | ″ | rdf:type | schema:Organization |
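In the table, the `schema:author` and `schema:editor` values are RDF lists: each blank node carries an `rdf:first` item and an `rdf:rest` pointer to the next node, ending at `rdf:nil`. A minimal sketch of how the author order could be recovered from those rows follows. The triples are copied from rows 94-97 and 107-110, and the function name `walk_list` is hypothetical.

```python
RDF_NIL = "rdf:nil"

# (subject, predicate, object) rows copied from the table (rows 94-97, 107-110).
TRIPLES = [
    ("N59ab6fbe79214ed386c751f69085a452", "rdf:first", "Naf7dcd670d4c410d802b5d46294169d6"),
    ("N59ab6fbe79214ed386c751f69085a452", "rdf:rest", "rdf:nil"),
    ("N6ea1aa83eb1c4f4994ef9b5b37fc878d", "rdf:first", "sg:person.016274302751.69"),
    ("N6ea1aa83eb1c4f4994ef9b5b37fc878d", "rdf:rest", "rdf:nil"),
    ("Nde746e3809e94ea0ae04f3b945fc8e3f", "rdf:first", "sg:person.013017730117.89"),
    ("Nde746e3809e94ea0ae04f3b945fc8e3f", "rdf:rest", "Nf3cd36494f244a7bb1b2a3f183581a51"),
    ("Nf3cd36494f244a7bb1b2a3f183581a51", "rdf:first", "sg:person.010560124331.39"),
    ("Nf3cd36494f244a7bb1b2a3f183581a51", "rdf:rest", "N6ea1aa83eb1c4f4994ef9b5b37fc878d"),
]


def walk_list(head, triples):
    """Follow rdf:first / rdf:rest links from `head` until rdf:nil is reached."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


# Row 3 of the table names this blank node as the head of the author list.
authors = walk_list("Nde746e3809e94ea0ae04f3b945fc8e3f", TRIPLES)
print(authors)
```

Walking from the head node in row 3 yields the person identifiers for Martineau, Cheng, and Finin in that order, which matches the author order in the JSON-LD record.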