TISA: Topic Independence Scoring Algorithm


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2013

AUTHORS

Justin Christopher Martineau, Doreen Cheng, Tim Finin

ABSTRACT

Textual analysis using machine learning is in high demand for a wide range of applications, including recommender systems, business intelligence tools, and electronic personal assistants. Some of these applications need to operate over a wide and unpredictable array of topic areas, but current in-domain, domain adaptation, and multi-domain approaches cannot adequately support this need because of their low accuracy on topic areas they were not trained on, slow adaptation speed, or high implementation and maintenance costs. To create a true domain-independent solution, we introduce the Topic Independence Scoring Algorithm (TISA) and demonstrate how to build a domain-independent bag-of-words model for sentiment analysis. This model is the best-performing sentiment model published on the popular 25-category Amazon product reviews dataset. The model is on average 89.6% accurate as measured on 20 held-out test topic areas. This compares very favorably with the 82.28% average accuracy of the 20 baseline in-domain models. Moreover, the TISA model is highly uniformly accurate, with a variance of 5 percentage points, which provides strong assurance that the model will be just as accurate on new topic areas. Consequently, TISA's models are truly domain independent. In other words, they require no changes or human intervention to accurately classify documents in never-before-seen topic areas.

PAGES

555-570

Book

TITLE

Machine Learning and Data Mining in Pattern Recognition

ISBN

978-3-642-39711-0
978-3-642-39712-7

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43

DOI

http://dx.doi.org/10.1007/978-3-642-39712-7_43

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1011572208



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Samsung Information Systems North America, USA", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Samsung Information Systems North America, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Martineau", 
        "givenName": "Justin Christopher", 
        "id": "sg:person.013017730117.89", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013017730117.89"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Samsung Information Systems North America, USA", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Samsung Information Systems North America, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Cheng", 
        "givenName": "Doreen", 
        "id": "sg:person.010560124331.39", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010560124331.39"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Maryland Baltimore County, USA", 
          "id": "http://www.grid.ac/institutes/grid.266673.0", 
          "name": [
            "University of Maryland Baltimore County, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Finin", 
        "givenName": "Tim", 
        "id": "sg:person.016274302751.69", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016274302751.69"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2013", 
    "datePublishedReg": "2013-01-01", 
    "description": "Textual analysis using machine learning is in high demand for a wide range of applications including recommender systems, business intelligence tools, and electronic personal assistants. Some of these applications need to operate over a wide and unpredictable array of topic areas, but current in-domain, domain adaptation, and multi-domain approaches cannot adequately support this need, due to their low accuracy on topic areas that they are not trained for, slow adaptation speed, or high implementation and maintenance costs.To create a true domain-independent solution, we introduce the Topic Independence Scoring Algorithm (TISA) and demonstrate how to build a domain-independent bag-of-words model for sentiment analysis. This model is the best preforming sentiment model published on the popular 25 category Amazon product reviews dataset. The model is on average 89.6% accurate as measured on 20 held-out test topic areas. This compares very favorably with the 82.28% average accuracy of the 20 baseline in-domain models. Moreover, the TISA model is highly uniformly accurate, with a variance of 5 percentage points, which provides strong assurance that the model will be just as accurate on new topic areas. Consequently, TISAs models are truly domain independent. In other words, they require no changes or human intervention to accurately classify documents in never before seen topic areas.", 
    "editor": [
      {
        "familyName": "Perner", 
        "givenName": "Petra", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-39712-7_43", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-642-39711-0", 
        "978-3-642-39712-7"
      ], 
      "name": "Machine Learning and Data Mining in Pattern Recognition", 
      "type": "Book"
    }, 
    "keywords": [
      "domain-independent solutions", 
      "business intelligence tools", 
      "recommender systems", 
      "intelligence tools", 
      "machine learning", 
      "domain adaptation", 
      "strong assurance", 
      "personal assistants", 
      "words model", 
      "domain model", 
      "scoring algorithm", 
      "sentiment analysis", 
      "sentiment model", 
      "human intervention", 
      "Amazon products", 
      "average accuracy", 
      "multi-domain approach", 
      "adaptation speed", 
      "low accuracy", 
      "maintenance costs", 
      "algorithm", 
      "new topic areas", 
      "unpredictable array", 
      "high implementation", 
      "high demand", 
      "topic areas", 
      "accuracy", 
      "datasets", 
      "applications", 
      "implementation", 
      "documents", 
      "learning", 
      "model", 
      "assistants", 
      "assurance", 
      "tool", 
      "cost", 
      "bags", 
      "system", 
      "domain", 
      "wide range", 
      "speed", 
      "demand", 
      "solution", 
      "words", 
      "need", 
      "area", 
      "adaptation", 
      "point", 
      "textual analysis", 
      "array", 
      "analysis", 
      "percentage points", 
      "products", 
      "variance", 
      "range", 
      "baseline", 
      "changes", 
      "intervention", 
      "approach", 
      "electronic personal assistants", 
      "slow adaptation speed", 
      "true domain-independent solution", 
      "Topic Independence Scoring Algorithm", 
      "Independence Scoring Algorithm", 
      "domain-independent bag", 
      "best preforming sentiment model", 
      "preforming sentiment model", 
      "category Amazon product", 
      "test topic areas", 
      "TISA model"
    ], 
    "name": "TISA: Topic Independence Scoring Algorithm", 
    "pagination": "555-570", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1011572208"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-39712-7_43"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-39712-7_43", 
      "https://app.dimensions.ai/details/publication/pub.1011572208"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-01-01T19:27", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/chapter/chapter_89.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-642-39712-7_43"
  }
]
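
A minimal Python sketch of reading this record, assuming the JSON-LD above is available as a string. `RECORD` is an abbreviated copy of the record (only the fields used here are kept), and `summarize` is a hypothetical helper name, not part of any SciGraph tooling.

```python
import json

# Abbreviated copy of the record above; only the fields used below are kept.
RECORD = """
[
  {
    "author": [
      {"familyName": "Martineau", "givenName": "Justin Christopher", "type": "Person"},
      {"familyName": "Cheng", "givenName": "Doreen", "type": "Person"},
      {"familyName": "Finin", "givenName": "Tim", "type": "Person"}
    ],
    "datePublished": "2013",
    "name": "TISA: Topic Independence Scoring Algorithm",
    "pagination": "555-570",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1011572208"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-642-39712-7_43"]}
    ]
  }
]
"""


def summarize(jsonld_text):
    """Return title, year, author names, and DOI from a record shaped like the one above."""
    record = json.loads(jsonld_text)[0]  # the document is a one-element array
    authors = [
        f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])
    ]
    # productId is a list of name/value pairs; pick the entry named "doi".
    doi = next(
        (p["value"][0] for p in record.get("productId", []) if p.get("name") == "doi"),
        None,
    )
    return {
        "title": record.get("name"),
        "year": record.get("datePublished"),
        "authors": authors,
        "doi": doi,
    }


summary = summarize(RECORD)
print(summary["authors"])
print(summary["doi"])
```

Because JSON-LD is plain JSON, the standard `json` module is enough for simple field access like this; a dedicated JSON-LD or RDF library would only be needed to expand the `@context` or to merge records.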
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43'
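
The four curl commands above differ only in the `Accept` header. A rough Python equivalent using the standard library might look like the sketch below; the URL and media types are copied from the commands above, while `ACCEPT`, `build_request`, and `fetch` are hypothetical names chosen for illustration. Whether the endpoint still answers is an assumption, so the example only builds a request and leaves the network call to `fetch`.

```python
import urllib.request

# URL and Accept values are taken from the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43"
ACCEPT = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for the given serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Inspect the request that would be sent for the Turtle serialization.
req = build_request("turtle")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling `fetch("json-ld")` would perform the same request as the first curl command, provided the service is reachable.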


 

This table displays all metadata directly associated with this object as RDF triples.

148 TRIPLES      23 PREDICATES      97 URIs      90 LITERALS      7 BLANK NODES

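Row Subject Predicate Object
1 sg:pub.10.1007/978-3-642-39712-7_43 schema:about anzsrc-for:08
2 sg:pub.10.1007/978-3-642-39712-7_43 schema:about anzsrc-for:0801
3 sg:pub.10.1007/978-3-642-39712-7_43 schema:author N962edba79006474ea3e9fe99ceac9e6d
4 sg:pub.10.1007/978-3-642-39712-7_43 schema:datePublished 2013
5 sg:pub.10.1007/978-3-642-39712-7_43 schema:datePublishedReg 2013-01-01
6 sg:pub.10.1007/978-3-642-39712-7_43 schema:description Textual analysis using machine learning is in high demand for a wide range of applications including recommender systems, business intelligence tools, and electronic personal assistants. Some of these applications need to operate over a wide and unpredictable array of topic areas, but current in-domain, domain adaptation, and multi-domain approaches cannot adequately support this need, due to their low accuracy on topic areas that they are not trained for, slow adaptation speed, or high implementation and maintenance costs.To create a true domain-independent solution, we introduce the Topic Independence Scoring Algorithm (TISA) and demonstrate how to build a domain-independent bag-of-words model for sentiment analysis. This model is the best preforming sentiment model published on the popular 25 category Amazon product reviews dataset. The model is on average 89.6% accurate as measured on 20 held-out test topic areas. This compares very favorably with the 82.28% average accuracy of the 20 baseline in-domain models. Moreover, the TISA model is highly uniformly accurate, with a variance of 5 percentage points, which provides strong assurance that the model will be just as accurate on new topic areas. Consequently, TISAs models are truly domain independent. In other words, they require no changes or human intervention to accurately classify documents in never before seen topic areas.
7 sg:pub.10.1007/978-3-642-39712-7_43 schema:editor N22f9a01020e248b995c3d95bb5433067
8 sg:pub.10.1007/978-3-642-39712-7_43 schema:genre chapter
9 sg:pub.10.1007/978-3-642-39712-7_43 schema:inLanguage en
10 sg:pub.10.1007/978-3-642-39712-7_43 schema:isAccessibleForFree true
11 sg:pub.10.1007/978-3-642-39712-7_43 schema:isPartOf N7166aa5b1e0b480eae94c85563637f51
12 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords Amazon products
13 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords Independence Scoring Algorithm
14 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords TISA model
15 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords Topic Independence Scoring Algorithm
16 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords accuracy
17 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords adaptation
18 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords adaptation speed
19 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords algorithm
20 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords analysis
21 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords applications
22 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords approach
23 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords area
24 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords array
25 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords assistants
26 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords assurance
27 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords average accuracy
28 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords bags
29 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords baseline
30 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords best preforming sentiment model
31 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords business intelligence tools
32 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords category Amazon product
33 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords changes
34 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords cost
35 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords datasets
36 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords demand
37 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords documents
38 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords domain
39 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords domain adaptation
40 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords domain model
41 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords domain-independent bag
42 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords domain-independent solutions
43 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords electronic personal assistants
44 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords high demand
45 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords high implementation
46 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords human intervention
47 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords implementation
48 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords intelligence tools
49 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords intervention
50 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords learning
51 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords low accuracy
52 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords machine learning
53 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords maintenance costs
54 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords model
55 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords multi-domain approach
56 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords need
57 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords new topic areas
58 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords percentage points
59 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords personal assistants
60 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords point
61 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords preforming sentiment model
62 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords products
63 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords range
64 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords recommender systems
65 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords scoring algorithm
66 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords sentiment analysis
67 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords sentiment model
68 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords slow adaptation speed
69 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords solution
70 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords speed
71 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords strong assurance
72 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords system
73 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords test topic areas
74 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords textual analysis
75 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords tool
76 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords topic areas
77 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords true domain-independent solution
78 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords unpredictable array
79 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords variance
80 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords wide range
81 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords words
82 sg:pub.10.1007/978-3-642-39712-7_43 schema:keywords words model
83 sg:pub.10.1007/978-3-642-39712-7_43 schema:name TISA: Topic Independence Scoring Algorithm
84 sg:pub.10.1007/978-3-642-39712-7_43 schema:pagination 555-570
85 sg:pub.10.1007/978-3-642-39712-7_43 schema:productId N161334853dd14ae78b0f473784641dbb
86 sg:pub.10.1007/978-3-642-39712-7_43 schema:productId N5145d07e8dbb4704b7c9884a55215e70
87 sg:pub.10.1007/978-3-642-39712-7_43 schema:publisher Nf55cf9dd028e4de081f21774101a98cb
88 sg:pub.10.1007/978-3-642-39712-7_43 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011572208
89 sg:pub.10.1007/978-3-642-39712-7_43 schema:sameAs https://doi.org/10.1007/978-3-642-39712-7_43
90 sg:pub.10.1007/978-3-642-39712-7_43 schema:sdDatePublished 2022-01-01T19:27
91 sg:pub.10.1007/978-3-642-39712-7_43 schema:sdLicense https://scigraph.springernature.com/explorer/license/
92 sg:pub.10.1007/978-3-642-39712-7_43 schema:sdPublisher Nb942d74eb0614c9780f9ef8f88fcf668
93 sg:pub.10.1007/978-3-642-39712-7_43 schema:url https://doi.org/10.1007/978-3-642-39712-7_43
94 sg:pub.10.1007/978-3-642-39712-7_43 sgo:license sg:explorer/license/
95 sg:pub.10.1007/978-3-642-39712-7_43 sgo:sdDataset chapters
96 sg:pub.10.1007/978-3-642-39712-7_43 rdf:type schema:Chapter
97 N161334853dd14ae78b0f473784641dbb schema:name doi
98 N161334853dd14ae78b0f473784641dbb schema:value 10.1007/978-3-642-39712-7_43
99 N161334853dd14ae78b0f473784641dbb rdf:type schema:PropertyValue
100 N22f9a01020e248b995c3d95bb5433067 rdf:first N3bccaaacd51e47afb624afe15f07c5f9
101 N22f9a01020e248b995c3d95bb5433067 rdf:rest rdf:nil
102 N3bccaaacd51e47afb624afe15f07c5f9 schema:familyName Perner
103 N3bccaaacd51e47afb624afe15f07c5f9 schema:givenName Petra
104 N3bccaaacd51e47afb624afe15f07c5f9 rdf:type schema:Person
105 N5145d07e8dbb4704b7c9884a55215e70 schema:name dimensions_id
106 N5145d07e8dbb4704b7c9884a55215e70 schema:value pub.1011572208
107 N5145d07e8dbb4704b7c9884a55215e70 rdf:type schema:PropertyValue
108 N53df8863039f4cddb1775ceccecbf866 rdf:first sg:person.016274302751.69
109 N53df8863039f4cddb1775ceccecbf866 rdf:rest rdf:nil
110 N7166aa5b1e0b480eae94c85563637f51 schema:isbn 978-3-642-39711-0
111 N7166aa5b1e0b480eae94c85563637f51 schema:isbn 978-3-642-39712-7
112 N7166aa5b1e0b480eae94c85563637f51 schema:name Machine Learning and Data Mining in Pattern Recognition
113 N7166aa5b1e0b480eae94c85563637f51 rdf:type schema:Book
114 N962edba79006474ea3e9fe99ceac9e6d rdf:first sg:person.013017730117.89
115 N962edba79006474ea3e9fe99ceac9e6d rdf:rest Na728c635aac440d99742d705985d9e2a
116 Na728c635aac440d99742d705985d9e2a rdf:first sg:person.010560124331.39
117 Na728c635aac440d99742d705985d9e2a rdf:rest N53df8863039f4cddb1775ceccecbf866
118 Nb942d74eb0614c9780f9ef8f88fcf668 schema:name Springer Nature - SN SciGraph project
119 Nb942d74eb0614c9780f9ef8f88fcf668 rdf:type schema:Organization
120 Nf55cf9dd028e4de081f21774101a98cb schema:name Springer Nature
121 Nf55cf9dd028e4de081f21774101a98cb rdf:type schema:Organisation
122 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
123 anzsrc-for:08 schema:name Information and Computing Sciences
124 anzsrc-for:08 rdf:type schema:DefinedTerm
125 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
126 anzsrc-for:0801 schema:name Artificial Intelligence and Image Processing
127 anzsrc-for:0801 rdf:type schema:DefinedTerm
128 sg:person.010560124331.39 schema:affiliation grid-institutes:None
129 sg:person.010560124331.39 schema:familyName Cheng
130 sg:person.010560124331.39 schema:givenName Doreen
131 sg:person.010560124331.39 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010560124331.39
132 sg:person.010560124331.39 rdf:type schema:Person
133 sg:person.013017730117.89 schema:affiliation grid-institutes:None
134 sg:person.013017730117.89 schema:familyName Martineau
135 sg:person.013017730117.89 schema:givenName Justin Christopher
136 sg:person.013017730117.89 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013017730117.89
137 sg:person.013017730117.89 rdf:type schema:Person
138 sg:person.016274302751.69 schema:affiliation grid-institutes:grid.266673.0
139 sg:person.016274302751.69 schema:familyName Finin
140 sg:person.016274302751.69 schema:givenName Tim
141 sg:person.016274302751.69 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016274302751.69
142 sg:person.016274302751.69 rdf:type schema:Person
143 grid-institutes:None schema:alternateName Samsung Information Systems North America, USA
144 grid-institutes:None schema:name Samsung Information Systems North America, USA
145 grid-institutes:None rdf:type schema:Organization
146 grid-institutes:grid.266673.0 schema:alternateName University of Maryland Baltimore County, USA
147 grid-institutes:grid.266673.0 schema:name University of Maryland Baltimore County, USA
148 grid-institutes:grid.266673.0 rdf:type schema:Organization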
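
In the table, the author order is not stored on the chapter itself: `schema:author` points at a blank node, and the order is encoded as an RDF collection chained through `rdf:first` and `rdf:rest` until `rdf:nil`. The Python sketch below walks that chain. It uses rows 3, 108-109, 114-117, 129, 134, and 139 of the table written out as full triples; `rdf_list` is a hypothetical helper, and a real application would more likely use an RDF library.

```python
# Rows 3, 108-109, 114-117, 129, 134 and 139 of the table above, written out as
# full (subject, predicate, object) triples.
TRIPLES = [
    ("sg:pub.10.1007/978-3-642-39712-7_43", "schema:author", "N962edba79006474ea3e9fe99ceac9e6d"),
    ("N53df8863039f4cddb1775ceccecbf866", "rdf:first", "sg:person.016274302751.69"),
    ("N53df8863039f4cddb1775ceccecbf866", "rdf:rest", "rdf:nil"),
    ("N962edba79006474ea3e9fe99ceac9e6d", "rdf:first", "sg:person.013017730117.89"),
    ("N962edba79006474ea3e9fe99ceac9e6d", "rdf:rest", "Na728c635aac440d99742d705985d9e2a"),
    ("Na728c635aac440d99742d705985d9e2a", "rdf:first", "sg:person.010560124331.39"),
    ("Na728c635aac440d99742d705985d9e2a", "rdf:rest", "N53df8863039f4cddb1775ceccecbf866"),
    ("sg:person.010560124331.39", "schema:familyName", "Cheng"),
    ("sg:person.013017730117.89", "schema:familyName", "Martineau"),
    ("sg:person.016274302751.69", "schema:familyName", "Finin"),
]


def rdf_list(triples, head):
    """Return the members of the RDF collection starting at `head`, in order."""
    # One value per (subject, predicate) is enough for rdf:first / rdf:rest.
    link = {(s, p): o for s, p, o in triples}
    members, seen, node = [], set(), head
    while node != "rdf:nil":
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        members.append(link[(node, "rdf:first")])
        node = link[(node, "rdf:rest")]
    return members


# Every (subject, predicate) pair in TRIPLES is unique, so a plain dict lookup works.
link = {(s, p): o for s, p, o in TRIPLES}
head = link[("sg:pub.10.1007/978-3-642-39712-7_43", "schema:author")]
authors = rdf_list(TRIPLES, head)
family_names = [link[(a, "schema:familyName")] for a in authors]
print(family_names)
```

The table is sorted by subject identifier, so the rows for the list nodes do not appear in author order; following the `rdf:rest` links is what recovers the sequence shown in the AUTHORS field.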
 



