TISA: Topic Independence Scoring Algorithm


Ontology type: schema:Chapter
Open Access: True


Chapter Info

DATE

2013

AUTHORS

Justin Christopher Martineau, Doreen Cheng, Tim Finin

ABSTRACT

Textual analysis using machine learning is in high demand for a wide range of applications including recommender systems, business intelligence tools, and electronic personal assistants. Some of these applications need to operate over a wide and unpredictable array of topic areas, but current in-domain, domain adaptation, and multi-domain approaches cannot adequately support this need, due to their low accuracy on topic areas that they are not trained for, slow adaptation speed, or high implementation and maintenance costs. To create a true domain-independent solution, we introduce the Topic Independence Scoring Algorithm (TISA) and demonstrate how to build a domain-independent bag-of-words model for sentiment analysis. This model is the best performing sentiment model published on the popular 25-category Amazon product reviews dataset. The model is on average 89.6% accurate as measured on 20 held-out test topic areas. This compares very favorably with the 82.28% average accuracy of the 20 baseline in-domain models. Moreover, the TISA model is highly uniformly accurate, with a variance of 5 percentage points, which provides strong assurance that the model will be just as accurate on new topic areas. Consequently, TISA models are truly domain independent. In other words, they require no changes or human intervention to accurately classify documents in never-before-seen topic areas.

PAGES

555-570

Book

TITLE

Machine Learning and Data Mining in Pattern Recognition

ISBN

978-3-642-39711-0
978-3-642-39712-7

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43

DOI

http://dx.doi.org/10.1007/978-3-642-39712-7_43

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1011572208



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Samsung Information Systems North America, USA", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Samsung Information Systems North America, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Martineau", 
        "givenName": "Justin Christopher", 
        "id": "sg:person.013017730117.89", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013017730117.89"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Samsung Information Systems North America, USA", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Samsung Information Systems North America, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Cheng", 
        "givenName": "Doreen", 
        "id": "sg:person.010560124331.39", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010560124331.39"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Maryland Baltimore County, USA", 
          "id": "http://www.grid.ac/institutes/grid.266673.0", 
          "name": [
            "University of Maryland Baltimore County, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Finin", 
        "givenName": "Tim", 
        "id": "sg:person.016274302751.69", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016274302751.69"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2013", 
    "datePublishedReg": "2013-01-01", 
    "description": "Textual analysis using machine learning is in high demand for a wide range of applications including recommender systems, business intelligence tools, and electronic personal assistants. Some of these applications need to operate over a wide and unpredictable array of topic areas, but current in-domain, domain adaptation, and multi-domain approaches cannot adequately support this need, due to their low accuracy on topic areas that they are not trained for, slow adaptation speed, or high implementation and maintenance costs.To create a true domain-independent solution, we introduce the Topic Independence Scoring Algorithm (TISA) and demonstrate how to build a domain-independent bag-of-words model for sentiment analysis. This model is the best preforming sentiment model published on the popular 25 category Amazon product reviews dataset. The model is on average 89.6% accurate as measured on 20 held-out test topic areas. This compares very favorably with the 82.28% average accuracy of the 20 baseline in-domain models. Moreover, the TISA model is highly uniformly accurate, with a variance of 5 percentage points, which provides strong assurance that the model will be just as accurate on new topic areas. Consequently, TISAs models are truly domain independent. In other words, they require no changes or human intervention to accurately classify documents in never before seen topic areas.", 
    "editor": [
      {
        "familyName": "Perner", 
        "givenName": "Petra", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-39712-7_43", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-642-39711-0", 
        "978-3-642-39712-7"
      ], 
      "name": "Machine Learning and Data Mining in Pattern Recognition", 
      "type": "Book"
    }, 
    "keywords": [
      "business intelligence tools", 
      "domain-independent solutions", 
      "recommender systems", 
      "machine learning", 
      "intelligence tools", 
      "domain adaptation", 
      "personal assistants", 
      "Amazon products", 
      "strong assurance", 
      "words model", 
      "sentiment model", 
      "sentiment analysis", 
      "human intervention", 
      "domain model", 
      "average accuracy", 
      "scoring algorithm", 
      "low accuracy", 
      "adaptation speed", 
      "maintenance costs", 
      "algorithm", 
      "multi-domain approach", 
      "new topic areas", 
      "topic areas", 
      "unpredictable array", 
      "high implementation", 
      "high demand", 
      "accuracy", 
      "dataset", 
      "applications", 
      "learning", 
      "implementation", 
      "documents", 
      "assistants", 
      "model", 
      "assurance", 
      "tool", 
      "bags", 
      "cost", 
      "wide range", 
      "system", 
      "domain", 
      "speed", 
      "demand", 
      "words", 
      "solution", 
      "need", 
      "area", 
      "textual analysis", 
      "adaptation", 
      "point", 
      "analysis", 
      "array", 
      "percentage points", 
      "products", 
      "range", 
      "variance", 
      "baseline", 
      "changes", 
      "intervention", 
      "approach"
    ], 
    "name": "TISA: Topic Independence Scoring Algorithm", 
    "pagination": "555-570", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1011572208"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-39712-7_43"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-39712-7_43", 
      "https://app.dimensions.ai/details/publication/pub.1011572208"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-05-10T10:54", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220509/entities/gbq_results/chapter/chapter_462.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-642-39712-7_43"
  }
]
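
As a rough illustration of how a record shaped like the one above might be read, the following Python sketch pulls the title, author names, DOI and page range out of a trimmed copy of the JSON-LD. The trimmed RECORD string and the summarize_record helper are illustrative names introduced here, not part of SciGraph; the field names ("name", "author", "productId", "pagination") are taken from the record itself.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD = """
[
  {
    "id": "sg:pub.10.1007/978-3-642-39712-7_43",
    "name": "TISA: Topic Independence Scoring Algorithm",
    "pagination": "555-570",
    "author": [
      {"familyName": "Martineau", "givenName": "Justin Christopher"},
      {"familyName": "Cheng", "givenName": "Doreen"},
      {"familyName": "Finin", "givenName": "Tim"}
    ],
    "productId": [
      {"name": "dimensions_id", "value": ["pub.1011572208"]},
      {"name": "doi", "value": ["10.1007/978-3-642-39712-7_43"]}
    ]
  }
]
"""


def summarize_record(record: dict) -> dict:
    """Collect a few human-readable fields from one SciGraph-style record."""
    authors = [
        f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])
    ]
    # productId is a list of {name, value[]} pairs; index it by name.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    return {
        "title": record["name"],
        "authors": authors,
        "doi": ids.get("doi"),
        "pages": record.get("pagination"),
    }


if __name__ == "__main__":
    # The top-level JSON value is a list holding a single record.
    first = json.loads(RECORD)[0]
    print(summarize_record(first))
```

The sketch treats the document as plain JSON and ignores the "@context"; a full JSON-LD processor would be needed to expand the terms into IRIs.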
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-39712-7_43'
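
The four curl commands differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. This is a minimal sketch, assuming the endpoint still answers as the curl commands describe; the short format labels and the build_request and fetch helpers are names chosen here for illustration.

```python
from urllib.request import Request, urlopen

BASE_URL = "https://scigraph.springernature.com/"

# Media types copied from the curl commands above; the keys are
# illustrative short labels, not an official SciGraph vocabulary.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id: str, fmt: str = "json-ld") -> Request:
    """Build a GET request asking for one RDF serialization."""
    return Request(BASE_URL + record_id, headers={"Accept": ACCEPT[fmt]})


def fetch(record_id: str, fmt: str = "json-ld", timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    with urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    req = build_request("pub.10.1007/978-3-642-39712-7_43", "turtle")
    print(req.full_url, req.get_header("Accept"))
    # fetch("pub.10.1007/978-3-642-39712-7_43", "turtle") would perform
    # the network call; it is left out so the sketch runs offline.
```

Keeping request construction separate from the network call makes the header logic easy to check without touching the service.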


 

This table displays all metadata directly associated with this object as RDF triples.

137 TRIPLES      23 PREDICATES      86 URIs      79 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-39712-7_43 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Nde746e3809e94ea0ae04f3b945fc8e3f
4 schema:datePublished 2013
5 schema:datePublishedReg 2013-01-01
6 schema:description Textual analysis using machine learning is in high demand for a wide range of applications including recommender systems, business intelligence tools, and electronic personal assistants. Some of these applications need to operate over a wide and unpredictable array of topic areas, but current in-domain, domain adaptation, and multi-domain approaches cannot adequately support this need, due to their low accuracy on topic areas that they are not trained for, slow adaptation speed, or high implementation and maintenance costs.To create a true domain-independent solution, we introduce the Topic Independence Scoring Algorithm (TISA) and demonstrate how to build a domain-independent bag-of-words model for sentiment analysis. This model is the best preforming sentiment model published on the popular 25 category Amazon product reviews dataset. The model is on average 89.6% accurate as measured on 20 held-out test topic areas. This compares very favorably with the 82.28% average accuracy of the 20 baseline in-domain models. Moreover, the TISA model is highly uniformly accurate, with a variance of 5 percentage points, which provides strong assurance that the model will be just as accurate on new topic areas. Consequently, TISAs models are truly domain independent. In other words, they require no changes or human intervention to accurately classify documents in never before seen topic areas.
7 schema:editor N59ab6fbe79214ed386c751f69085a452
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree true
11 schema:isPartOf N9bef8f4c973e4ade8490ffeb1dc90c6a
12 schema:keywords Amazon products
13 accuracy
14 adaptation
15 adaptation speed
16 algorithm
17 analysis
18 applications
19 approach
20 area
21 array
22 assistants
23 assurance
24 average accuracy
25 bags
26 baseline
27 business intelligence tools
28 changes
29 cost
30 dataset
31 demand
32 documents
33 domain
34 domain adaptation
35 domain model
36 domain-independent solutions
37 high demand
38 high implementation
39 human intervention
40 implementation
41 intelligence tools
42 intervention
43 learning
44 low accuracy
45 machine learning
46 maintenance costs
47 model
48 multi-domain approach
49 need
50 new topic areas
51 percentage points
52 personal assistants
53 point
54 products
55 range
56 recommender systems
57 scoring algorithm
58 sentiment analysis
59 sentiment model
60 solution
61 speed
62 strong assurance
63 system
64 textual analysis
65 tool
66 topic areas
67 unpredictable array
68 variance
69 wide range
70 words
71 words model
72 schema:name TISA: Topic Independence Scoring Algorithm
73 schema:pagination 555-570
74 schema:productId N3d0a8f2489d445469ee20d4617d337c3
75 N51a5ed1f86bf415fa0e6d542ed1e5e43
76 schema:publisher N49e925c899644d9c8489d2aece86ee26
77 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011572208
78 https://doi.org/10.1007/978-3-642-39712-7_43
79 schema:sdDatePublished 2022-05-10T10:54
80 schema:sdLicense https://scigraph.springernature.com/explorer/license/
81 schema:sdPublisher N7da71807815644f19e17b135af5ff42b
82 schema:url https://doi.org/10.1007/978-3-642-39712-7_43
83 sgo:license sg:explorer/license/
84 sgo:sdDataset chapters
85 rdf:type schema:Chapter
86 N3d0a8f2489d445469ee20d4617d337c3 schema:name doi
87 schema:value 10.1007/978-3-642-39712-7_43
88 rdf:type schema:PropertyValue
89 N49e925c899644d9c8489d2aece86ee26 schema:name Springer Nature
90 rdf:type schema:Organisation
91 N51a5ed1f86bf415fa0e6d542ed1e5e43 schema:name dimensions_id
92 schema:value pub.1011572208
93 rdf:type schema:PropertyValue
94 N59ab6fbe79214ed386c751f69085a452 rdf:first Naf7dcd670d4c410d802b5d46294169d6
95 rdf:rest rdf:nil
96 N6ea1aa83eb1c4f4994ef9b5b37fc878d rdf:first sg:person.016274302751.69
97 rdf:rest rdf:nil
98 N7da71807815644f19e17b135af5ff42b schema:name Springer Nature - SN SciGraph project
99 rdf:type schema:Organization
100 N9bef8f4c973e4ade8490ffeb1dc90c6a schema:isbn 978-3-642-39711-0
101 978-3-642-39712-7
102 schema:name Machine Learning and Data Mining in Pattern Recognition
103 rdf:type schema:Book
104 Naf7dcd670d4c410d802b5d46294169d6 schema:familyName Perner
105 schema:givenName Petra
106 rdf:type schema:Person
107 Nde746e3809e94ea0ae04f3b945fc8e3f rdf:first sg:person.013017730117.89
108 rdf:rest Nf3cd36494f244a7bb1b2a3f183581a51
109 Nf3cd36494f244a7bb1b2a3f183581a51 rdf:first sg:person.010560124331.39
110 rdf:rest N6ea1aa83eb1c4f4994ef9b5b37fc878d
111 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
112 schema:name Information and Computing Sciences
113 rdf:type schema:DefinedTerm
114 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
115 schema:name Artificial Intelligence and Image Processing
116 rdf:type schema:DefinedTerm
117 sg:person.010560124331.39 schema:affiliation grid-institutes:None
118 schema:familyName Cheng
119 schema:givenName Doreen
120 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010560124331.39
121 rdf:type schema:Person
122 sg:person.013017730117.89 schema:affiliation grid-institutes:None
123 schema:familyName Martineau
124 schema:givenName Justin Christopher
125 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013017730117.89
126 rdf:type schema:Person
127 sg:person.016274302751.69 schema:affiliation grid-institutes:grid.266673.0
128 schema:familyName Finin
129 schema:givenName Tim
130 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016274302751.69
131 rdf:type schema:Person
132 grid-institutes:None schema:alternateName Samsung Information Systems North America, USA
133 schema:name Samsung Information Systems North America, USA
134 rdf:type schema:Organization
135 grid-institutes:grid.266673.0 schema:alternateName University of Maryland Baltimore County, USA
136 schema:name University of Maryland Baltimore County, USA
137 rdf:type schema:Organization
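
Summary counts like those shown above the table (triples, predicates, URIs, literals, blank nodes) could be derived from an N-Triples export along the following lines. This is only a sketch: the sample data uses made-up example.org identifiers, the regular expression covers just the common N-Triples term forms, and counting predicates among the URIs is an assumption about how such a summary is computed, not something the page states.

```python
import re

# One N-Triples term: <IRI>, _:blankNode, or "literal" with an optional
# language tag or datatype. Simplified; not a full grammar.
TERM = re.compile(r'<[^>]*>|_:\w+|"(?:[^"\\]|\\.)*"(?:@[\w-]+|\^\^<[^>]*>)?')

# Hypothetical sample in N-Triples syntax (identifiers are invented).
SAMPLE = """
<https://example.org/pub/1> <http://schema.org/name> "TISA: Topic Independence Scoring Algorithm" .
<https://example.org/pub/1> <http://schema.org/author> _:b0 .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> <https://example.org/person/1> .
<https://example.org/person/1> <http://schema.org/familyName> "Martineau" .
"""


def summarize(ntriples: str) -> dict:
    """Count triples and distinct predicates, URIs, literals, blank nodes."""
    triples = 0
    predicates, uris, literals, blanks = set(), set(), set(), set()
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        terms = TERM.findall(line)
        if len(terms) != 3:
            raise ValueError(f"not a triple: {line!r}")
        triples += 1
        predicates.add(terms[1])
        for term in terms:
            if term.startswith("<"):
                uris.add(term)
            elif term.startswith("_:"):
                blanks.add(term)
            else:
                literals.add(term)
    return {
        "triples": triples,
        "predicates": len(predicates),
        "uris": len(uris),
        "literals": len(literals),
        "blank_nodes": len(blanks),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```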
 



