The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2018-03-16

AUTHORS

Saqib Alam, Nianmin Yao

ABSTRACT

Big data and its related technologies have recently become active areas of research. A huge amount of data is generated every second, much of it unstructured, and such data is now a topic of strong interest for researchers. A great deal of research is currently under way in text analytics and text preprocessing. In this paper, we study the impact of different preprocessing steps on the accuracy of three machine learning algorithms for sentiment analysis. We applied different text preprocessing techniques and studied their impact on sentiment classification accuracy using three well-known machine learning classifiers: Naïve Bayes (NB), maximum entropy (MaxE), and support vector machines (SVM). We calculated the accuracy of the three algorithms before and after applying the preprocessing steps. The results showed that the accuracy of the NB algorithm improved significantly after applying the preprocessing steps. A slight improvement in the accuracy of the SVM algorithm was seen after applying the preprocessing steps. Interestingly, no improvement in accuracy was seen for the MaxE algorithm. Our work is a comparative study, and our results showed that, after applying the preprocessing steps, the accuracy of the NB algorithm was again significantly higher than that of the other machine learning algorithms, followed by MaxE and SVM. This research shows that text preprocessing affects the accuracy of machine learning algorithms. It further concludes that, for the NB algorithm, accuracy improved significantly after applying text preprocessing steps.

PAGES

1-17

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s10588-018-9266-8

DOI

http://dx.doi.org/10.1007/s10588-018-9266-8

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1101551571



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Dalian University of Technology", 
          "id": "https://www.grid.ac/institutes/grid.30055.33", 
          "name": [
            "Department of Electronic Information and Electrical Engineering, Dalian University of Technology, Black Building, Linggong Road No. 2, Ganjingzi District, 116024, Dalian, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Alam", 
        "givenName": "Saqib", 
        "id": "sg:person.010410074324.70", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010410074324.70"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Dalian University of Technology", 
          "id": "https://www.grid.ac/institutes/grid.30055.33", 
          "name": [
            "Department of Electronic Information and Electrical Engineering, Dalian University of Technology, Black Building, Linggong Road No. 2, Ganjingzi District, 116024, Dalian, People\u2019s Republic of China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Yao", 
        "givenName": "Nianmin", 
        "id": "sg:person.07401712336.33", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07401712336.33"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1016/j.eswa.2008.06.054", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1003030433"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s11280-015-0381-x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010525792", 
          "https://doi.org/10.1007/s11280-015-0381-x"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.5120/16952-7048", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1072599171"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pone.0171649", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1083861053"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s12559-017-9503-3", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1091477880", 
          "https://doi.org/10.1007/s12559-017-9503-3"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1166/jmihi.2017.2208", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1092073810"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s00500-017-2904-0", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1092477948", 
          "https://doi.org/10.1007/s00500-017-2904-0"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/mipro.2014.6859797", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1093423933"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icicct.2017.7975191", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1093699618"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icimia.2017.7975659", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1095381241"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2018-03-16", 
    "datePublishedReg": "2018-03-16", 
    "description": "Big data and its related technologies have become active areas of research recently. There is a huge amount of data generated every minute and second that includes unstructured data which is the topic of interest for researchers now a days. A lot of research work is currently going on in the areas of text analytics and text preprocessing. In this paper, we have studied the impact of different preprocessing steps on the accuracy of three machine learning algorithms for sentiment analysis. We applied different text preprocessing techniques and studied their impact on accuracy for sentiment classification using three well-known machine learning classifiers including Na\u00efve Bayes (NB), maximum entropy (MaxE), and support vector machines (SVM). We calculated accuracy of the three machine learning algorithms before and after applying the preprocessing steps. Results proved that the accuracy of NB algorithm was significantly improved after applying the preprocessing steps. Slight improvement in accuracy of SVM algorithm was seen after applying the preprocessing steps. Interestingly, in case of MaxE algorithm, no improvement in accuracy was seen. Our work is a comparative study, and our results proved that in case of NB algorithm, actuary was again significantly high than any other machine learning algorithm after applying the preprocessing steps; followed by MaxE and SVM algorithms. This research work proves that text preprocessing impacts the accuracy of machine learning algorithms. It further concludes that in case of NB algorithm, accuracy has significantly improved after applying text preprocessing steps.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1007/s10588-018-9266-8", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1048813", 
        "issn": [
          "1381-298X", 
          "1572-9346"
        ], 
        "name": "Computational and Mathematical Organization Theory", 
        "type": "Periodical"
      }
    ], 
    "name": "The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis", 
    "pagination": "1-17", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "5a9850574a0723a26294dc502928984156bcdeee5da2a80f7d344bd6d34195b4"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s10588-018-9266-8"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1101551571"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s10588-018-9266-8", 
      "https://app.dimensions.ai/details/publication/pub.1101551571"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T11:56", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000359_0000000359/records_29215_00000003.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://link.springer.com/10.1007%2Fs10588-018-9266-8"
  }
]
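The JSON-LD export is a JSON array containing a single record object, so its bibliographic fields can be read with an ordinary JSON parser. The sketch below is a minimal illustration, assuming a trimmed copy of the record above has been loaded as a string; the helper name `summarize_record` is hypothetical, and only field names that appear in the record (`name`, `datePublished`, `author`, `isPartOf`, `productId`) are used.

```python
import json

# A trimmed copy of the SciGraph record shown above (only the fields used below).
RECORD = """
[
  {
    "name": "The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis",
    "datePublished": "2018-03-16",
    "author": [
      {"givenName": "Saqib", "familyName": "Alam", "type": "Person"},
      {"givenName": "Nianmin", "familyName": "Yao", "type": "Person"}
    ],
    "isPartOf": [
      {"name": "Computational and Mathematical Organization Theory",
       "issn": ["1381-298X", "1572-9346"], "type": "Periodical"}
    ],
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/s10588-018-9266-8"]},
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1101551571"]}
    ]
  }
]
"""


def summarize_record(text: str) -> dict:
    """Extract a few bibliographic fields from a SciGraph JSON-LD export.

    Hypothetical helper: the export is a JSON array, so the first element is used.
    """
    record = json.loads(text)[0]
    # productId is a list of {name, value} pairs; index the first value by name.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    # Authors are listed in order as Person objects with given and family names.
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    journal = record["isPartOf"][0] if record.get("isPartOf") else {}
    return {
        "title": record["name"],
        "date": record["datePublished"],
        "authors": authors,
        "journal": journal.get("name"),
        "issn": journal.get("issn", []),
        "doi": ids.get("doi"),
    }


if __name__ == "__main__":
    for key, value in summarize_record(RECORD).items():
        print(f"{key}: {value}")
```

Because `productId` stores each identifier as a named `PropertyValue`, indexing it by `name` avoids depending on the order in which the DOI, ReadCube, and Dimensions identifiers happen to appear.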
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s10588-018-9266-8'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s10588-018-9266-8'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s10588-018-9266-8'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s10588-018-9266-8'
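The four curl commands differ only in the `Accept` header; the server chooses the serialization through HTTP content negotiation. The following is a minimal Python sketch of the same requests using only the standard library, assuming the endpoint still serves these media types. `MEDIA_TYPES`, `build_request`, and `fetch` are illustrative names, not part of any SciGraph API, and the demo only builds the requests without sending them.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s10588-018-9266-8"

# Accept header for each serialization, as listed in the curl commands above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build (but do not send) a content-negotiated GET request for one format."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt: str, timeout: float = 30.0) -> str:
    """Send the request and return the body as text (requires network access)."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


if __name__ == "__main__":
    # Only show what would be sent; call fetch("turtle") to actually download.
    for fmt in MEDIA_TYPES:
        req = build_request(fmt)
        print(fmt, req.get_method(), req.full_url, req.get_header("Accept"))
```

Separating request construction from sending keeps the header logic testable offline; `fetch` falls back to UTF-8 when the response does not declare a charset.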


 

This table displays all metadata directly associated with this object as RDF triples.

95 TRIPLES      21 PREDICATES      34 URIs      16 LITERALS      5 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/s10588-018-9266-8 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Ne7d837baa5a94d1d9b4aa7ace3d43a28
4 schema:citation sg:pub.10.1007/s00500-017-2904-0
5 sg:pub.10.1007/s11280-015-0381-x
6 sg:pub.10.1007/s12559-017-9503-3
7 https://doi.org/10.1016/j.eswa.2008.06.054
8 https://doi.org/10.1109/icicct.2017.7975191
9 https://doi.org/10.1109/icimia.2017.7975659
10 https://doi.org/10.1109/mipro.2014.6859797
11 https://doi.org/10.1166/jmihi.2017.2208
12 https://doi.org/10.1371/journal.pone.0171649
13 https://doi.org/10.5120/16952-7048
14 schema:datePublished 2018-03-16
15 schema:datePublishedReg 2018-03-16
16 schema:description Big data and its related technologies have become active areas of research recently. There is a huge amount of data generated every minute and second that includes unstructured data which is the topic of interest for researchers now a days. A lot of research work is currently going on in the areas of text analytics and text preprocessing. In this paper, we have studied the impact of different preprocessing steps on the accuracy of three machine learning algorithms for sentiment analysis. We applied different text preprocessing techniques and studied their impact on accuracy for sentiment classification using three well-known machine learning classifiers including Naïve Bayes (NB), maximum entropy (MaxE), and support vector machines (SVM). We calculated accuracy of the three machine learning algorithms before and after applying the preprocessing steps. Results proved that the accuracy of NB algorithm was significantly improved after applying the preprocessing steps. Slight improvement in accuracy of SVM algorithm was seen after applying the preprocessing steps. Interestingly, in case of MaxE algorithm, no improvement in accuracy was seen. Our work is a comparative study, and our results proved that in case of NB algorithm, actuary was again significantly high than any other machine learning algorithm after applying the preprocessing steps; followed by MaxE and SVM algorithms. This research work proves that text preprocessing impacts the accuracy of machine learning algorithms. It further concludes that in case of NB algorithm, accuracy has significantly improved after applying text preprocessing steps.
17 schema:genre research_article
18 schema:inLanguage en
19 schema:isAccessibleForFree false
20 schema:isPartOf sg:journal.1048813
21 schema:name The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis
22 schema:pagination 1-17
23 schema:productId N56c4a5f40b6445ac9acb62fc0514ac13
24 N578885dfb4ab4e1c834fc85d1b19e0c0
25 N8b0d941e80b54fa9923df187ca502995
26 schema:sameAs https://app.dimensions.ai/details/publication/pub.1101551571
27 https://doi.org/10.1007/s10588-018-9266-8
28 schema:sdDatePublished 2019-04-11T11:56
29 schema:sdLicense https://scigraph.springernature.com/explorer/license/
30 schema:sdPublisher N23bdaa5169f34e21b4a64953546be394
31 schema:url https://link.springer.com/10.1007%2Fs10588-018-9266-8
32 sgo:license sg:explorer/license/
33 sgo:sdDataset articles
34 rdf:type schema:ScholarlyArticle
35 N23bdaa5169f34e21b4a64953546be394 schema:name Springer Nature - SN SciGraph project
36 rdf:type schema:Organization
37 N56c4a5f40b6445ac9acb62fc0514ac13 schema:name doi
38 schema:value 10.1007/s10588-018-9266-8
39 rdf:type schema:PropertyValue
40 N578885dfb4ab4e1c834fc85d1b19e0c0 schema:name readcube_id
41 schema:value 5a9850574a0723a26294dc502928984156bcdeee5da2a80f7d344bd6d34195b4
42 rdf:type schema:PropertyValue
43 N8b0d941e80b54fa9923df187ca502995 schema:name dimensions_id
44 schema:value pub.1101551571
45 rdf:type schema:PropertyValue
46 Nce163be5b9fd464bba3397d3f10018c6 rdf:first sg:person.07401712336.33
47 rdf:rest rdf:nil
48 Ne7d837baa5a94d1d9b4aa7ace3d43a28 rdf:first sg:person.010410074324.70
49 rdf:rest Nce163be5b9fd464bba3397d3f10018c6
50 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
51 schema:name Information and Computing Sciences
52 rdf:type schema:DefinedTerm
53 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
54 schema:name Artificial Intelligence and Image Processing
55 rdf:type schema:DefinedTerm
56 sg:journal.1048813 schema:issn 1381-298X
57 1572-9346
58 schema:name Computational and Mathematical Organization Theory
59 rdf:type schema:Periodical
60 sg:person.010410074324.70 schema:affiliation https://www.grid.ac/institutes/grid.30055.33
61 schema:familyName Alam
62 schema:givenName Saqib
63 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010410074324.70
64 rdf:type schema:Person
65 sg:person.07401712336.33 schema:affiliation https://www.grid.ac/institutes/grid.30055.33
66 schema:familyName Yao
67 schema:givenName Nianmin
68 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07401712336.33
69 rdf:type schema:Person
70 sg:pub.10.1007/s00500-017-2904-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1092477948
71 https://doi.org/10.1007/s00500-017-2904-0
72 rdf:type schema:CreativeWork
73 sg:pub.10.1007/s11280-015-0381-x schema:sameAs https://app.dimensions.ai/details/publication/pub.1010525792
74 https://doi.org/10.1007/s11280-015-0381-x
75 rdf:type schema:CreativeWork
76 sg:pub.10.1007/s12559-017-9503-3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091477880
77 https://doi.org/10.1007/s12559-017-9503-3
78 rdf:type schema:CreativeWork
79 https://doi.org/10.1016/j.eswa.2008.06.054 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003030433
80 rdf:type schema:CreativeWork
81 https://doi.org/10.1109/icicct.2017.7975191 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093699618
82 rdf:type schema:CreativeWork
83 https://doi.org/10.1109/icimia.2017.7975659 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095381241
84 rdf:type schema:CreativeWork
85 https://doi.org/10.1109/mipro.2014.6859797 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093423933
86 rdf:type schema:CreativeWork
87 https://doi.org/10.1166/jmihi.2017.2208 schema:sameAs https://app.dimensions.ai/details/publication/pub.1092073810
88 rdf:type schema:CreativeWork
89 https://doi.org/10.1371/journal.pone.0171649 schema:sameAs https://app.dimensions.ai/details/publication/pub.1083861053
90 rdf:type schema:CreativeWork
91 https://doi.org/10.5120/16952-7048 schema:sameAs https://app.dimensions.ai/details/publication/pub.1072599171
92 rdf:type schema:CreativeWork
93 https://www.grid.ac/institutes/grid.30055.33 schema:alternateName Dalian University of Technology
94 schema:name Department of Electronic Information and Electrical Engineering, Dalian University of Technology, Black Building, Linggong Road No. 2, Ganjingzi District, 116024, Dalian, People’s Republic of China
95 rdf:type schema:Organization
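In rows 3 and 46 to 49, `schema:author` points to a blank node rather than to a person: the author list is an ordered RDF collection, in which each node holds one item (`rdf:first`) and links to the rest of the list (`rdf:rest`) until `rdf:nil`. The sketch below walks that chain to recover the author order. It is a minimal illustration using plain Python tuples instead of an RDF library; the triples are copied from the table, and the blank-node labels are specific to this export. The helpers `objects` and `read_rdf_list` are hypothetical names.

```python
# Triples copied from rows 3 and 46-49 of the table above (prefixed names kept as strings).
TRIPLES = [
    ("sg:pub.10.1007/s10588-018-9266-8", "schema:author", "Ne7d837baa5a94d1d9b4aa7ace3d43a28"),
    ("Nce163be5b9fd464bba3397d3f10018c6", "rdf:first", "sg:person.07401712336.33"),
    ("Nce163be5b9fd464bba3397d3f10018c6", "rdf:rest", "rdf:nil"),
    ("Ne7d837baa5a94d1d9b4aa7ace3d43a28", "rdf:first", "sg:person.010410074324.70"),
    ("Ne7d837baa5a94d1d9b4aa7ace3d43a28", "rdf:rest", "Nce163be5b9fd464bba3397d3f10018c6"),
]


def objects(triples, subject, predicate):
    """Return all objects for a given subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def read_rdf_list(triples, head):
    """Follow an rdf:first / rdf:rest chain from `head` until rdf:nil."""
    items, node, seen = [], head, set()
    while node != "rdf:nil":
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle at {node}")
        seen.add(node)
        items.append(objects(triples, node, "rdf:first")[0])
        node = objects(triples, node, "rdf:rest")[0]
    return items


if __name__ == "__main__":
    head = objects(TRIPLES, "sg:pub.10.1007/s10588-018-9266-8", "schema:author")[0]
    print(read_rdf_list(TRIPLES, head))
```

Run as a script, this prints the two person identifiers in list order, Alam (`sg:person.010410074324.70`) first and Yao (`sg:person.07401712336.33`) second, which matches the order of the `author` array in the JSON-LD record.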
 





