A new online field feature selection algorithm based on streaming data


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2018-08-13

AUTHORS

Zhenjiang Zhang, Fuxing Song, Peng Zhang, Han-Chieh Chao, Yingsi Zhao

ABSTRACT

The rapid development of Internet technology has produced a massive amount of network text data, so classifying this text efficiently has both theoretical significance and practical value. To obtain accurate classification results, the process is divided into two parts. For text representation, this paper proposes an online field feature selection algorithm (OFFS algorithm) based on streaming data, which addresses the low efficiency and high memory consumption of traditional feature selection algorithms. By improving on the vector space model, the new algorithm can select features of the data in real time and quickly generate text vectors. For classifier design, an OFFS-BP neural network text classifier is built on the BP neural network and the OFFS algorithm. It adapts to distributed parallel computing, reduces training time, and balances computational efficiency against classification accuracy. Finally, the OFFS-BP neural network classifier is implemented on the Spark platform. The experimental results show that the OFFS-BP neural network classifier is better suited to big data environments, with less computation time and higher classification efficiency.

PAGES

1-13

References to SciGraph publications

  • 2016. Collectives of Term Weighting Methods for Natural Language Call Routing in INFORMATICS IN CONTROL, AUTOMATION AND ROBOTICS
  • 2009. Boolean Model in ENCYCLOPEDIA OF DATABASE SYSTEMS
  • 2015-05. Deep learning in NATURE
  • 2002. Learning to Classify Text Using Support Vector Machines

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/s12652-018-0959-0

    DOI

    http://dx.doi.org/10.1007/s12652-018-0959-0

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1106131909



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Artificial Intelligence and Image Processing", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Beijing Jiaotong University", 
              "id": "https://www.grid.ac/institutes/grid.181531.f", 
              "name": [
                "School of Software Engineering, Beijing Jiaotong University, 100044, Beijing, China", 
                "Department of Electronic and Information Engineering, Key Laboratory of Communication and Information Systems, Beijing Municipal Commission of Education, Beijing Jiaotong University, 100044, Beijing, China"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Zhang", 
            "givenName": "Zhenjiang", 
            "id": "sg:person.016205573667.71", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016205573667.71"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "name": [
                "Department of Electronic and Information Engineering, Key Laboratory of Communication and Information Systems, Beijing Municipal Commission of Education, Beijing Jiaotong University, 100044, Beijing, China"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Song", 
            "givenName": "Fuxing", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "name": [
                "Department of Electronic and Information Engineering, Key Laboratory of Communication and Information Systems, Beijing Municipal Commission of Education, Beijing Jiaotong University, 100044, Beijing, China"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Zhang", 
            "givenName": "Peng", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "National Dong Hwa University", 
              "id": "https://www.grid.ac/institutes/grid.260567.0", 
              "name": [
                "School of Information Science and Engineering, Fujian University of Technology, 350118, Fuzhou, China", 
                "School of Mathematics and Computer Science, Wuhan Polytechnic University, 430023, Wuhan, China", 
                "Department of Electrical Engineering, National Dong Hwa University, 97401, Hualien, Taiwan"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Chao", 
            "givenName": "Han-Chieh", 
            "id": "sg:person.016313323473.36", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016313323473.36"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Beijing Jiaotong University", 
              "id": "https://www.grid.ac/institutes/grid.181531.f", 
              "name": [
                "School of Economics and Management, Beijing Jiaotong University, 100044, Beijing, China"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Zhao", 
            "givenName": "Yingsi", 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1038/nature14539", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1010020120", 
              "https://doi.org/10.1038/nature14539"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/543613.543615", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1018126042"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/j.eswa.2015.02.030", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1026999470"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-1-4615-0907-3", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037183810", 
              "https://doi.org/10.1007/978-1-4615-0907-3"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-319-26453-0_6", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1041546304", 
              "https://doi.org/10.1007/978-3-319-26453-0_6"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/j.eswa.2016.09.009", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044654657"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-0-387-39940-9_917", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1052309186", 
              "https://doi.org/10.1007/978-0-387-39940-9_917"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/tsc.2015.2439695", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1061786855"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.5120/13122-0472", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1072595540"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icmla.2014.75", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094910041"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/iccwamtip.2015.7493906", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1095431247"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2018-08-13", 
        "datePublishedReg": "2018-08-13", 
        "description": "The rapid development of Internet technology derived out a massive network text data. Therefore, how to classify the massive text data efficiently has important theoretical significance and application value. In order to acquire accurate classification results, the process has been divided into two parts. In terms of text representation, this paper proposes an online field feature selection algorithm (OFFS algorithm) based on streaming data, which solves the problems of low efficiency and memory consumption of traditional feature selection algorithms. With improvements in the vector space model, the new algorithm can select the real-time feature of the data and quickly generate text vector. In the aspect of classifier design, an OFFS-BP neural network text classifier based on BP neutral network and OFFS algorithm is designed. It adapts to the distributed parallel computing, reduces the training time and balances the computation efficiency and classification accuracy. Finally based on the Spark platform, the OFFS-BP neural network classifier is implemented. The experimental results show that the OFFS-BP neural network classifier is more suitable for big data environment with less computation time and higher classification efficiency.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1007/s12652-018-0959-0", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": false, 
        "isPartOf": [
          {
            "id": "sg:journal.1043999", 
            "issn": [
              "1868-5137", 
              "1868-5145"
            ], 
            "name": "Journal of Ambient Intelligence and Humanized Computing", 
            "type": "Periodical"
          }
        ], 
        "name": "A new online field feature selection algorithm based on streaming data", 
        "pagination": "1-13", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "0287a813cf6a640489d4befef7d8997042a7b7f0e61166dd9a2a9afdbad4be15"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/s12652-018-0959-0"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1106131909"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1007/s12652-018-0959-0", 
          "https://app.dimensions.ai/details/publication/pub.1106131909"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-11T09:39", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000346_0000000346/records_99833_00000004.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://link.springer.com/10.1007%2Fs12652-018-0959-0"
      }
    ]
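As a rough illustration of how a record like the one above could be consumed, the following Python sketch pulls the title, author names, and cited DOIs out of a JSON-LD document. The embedded sample is a trimmed copy of a few fields from this record; the `summarize` helper and the `DOI_PREFIX` constant are hypothetical names introduced here, not part of SciGraph.

```python
import json

# Trimmed sample copied from the record above; the real record has more fields.
RECORD_JSON = """
[
  {
    "id": "sg:pub.10.1007/s12652-018-0959-0",
    "name": "A new online field feature selection algorithm based on streaming data",
    "author": [
      {"familyName": "Zhang", "givenName": "Zhenjiang", "type": "Person"},
      {"familyName": "Song", "givenName": "Fuxing", "type": "Person"}
    ],
    "citation": [
      {
        "id": "sg:pub.10.1038/nature14539",
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010020120",
          "https://doi.org/10.1038/nature14539"
        ],
        "type": "CreativeWork"
      },
      {
        "id": "https://doi.org/10.1145/543613.543615",
        "sameAs": ["https://app.dimensions.ai/details/publication/pub.1018126042"],
        "type": "CreativeWork"
      }
    ]
  }
]
"""

# Hypothetical constant: the URL prefix that marks a DOI link in this record.
DOI_PREFIX = "https://doi.org/"


def summarize(record: dict) -> dict:
    """Collect the title, author names, and cited DOIs from one record."""
    authors = [
        f"{person['givenName']} {person['familyName']}"
        for person in record.get("author", [])
    ]
    cited_dois = []
    for cited in record.get("citation", []):
        # A DOI may appear either as the entry's "id" or among its "sameAs" links.
        candidates = [cited.get("id", "")] + cited.get("sameAs", [])
        for url in candidates:
            if url.startswith(DOI_PREFIX):
                doi = url[len(DOI_PREFIX):]
                if doi not in cited_dois:  # skip repeated citation entries
                    cited_dois.append(doi)
                break
    return {"title": record.get("name"), "authors": authors, "cited_dois": cited_dois}


# The top level of the document is a list holding a single record.
summary = summarize(json.loads(RECORD_JSON)[0])
print(summary["title"])
print(summary["authors"])
print(summary["cited_dois"])
```

Because JSON-LD is plain JSON, the standard `json` module is enough for this kind of field extraction; a full RDF library would only be needed to resolve the `@context` or to work with the data as triples.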
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s12652-018-0959-0'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s12652-018-0959-0'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s12652-018-0959-0'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s12652-018-0959-0'
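The four curl commands above differ only in the `Accept` header, which is ordinary HTTP content negotiation. A minimal Python sketch of the same idea follows, using only the standard library. The `ACCEPT_TYPES` mapping, `build_request`, and `fetch` are hypothetical names introduced for illustration, and whether the endpoint still answers is an assumption that is not verified here.

```python
import urllib.request

# Short format names mapped to the media types used in the curl commands above.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s12652-018-0959-0"


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build a GET request that asks for the record in the given RDF format."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(ACCEPT_TYPES)}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building a request does not touch the network, so it is safe to inspect offline.
request = build_request("turtle")
print(request.full_url)
print(request.get_header("Accept"))
```

Calling `fetch("json-ld")` would then be the counterpart of the first curl command, assuming the service is reachable.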


     

    This table displays all metadata directly associated to this object as RDF triples.

    128 TRIPLES      21 PREDICATES      35 URIs      16 LITERALS      5 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/s12652-018-0959-0 schema:about anzsrc-for:08
    2 anzsrc-for:0801
    3 schema:author N2e3f6401e65745bb98aca21fcfac7870
    4 schema:citation sg:pub.10.1007/978-0-387-39940-9_917
    5 sg:pub.10.1007/978-1-4615-0907-3
    6 sg:pub.10.1007/978-3-319-26453-0_6
    7 sg:pub.10.1038/nature14539
    8 https://doi.org/10.1016/j.eswa.2015.02.030
    9 https://doi.org/10.1016/j.eswa.2016.09.009
    10 https://doi.org/10.1109/iccwamtip.2015.7493906
    11 https://doi.org/10.1109/icmla.2014.75
    12 https://doi.org/10.1109/tsc.2015.2439695
    13 https://doi.org/10.1145/543613.543615
    14 https://doi.org/10.5120/13122-0472
    15 schema:datePublished 2018-08-13
    16 schema:datePublishedReg 2018-08-13
    17 schema:description The rapid development of Internet technology derived out a massive network text data. Therefore, how to classify the massive text data efficiently has important theoretical significance and application value. In order to acquire accurate classification results, the process has been divided into two parts. In terms of text representation, this paper proposes an online field feature selection algorithm (OFFS algorithm) based on streaming data, which solves the problems of low efficiency and memory consumption of traditional feature selection algorithms. With improvements in the vector space model, the new algorithm can select the real-time feature of the data and quickly generate text vector. In the aspect of classifier design, an OFFS-BP neural network text classifier based on BP neutral network and OFFS algorithm is designed. It adapts to the distributed parallel computing, reduces the training time and balances the computation efficiency and classification accuracy. Finally based on the Spark platform, the OFFS-BP neural network classifier is implemented. The experimental results show that the OFFS-BP neural network classifier is more suitable for big data environment with less computation time and higher classification efficiency.
    18 schema:genre research_article
    19 schema:inLanguage en
    20 schema:isAccessibleForFree false
    21 schema:isPartOf sg:journal.1043999
    22 schema:name A new online field feature selection algorithm based on streaming data
    23 schema:pagination 1-13
    24 schema:productId N2136505b50f1489fae3fed047bb351a7
    25 N4a2818cdfd234fa2922c0f0c0175ee65
    26 Nec8754d285984fa182718decf3641bc4
    27 schema:sameAs https://app.dimensions.ai/details/publication/pub.1106131909
    28 https://doi.org/10.1007/s12652-018-0959-0
    29 schema:sdDatePublished 2019-04-11T09:39
    30 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    31 schema:sdPublisher N838617b057604aaf8251df1211b2e29d
    32 schema:url https://link.springer.com/10.1007%2Fs12652-018-0959-0
    33 sgo:license sg:explorer/license/
    34 sgo:sdDataset articles
    35 rdf:type schema:ScholarlyArticle
    36 N0ca511e54b924c9795806c32bd2cbd33 rdf:first Na02d3758095e4d84a8a150fc61067476
    37 rdf:rest N5f6fc28c5e084622a49afc5ce7a9ea75
    38 N2136505b50f1489fae3fed047bb351a7 schema:name readcube_id
    39 schema:value 0287a813cf6a640489d4befef7d8997042a7b7f0e61166dd9a2a9afdbad4be15
    40 rdf:type schema:PropertyValue
    41 N2e3f6401e65745bb98aca21fcfac7870 rdf:first sg:person.016205573667.71
    42 rdf:rest N0ca511e54b924c9795806c32bd2cbd33
    43 N4a2818cdfd234fa2922c0f0c0175ee65 schema:name dimensions_id
    44 schema:value pub.1106131909
    45 rdf:type schema:PropertyValue
    46 N4c768a2d2473479b8d9f14b98f1701f4 schema:affiliation https://www.grid.ac/institutes/grid.181531.f
    47 schema:familyName Zhao
    48 schema:givenName Yingsi
    49 rdf:type schema:Person
    50 N5f6fc28c5e084622a49afc5ce7a9ea75 rdf:first N7f829514f1bd4b08b3acf7a2e0f3a296
    51 rdf:rest Na4f28d65b7c34c6ea2155ade99761250
    52 N66e14d23374c4fd980c6351c5d8ade89 schema:name Department of Electronic and Information Engineering, Key Laboratory of Communication and Information Systems, Beijing Municipal Commission of Education, Beijing Jiaotong University, 100044, Beijing, China
    53 rdf:type schema:Organization
    54 N6c1f7d62d53d42288666a771676f610d schema:name Department of Electronic and Information Engineering, Key Laboratory of Communication and Information Systems, Beijing Municipal Commission of Education, Beijing Jiaotong University, 100044, Beijing, China
    55 rdf:type schema:Organization
    56 N7f829514f1bd4b08b3acf7a2e0f3a296 schema:affiliation N66e14d23374c4fd980c6351c5d8ade89
    57 schema:familyName Zhang
    58 schema:givenName Peng
    59 rdf:type schema:Person
    60 N838617b057604aaf8251df1211b2e29d schema:name Springer Nature - SN SciGraph project
    61 rdf:type schema:Organization
    62 N891818d1da1448d1aadd6772f3b755cd rdf:first N4c768a2d2473479b8d9f14b98f1701f4
    63 rdf:rest rdf:nil
    64 Na02d3758095e4d84a8a150fc61067476 schema:affiliation N6c1f7d62d53d42288666a771676f610d
    65 schema:familyName Song
    66 schema:givenName Fuxing
    67 rdf:type schema:Person
    68 Na4f28d65b7c34c6ea2155ade99761250 rdf:first sg:person.016313323473.36
    69 rdf:rest N891818d1da1448d1aadd6772f3b755cd
    70 Nec8754d285984fa182718decf3641bc4 schema:name doi
    71 schema:value 10.1007/s12652-018-0959-0
    72 rdf:type schema:PropertyValue
    73 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    74 schema:name Information and Computing Sciences
    75 rdf:type schema:DefinedTerm
    76 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
    77 schema:name Artificial Intelligence and Image Processing
    78 rdf:type schema:DefinedTerm
    79 sg:journal.1043999 schema:issn 1868-5137
    80 1868-5145
    81 schema:name Journal of Ambient Intelligence and Humanized Computing
    82 rdf:type schema:Periodical
    83 sg:person.016205573667.71 schema:affiliation https://www.grid.ac/institutes/grid.181531.f
    84 schema:familyName Zhang
    85 schema:givenName Zhenjiang
    86 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016205573667.71
    87 rdf:type schema:Person
    88 sg:person.016313323473.36 schema:affiliation https://www.grid.ac/institutes/grid.260567.0
    89 schema:familyName Chao
    90 schema:givenName Han-Chieh
    91 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016313323473.36
    92 rdf:type schema:Person
    93 sg:pub.10.1007/978-0-387-39940-9_917 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052309186
    94 https://doi.org/10.1007/978-0-387-39940-9_917
    95 rdf:type schema:CreativeWork
    96 sg:pub.10.1007/978-1-4615-0907-3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037183810
    97 https://doi.org/10.1007/978-1-4615-0907-3
    98 rdf:type schema:CreativeWork
    99 sg:pub.10.1007/978-3-319-26453-0_6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041546304
    100 https://doi.org/10.1007/978-3-319-26453-0_6
    101 rdf:type schema:CreativeWork
    102 sg:pub.10.1038/nature14539 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010020120
    103 https://doi.org/10.1038/nature14539
    104 rdf:type schema:CreativeWork
    105 https://doi.org/10.1016/j.eswa.2015.02.030 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026999470
    106 rdf:type schema:CreativeWork
    107 https://doi.org/10.1016/j.eswa.2016.09.009 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044654657
    108 rdf:type schema:CreativeWork
    109 https://doi.org/10.1109/iccwamtip.2015.7493906 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095431247
    110 rdf:type schema:CreativeWork
    111 https://doi.org/10.1109/icmla.2014.75 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094910041
    112 rdf:type schema:CreativeWork
    113 https://doi.org/10.1109/tsc.2015.2439695 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061786855
    114 rdf:type schema:CreativeWork
    115 https://doi.org/10.1145/543613.543615 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018126042
    116 rdf:type schema:CreativeWork
    117 https://doi.org/10.5120/13122-0472 schema:sameAs https://app.dimensions.ai/details/publication/pub.1072595540
    118 rdf:type schema:CreativeWork
    119 https://www.grid.ac/institutes/grid.181531.f schema:alternateName Beijing Jiaotong University
    120 schema:name Department of Electronic and Information Engineering, Key Laboratory of Communication and Information Systems, Beijing Municipal Commission of Education, Beijing Jiaotong University, 100044, Beijing, China
    121 School of Economics and Management, Beijing Jiaotong University, 100044, Beijing, China
    122 School of Software Engineering, Beijing Jiaotong University, 100044, Beijing, China
    123 rdf:type schema:Organization
    124 https://www.grid.ac/institutes/grid.260567.0 schema:alternateName National Dong Hwa University
    125 schema:name Department of Electrical Engineering, National Dong Hwa University, 97401, Hualien, Taiwan
    126 School of Information Science and Engineering, Fujian University of Technology, 350118, Fuzhou, China
    127 School of Mathematics and Computer Science, Wuhan Polytechnic University, 430023, Wuhan, China
    128 rdf:type schema:Organization
     



