An improved algorithm for weighting keywords in web documents


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2008-06

AUTHORS

Shuang Sun, Liang He, Jing Yang, Jun-zhong Gu

ABSTRACT

In this paper, an improved algorithm, web-based keyword weight algorithm (WKWA), is presented to weight keywords in web documents. WKWA takes into account representation features of web documents and advantages of the TF*IDF, TFC and ITC algorithms in order to make it more appropriate for web documents. Meanwhile, the presented algorithm is applied to improved vector space model (IVSM). A real system has been implemented for calculating semantic similarities of web documents. Four experiments have been carried out. They are keyword weight calculation, feature item selection, semantic similarity calculation, and WKWA time performance. The results demonstrate accuracy of keyword weight, and semantic similarity is improved.

PAGES

235-239

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s11741-008-0309-2

DOI

http://dx.doi.org/10.1007/s11741-008-0309-2

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1014521259



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "East China Normal University", 
          "id": "https://www.grid.ac/institutes/grid.22069.3f", 
          "name": [
            "Institute of Computer Applications, East China Normal University, 200062, Shanghai, P. R. China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Sun", 
        "givenName": "Shuang", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "East China Normal University", 
          "id": "https://www.grid.ac/institutes/grid.22069.3f", 
          "name": [
            "Institute of Computer Applications, East China Normal University, 200062, Shanghai, P. R. China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "He", 
        "givenName": "Liang", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "East China Normal University", 
          "id": "https://www.grid.ac/institutes/grid.22069.3f", 
          "name": [
            "Institute of Computer Applications, East China Normal University, 200062, Shanghai, P. R. China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Yang", 
        "givenName": "Jing", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "East China Normal University", 
          "id": "https://www.grid.ac/institutes/grid.22069.3f", 
          "name": [
            "Institute of Computer Applications, East China Normal University, 200062, Shanghai, P. R. China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Gu", 
        "givenName": "Jun-zhong", 
        "id": "sg:person.07501076773.50", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07501076773.50"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1360/jos161012", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1065077017"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.28945/509", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1092399633"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icsmc.2002.1173456", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094526049"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2008-06", 
    "datePublishedReg": "2008-06-01", 
    "description": "In this paper, an improved algorithm, web-based keyword weight algorithm (WKWA), is presented to weight keywords in web documents. WKWA takes into account representation features of web documents and advantages of the TF*IDF, TFC and ITC algorithms in order to make it more appropriate for web documents. Meanwhile, the presented algorithm is applied to improved vector space model (IVSM). A real system has been implemented for calculating semantic similarities of web documents. Four experiments have been carried out. They are keyword weight calculation, feature item selection, semantic similarity calculation, and WKWA time performance. The results demonstrate accuracy of keyword weight, and semantic similarity is improved.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1007/s11741-008-0309-2", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1281879", 
        "issn": [
          "1007-6417", 
          "1863-236X"
        ], 
        "name": "Journal of Shanghai University (English Edition)", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "3", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "12"
      }
    ], 
    "name": "An improved algorithm for weighting keywords in web documents", 
    "pagination": "235-239", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "452db0b80d033b51cad981dd8733079c5e760f45de24629eda6490426236a082"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s11741-008-0309-2"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1014521259"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s11741-008-0309-2", 
      "https://app.dimensions.ai/details/publication/pub.1014521259"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-10T17:33", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8672_00000521.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1007%2Fs11741-008-0309-2"
  }
]
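
As a minimal sketch of how the record above might be read, the following uses only Python's standard `json` module. `RECORD` is a trimmed excerpt of the JSON-LD shown above (only the fields used here are kept), and `summarize` is a hypothetical helper name, not part of any SciGraph tooling.

```python
import json

# Trimmed excerpt of the JSON-LD record above; only the fields used below are kept.
RECORD = """
[
  {
    "author": [
      {"familyName": "Sun", "givenName": "Shuang", "type": "Person"},
      {"familyName": "He", "givenName": "Liang", "type": "Person"},
      {"familyName": "Yang", "givenName": "Jing", "type": "Person"},
      {"familyName": "Gu", "givenName": "Jun-zhong", "type": "Person"}
    ],
    "name": "An improved algorithm for weighting keywords in web documents",
    "pagination": "235-239",
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/s11741-008-0309-2"]},
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1014521259"]}
    ]
  }
]
"""


def summarize(record_text):
    """Hypothetical helper: extract title, author names, and identifiers."""
    # The record is a JSON array holding a single article object.
    article = json.loads(record_text)[0]
    authors = [f"{a['givenName']} {a['familyName']}" for a in article.get("author", [])]
    # Each productId entry carries its value as a one-element list.
    ids = {p["name"]: p["value"][0] for p in article.get("productId", [])}
    return {"title": article["name"], "authors": authors, "ids": ids}


summary = summarize(RECORD)
print(summary["title"])
print(", ".join(summary["authors"]))
print(summary["ids"]["doi"])
```

Because JSON-LD is plain JSON, no RDF library is needed for simple field access; a JSON-LD processor would only be required to expand the `@context` into full IRIs.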
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s11741-008-0309-2'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s11741-008-0309-2'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s11741-008-0309-2'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s11741-008-0309-2'
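
The four curl commands above differ only in the `Accept` header. A rough Python equivalent, using only the standard library, could look like the sketch below. The names `ACCEPT`, `build_request`, and `fetch` are hypothetical, and whether the endpoint still answers these requests has not been verified here.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s11741-008-0309-2"

# Media types copied from the curl examples above; the short keys are our own labels.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for the given RDF format."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


request = build_request("turtle")
print(request.full_url)
print(request.get_header("Accept"))
```

Separating `build_request` from `fetch` keeps the content-negotiation logic checkable without a network connection; calling `fetch("turtle")` would perform the actual download.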


 

This table displays all metadata directly associated to this object as RDF triples.

88 TRIPLES      21 PREDICATES      30 URIs      19 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/s11741-008-0309-2 schema:about anzsrc-for:08
2 anzsrc-for:0806
3 schema:author Ncf9956e953e1490b92b71865c19ff6c3
4 schema:citation https://doi.org/10.1109/icsmc.2002.1173456
5 https://doi.org/10.1360/jos161012
6 https://doi.org/10.28945/509
7 schema:datePublished 2008-06
8 schema:datePublishedReg 2008-06-01
9 schema:description In this paper, an improved algorithm, web-based keyword weight algorithm (WKWA), is presented to weight keywords in web documents. WKWA takes into account representation features of web documents and advantages of the TF*IDF, TFC and ITC algorithms in order to make it more appropriate for web documents. Meanwhile, the presented algorithm is applied to improved vector space model (IVSM). A real system has been implemented for calculating semantic similarities of web documents. Four experiments have been carried out. They are keyword weight calculation, feature item selection, semantic similarity calculation, and WKWA time performance. The results demonstrate accuracy of keyword weight, and semantic similarity is improved.
10 schema:genre research_article
11 schema:inLanguage en
12 schema:isAccessibleForFree false
13 schema:isPartOf N06e838a16c774f57a0a127a0bdedd9d2
14 N7a6501fed6534239865bb4851e900c5f
15 sg:journal.1281879
16 schema:name An improved algorithm for weighting keywords in web documents
17 schema:pagination 235-239
18 schema:productId N09de51d280164d469fe21608b8255f4a
19 N2b36305ed9ab4b00a48cebf3b5970606
20 N961475ebe5f3416092eb3316129b9f27
21 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014521259
22 https://doi.org/10.1007/s11741-008-0309-2
23 schema:sdDatePublished 2019-04-10T17:33
24 schema:sdLicense https://scigraph.springernature.com/explorer/license/
25 schema:sdPublisher N452ce7f17a2e4ad8ad9a88f3798052bc
26 schema:url http://link.springer.com/10.1007%2Fs11741-008-0309-2
27 sgo:license sg:explorer/license/
28 sgo:sdDataset articles
29 rdf:type schema:ScholarlyArticle
30 N055f7fb5d2ed42499b9f70c1f670d792 rdf:first Nd7151ad5fcc245cfbf4f7b56d9f7cd73
31 rdf:rest N845517fa171f472589083673a2d32f4e
32 N06e838a16c774f57a0a127a0bdedd9d2 schema:issueNumber 3
33 rdf:type schema:PublicationIssue
34 N09de51d280164d469fe21608b8255f4a schema:name dimensions_id
35 schema:value pub.1014521259
36 rdf:type schema:PropertyValue
37 N2b36305ed9ab4b00a48cebf3b5970606 schema:name readcube_id
38 schema:value 452db0b80d033b51cad981dd8733079c5e760f45de24629eda6490426236a082
39 rdf:type schema:PropertyValue
40 N452ce7f17a2e4ad8ad9a88f3798052bc schema:name Springer Nature - SN SciGraph project
41 rdf:type schema:Organization
42 N7a6501fed6534239865bb4851e900c5f schema:volumeNumber 12
43 rdf:type schema:PublicationVolume
44 N845517fa171f472589083673a2d32f4e rdf:first sg:person.07501076773.50
45 rdf:rest rdf:nil
46 N932e19f4d5b54321945d1bc71b544103 rdf:first Nb703b1217c5748f7abcb3b50ba441c1d
47 rdf:rest N055f7fb5d2ed42499b9f70c1f670d792
48 N961475ebe5f3416092eb3316129b9f27 schema:name doi
49 schema:value 10.1007/s11741-008-0309-2
50 rdf:type schema:PropertyValue
51 Nb703b1217c5748f7abcb3b50ba441c1d schema:affiliation https://www.grid.ac/institutes/grid.22069.3f
52 schema:familyName He
53 schema:givenName Liang
54 rdf:type schema:Person
55 Nbf17146765ab4d48ad013422f61117a5 schema:affiliation https://www.grid.ac/institutes/grid.22069.3f
56 schema:familyName Sun
57 schema:givenName Shuang
58 rdf:type schema:Person
59 Ncf9956e953e1490b92b71865c19ff6c3 rdf:first Nbf17146765ab4d48ad013422f61117a5
60 rdf:rest N932e19f4d5b54321945d1bc71b544103
61 Nd7151ad5fcc245cfbf4f7b56d9f7cd73 schema:affiliation https://www.grid.ac/institutes/grid.22069.3f
62 schema:familyName Yang
63 schema:givenName Jing
64 rdf:type schema:Person
65 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
66 schema:name Information and Computing Sciences
67 rdf:type schema:DefinedTerm
68 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
69 schema:name Information Systems
70 rdf:type schema:DefinedTerm
71 sg:journal.1281879 schema:issn 1007-6417
72 1863-236X
73 schema:name Journal of Shanghai University (English Edition)
74 rdf:type schema:Periodical
75 sg:person.07501076773.50 schema:affiliation https://www.grid.ac/institutes/grid.22069.3f
76 schema:familyName Gu
77 schema:givenName Jun-zhong
78 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07501076773.50
79 rdf:type schema:Person
80 https://doi.org/10.1109/icsmc.2002.1173456 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094526049
81 rdf:type schema:CreativeWork
82 https://doi.org/10.1360/jos161012 schema:sameAs https://app.dimensions.ai/details/publication/pub.1065077017
83 rdf:type schema:CreativeWork
84 https://doi.org/10.28945/509 schema:sameAs https://app.dimensions.ai/details/publication/pub.1092399633
85 rdf:type schema:CreativeWork
86 https://www.grid.ac/institutes/grid.22069.3f schema:alternateName East China Normal University
87 schema:name Institute of Computer Applications, East China Normal University, 200062, Shanghai, P. R. China
88 rdf:type schema:Organization
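
In the table above, `schema:author` points at a blank node that heads an RDF list: each list node has an `rdf:first` (the author) and an `rdf:rest` (the next node), ending in `rdf:nil`. The sketch below walks that chain to recover the author order. The node identifiers and family names are copied from rows 30-31, 44-47, 51-64, and 75-77 of the table; the dictionary layout and the function name `walk_rdf_list` are illustrative assumptions, not a SciGraph API.

```python
RDF_NIL = "rdf:nil"

# list node -> (rdf:first, rdf:rest), copied from the triples table above.
LIST_NODES = {
    "Ncf9956e953e1490b92b71865c19ff6c3": (
        "Nbf17146765ab4d48ad013422f61117a5", "N932e19f4d5b54321945d1bc71b544103"),
    "N932e19f4d5b54321945d1bc71b544103": (
        "Nb703b1217c5748f7abcb3b50ba441c1d", "N055f7fb5d2ed42499b9f70c1f670d792"),
    "N055f7fb5d2ed42499b9f70c1f670d792": (
        "Nd7151ad5fcc245cfbf4f7b56d9f7cd73", "N845517fa171f472589083673a2d32f4e"),
    "N845517fa171f472589083673a2d32f4e": (
        "sg:person.07501076773.50", RDF_NIL),
}

# person node -> schema:familyName, also copied from the table.
FAMILY_NAME = {
    "Nbf17146765ab4d48ad013422f61117a5": "Sun",
    "Nb703b1217c5748f7abcb3b50ba441c1d": "He",
    "Nd7151ad5fcc245cfbf4f7b56d9f7cd73": "Yang",
    "sg:person.07501076773.50": "Gu",
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest links from head to rdf:nil, collecting each rdf:first."""
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


# Head node taken from row 3 (schema:author) of the table.
author_nodes = walk_rdf_list("Ncf9956e953e1490b92b71865c19ff6c3", LIST_NODES)
author_order = [FAMILY_NAME[n] for n in author_nodes]
print(author_order)
```

The recovered order matches the author line at the top of the record, which is the reason RDF uses an explicit list here: plain triples are unordered, so author sequence has to be encoded in the `rdf:first`/`rdf:rest` chain.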
 



