Chinese Text Summarization Using a Trainable Summarizer and Latent Semantic Analysis


Ontology type: schema:Chapter     


Chapter Info

DATE

2002-12-16

AUTHORS

Jen-Yuan Yeh , Hao-Ren Ke , Wei-Pang Yang

ABSTRACT

In this paper, two novel approaches are proposed to extract important sentences from a document to create its summary. The first is a corpus-based approach using feature analysis. It brings up three new ideas: 1) to employ ranked position to emphasize the significance of sentence position, 2) to reshape word unit to achieve higher accuracy of keyword importance, and 3) to train a score function by the genetic algorithm for obtaining a suitable combination of feature weights. The second approach combines the ideas of latent semantic analysis and text relationship maps to interpret conceptual structures of a document. Both approaches are applied to Chinese text summarization. The two approaches were evaluated by using a data corpus composed of 100 articles about politics from New Taiwan Weekly, and when the compression ratio was 30%, average recalls of 52.0% and 45.6% were achieved respectively.
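
As a rough illustration of the first approach, the sketch below scores each sentence as a weighted sum of its feature values, keeps the top-scoring sentences up to a given compression ratio, and measures recall against a reference extract. This is not the authors' implementation: the feature values, the weights, the linear form of the score function, and the definition of recall (the fraction of reference sentences recovered) are all assumptions made for illustration, and the genetic-algorithm training of the weights is not shown.

```python
from typing import List, Sequence


def score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of one sentence's feature values (assumed linear form)."""
    return sum(w * f for w, f in zip(weights, features))


def extract_summary(feature_rows: List[Sequence[float]],
                    weights: Sequence[float],
                    compression_ratio: float = 0.3) -> List[int]:
    """Return the indices of the selected sentences, in document order."""
    n_keep = max(1, round(len(feature_rows) * compression_ratio))
    ranked = sorted(range(len(feature_rows)),
                    key=lambda i: score(feature_rows[i], weights),
                    reverse=True)
    return sorted(ranked[:n_keep])


def recall(selected: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of reference sentences that appear in the selected extract."""
    ref = set(reference)
    return len(ref & set(selected)) / len(ref) if ref else 0.0


# Hypothetical feature values for a ten-sentence document (two features each).
rows = [[1.0, 0.9], [0.1, 0.2], [0.8, 0.7], [0.2, 0.1], [0.3, 0.3],
        [0.9, 0.2], [0.1, 0.1], [0.2, 0.2], [0.0, 0.1], [0.4, 0.1]]
weights = [0.5, 0.5]  # hypothetical; the paper trains these with a genetic algorithm

summary = extract_summary(rows, weights, compression_ratio=0.3)
print(summary, recall(summary, reference=[0, 2, 4]))
```

In this sketch the weights are the only trainable part, which is where a search procedure such as a genetic algorithm would plug in, using recall on a training corpus as the fitness value.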

PAGES

76-87

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/3-540-36227-4_8

DOI

http://dx.doi.org/10.1007/3-540-36227-4_8

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1010939703



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Department of Computer & Information Science, National Chiao-Tung University, 1001 Ta Hsueh Rd., 30050, Hsinchu, Taiwan, R.O.C.", 
          "id": "http://www.grid.ac/institutes/grid.260539.b", 
          "name": [
            "Department of Computer & Information Science, National Chiao-Tung University, 1001 Ta Hsueh Rd., 30050, Hsinchu, Taiwan, R.O.C."
          ], 
          "type": "Organization"
        }, 
        "familyName": "Yeh", 
        "givenName": "Jen-Yuan", 
        "id": "sg:person.013047064577.06", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013047064577.06"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Digital Library & Information Section of Library, National Chiao-Tung University, 1001 Ta Hsueh Rd., 30050, Hsinchu, Taiwan, R.O.C.", 
          "id": "http://www.grid.ac/institutes/grid.260539.b", 
          "name": [
            "Digital Library & Information Section of Library, National Chiao-Tung University, 1001 Ta Hsueh Rd., 30050, Hsinchu, Taiwan, R.O.C."
          ], 
          "type": "Organization"
        }, 
        "familyName": "Ke", 
        "givenName": "Hao-Ren", 
        "id": "sg:person.015237406177.37", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015237406177.37"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Computer & Information Science, National Chiao-Tung University, 1001 Ta Hsueh Rd., 30050, Hsinchu, Taiwan, R.O.C.", 
          "id": "http://www.grid.ac/institutes/grid.260539.b", 
          "name": [
            "Department of Computer & Information Science, National Chiao-Tung University, 1001 Ta Hsueh Rd., 30050, Hsinchu, Taiwan, R.O.C."
          ], 
          "type": "Organization"
        }, 
        "familyName": "Yang", 
        "givenName": "Wei-Pang", 
        "id": "sg:person.014374171260.51", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014374171260.51"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2002-12-16", 
    "datePublishedReg": "2002-12-16", 
    "description": "In this paper, two novel approaches are proposed to extract important sentences from a document to create its summary. The first is a corpus-based approach using feature analysis. It brings up three new ideas: 1) to employ ranked position to emphasize the significance of sentence position, 2) to reshape word unit to achieve higher accuracy of keyword importance, and 3) to train a score function by the genetic algorithm for obtaining a suitable combination of feature weights. The second approach combines the ideas of latent semantic analysis and text relationship maps to interpret conceptual structures of a document. Both approaches are applied to Chinese text summarization. The two approaches were evaluated by using a data corpus composed of 100 articles about politics from New Taiwan Weekly, and when the compression ratio was 30%, average recalls of 52.0% and 45.6% were achieved respectively.", 
    "editor": [
      {
        "familyName": "Lim", 
        "givenName": "Ee-Peng", 
        "type": "Person"
      }, 
      {
        "familyName": "Foo", 
        "givenName": "Schubert", 
        "type": "Person"
      }, 
      {
        "familyName": "Khoo", 
        "givenName": "Chris", 
        "type": "Person"
      }, 
      {
        "familyName": "Chen", 
        "givenName": "Hsinchun", 
        "type": "Person"
      }, 
      {
        "familyName": "Fox", 
        "givenName": "Edward", 
        "type": "Person"
      }, 
      {
        "familyName": "Urs", 
        "givenName": "Shalini", 
        "type": "Person"
      }, 
      {
        "familyName": "Costantino", 
        "givenName": "Thanos", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/3-540-36227-4_8", 
    "inLanguage": "en", 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-540-00261-1", 
        "978-3-540-36227-2"
      ], 
      "name": "Digital Libraries: People, Knowledge, and Technology", 
      "type": "Book"
    }, 
    "keywords": [
      "corpus-based approach", 
      "Latent Semantic Analysis", 
      "semantic analysis", 
      "Chinese text summarization", 
      "text summarization", 
      "important sentences", 
      "sentence position", 
      "word units", 
      "conceptual structure", 
      "sentences", 
      "documents", 
      "idea", 
      "summarization", 
      "politics", 
      "new ideas", 
      "keyword importance", 
      "article", 
      "summarizer", 
      "novel approach", 
      "feature analysis", 
      "high accuracy", 
      "score function", 
      "genetic algorithm", 
      "feature weights", 
      "relationship map", 
      "compression ratio", 
      "average recall", 
      "trainable summarizer", 
      "approach", 
      "position", 
      "significance", 
      "second approach", 
      "paper", 
      "analysis", 
      "accuracy", 
      "importance", 
      "algorithm", 
      "suitable combination", 
      "weekly", 
      "recall", 
      "summary", 
      "maps", 
      "data", 
      "units", 
      "function", 
      "combination", 
      "structure", 
      "weight", 
      "ratio"
    ], 
    "name": "Chinese Text Summarization Using a Trainable Summarizer and Latent Semantic Analysis", 
    "pagination": "76-87", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1010939703"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/3-540-36227-4_8"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/3-540-36227-4_8", 
      "https://app.dimensions.ai/details/publication/pub.1010939703"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-05-20T07:49", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220519/entities/gbq_results/chapter/chapter_49.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/3-540-36227-4_8"
  }
]
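
Because the record is plain JSON, it can be read with a standard JSON parser without any RDF tooling. The sketch below is a minimal example under that assumption: it embeds a trimmed copy of the record above (only the fields it uses) and pulls out the title, author names, DOI, and page range. The function name `summarize_record` is made up for this example.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
record_text = """
[
  {
    "id": "sg:pub.10.1007/3-540-36227-4_8",
    "name": "Chinese Text Summarization Using a Trainable Summarizer and Latent Semantic Analysis",
    "author": [
      {"familyName": "Yeh", "givenName": "Jen-Yuan", "type": "Person"},
      {"familyName": "Ke", "givenName": "Hao-Ren", "type": "Person"},
      {"familyName": "Yang", "givenName": "Wei-Pang", "type": "Person"}
    ],
    "pagination": "76-87",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1010939703"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/3-540-36227-4_8"]}
    ]
  }
]
"""


def summarize_record(text: str) -> dict:
    """Extract a few citation fields from a SciGraph-style JSON-LD array."""
    record = json.loads(text)[0]  # the payload is a one-element array
    authors = [f'{a["givenName"]} {a["familyName"]}'
               for a in record.get("author", [])]
    # productId entries are name/value pairs; each value is a list.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    return {
        "title": record["name"],
        "authors": authors,
        "doi": ids.get("doi"),
        "pages": record.get("pagination"),
    }


info = summarize_record(record_text)
print(info["authors"], info["doi"], info["pages"])
```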
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/3-540-36227-4_8'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/3-540-36227-4_8'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/3-540-36227-4_8'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/3-540-36227-4_8'
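
The four curl commands differ only in their Accept header, so the same content negotiation can be sketched in Python with the standard library. The short format labels and the function names below are choices made for this example, not part of the service, and whether the endpoint still answers these requests is not verified here.

```python
import urllib.request

BASE = "https://scigraph.springernature.com/pub.10.1007/3-540-36227-4_8"

# Format label (chosen for this example) -> media type from the curl commands.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = BASE) -> urllib.request.Request:
    """Build a GET request asking for the given serialization."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt: str, url: str = BASE, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building a request does not touch the network; fetch("turtle") would.
req = build_request("turtle")
print(req.full_url, req.get_header("Accept"))
```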


 

This table displays all metadata directly associated with this object as RDF triples.

155 TRIPLES      23 PREDICATES      74 URIs      67 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/3-540-36227-4_8 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author N25483458e4a74989bd7a0dbadfebeb99
4 schema:datePublished 2002-12-16
5 schema:datePublishedReg 2002-12-16
6 schema:description In this paper, two novel approaches are proposed to extract important sentences from a document to create its summary. The first is a corpus-based approach using feature analysis. It brings up three new ideas: 1) to employ ranked position to emphasize the significance of sentence position, 2) to reshape word unit to achieve higher accuracy of keyword importance, and 3) to train a score function by the genetic algorithm for obtaining a suitable combination of feature weights. The second approach combines the ideas of latent semantic analysis and text relationship maps to interpret conceptual structures of a document. Both approaches are applied to Chinese text summarization. The two approaches were evaluated by using a data corpus composed of 100 articles about politics from New Taiwan Weekly, and when the compression ratio was 30%, average recalls of 52.0% and 45.6% were achieved respectively.
7 schema:editor Ncebcefb50b7e453ebb33d6ab40b537b2
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree false
11 schema:isPartOf N46ac205ce7a14cadaa58abb326b13adf
12 schema:keywords Chinese text summarization
13 Latent Semantic Analysis
14 accuracy
15 algorithm
16 analysis
17 approach
18 article
19 average recall
20 combination
21 compression ratio
22 conceptual structure
23 corpus-based approach
24 data
25 documents
26 feature analysis
27 feature weights
28 function
29 genetic algorithm
30 high accuracy
31 idea
32 importance
33 important sentences
34 keyword importance
35 maps
36 new ideas
37 novel approach
38 paper
39 politics
40 position
41 ratio
42 recall
43 relationship map
44 score function
45 second approach
46 semantic analysis
47 sentence position
48 sentences
49 significance
50 structure
51 suitable combination
52 summarization
53 summarizer
54 summary
55 text summarization
56 trainable summarizer
57 units
58 weekly
59 weight
60 word units
61 schema:name Chinese Text Summarization Using a Trainable Summarizer and Latent Semantic Analysis
62 schema:pagination 76-87
63 schema:productId N80c9ccc4d4ef41f1b135c1ec007d65a9
64 Ndebaae5fcaae4ca597e1f4c0ba324989
65 schema:publisher N7ef979d57370435bb829868aea8e25e5
66 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010939703
67 https://doi.org/10.1007/3-540-36227-4_8
68 schema:sdDatePublished 2022-05-20T07:49
69 schema:sdLicense https://scigraph.springernature.com/explorer/license/
70 schema:sdPublisher N4caa624e37744f8b9864616b29f5ba6d
71 schema:url https://doi.org/10.1007/3-540-36227-4_8
72 sgo:license sg:explorer/license/
73 sgo:sdDataset chapters
74 rdf:type schema:Chapter
75 N00e1870678bb407f80348dd4c1c7f044 schema:familyName Chen
76 schema:givenName Hsinchun
77 rdf:type schema:Person
78 N0291671c764b45cabab29e3bc066a745 rdf:first N00e1870678bb407f80348dd4c1c7f044
79 rdf:rest Na07e5bca380444c296cf655ef962e174
80 N1a0baa1a2cf1424fa097706378e1010a rdf:first sg:person.015237406177.37
81 rdf:rest Nfb1e554230854e95938ee2cb1b7dbf59
82 N1f1c82ff33074d6e9098c3b97538ac88 rdf:first Nac0984716ba6464e932cb9ed6ed52c56
83 rdf:rest N0291671c764b45cabab29e3bc066a745
84 N25483458e4a74989bd7a0dbadfebeb99 rdf:first sg:person.013047064577.06
85 rdf:rest N1a0baa1a2cf1424fa097706378e1010a
86 N2f06efb14a444624b11fa0b904e6d34a schema:familyName Lim
87 schema:givenName Ee-Peng
88 rdf:type schema:Person
89 N38c2a59e372d4395afe39914245b5279 rdf:first N997eca1322234acebf2449b1fceae14d
90 rdf:rest Ndb8c43b9122c44459ae6403fe5b027f5
91 N46ac205ce7a14cadaa58abb326b13adf schema:isbn 978-3-540-00261-1
92 978-3-540-36227-2
93 schema:name Digital Libraries: People, Knowledge, and Technology
94 rdf:type schema:Book
95 N4caa624e37744f8b9864616b29f5ba6d schema:name Springer Nature - SN SciGraph project
96 rdf:type schema:Organization
97 N4e3f0dbabc4845b995c8560cfce70f67 schema:familyName Costantino
98 schema:givenName Thanos
99 rdf:type schema:Person
100 N7ef979d57370435bb829868aea8e25e5 schema:name Springer Nature
101 rdf:type schema:Organisation
102 N80c9ccc4d4ef41f1b135c1ec007d65a9 schema:name dimensions_id
103 schema:value pub.1010939703
104 rdf:type schema:PropertyValue
105 N997eca1322234acebf2449b1fceae14d schema:familyName Urs
106 schema:givenName Shalini
107 rdf:type schema:Person
108 N9a7c49af979c4481a5817e39cbea2b77 schema:familyName Fox
109 schema:givenName Edward
110 rdf:type schema:Person
111 Na07e5bca380444c296cf655ef962e174 rdf:first N9a7c49af979c4481a5817e39cbea2b77
112 rdf:rest N38c2a59e372d4395afe39914245b5279
113 Nac0984716ba6464e932cb9ed6ed52c56 schema:familyName Khoo
114 schema:givenName Chris
115 rdf:type schema:Person
116 Ncebcefb50b7e453ebb33d6ab40b537b2 rdf:first N2f06efb14a444624b11fa0b904e6d34a
117 rdf:rest Ncf7f1226e85841d3af2b6ad6b423f051
118 Ncf7f1226e85841d3af2b6ad6b423f051 rdf:first Ned0de0f9201f4fcdb6951e854fa97eb0
119 rdf:rest N1f1c82ff33074d6e9098c3b97538ac88
120 Ndb8c43b9122c44459ae6403fe5b027f5 rdf:first N4e3f0dbabc4845b995c8560cfce70f67
121 rdf:rest rdf:nil
122 Ndebaae5fcaae4ca597e1f4c0ba324989 schema:name doi
123 schema:value 10.1007/3-540-36227-4_8
124 rdf:type schema:PropertyValue
125 Ned0de0f9201f4fcdb6951e854fa97eb0 schema:familyName Foo
126 schema:givenName Schubert
127 rdf:type schema:Person
128 Nfb1e554230854e95938ee2cb1b7dbf59 rdf:first sg:person.014374171260.51
129 rdf:rest rdf:nil
130 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
131 schema:name Information and Computing Sciences
132 rdf:type schema:DefinedTerm
133 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
134 schema:name Artificial Intelligence and Image Processing
135 rdf:type schema:DefinedTerm
136 sg:person.013047064577.06 schema:affiliation grid-institutes:grid.260539.b
137 schema:familyName Yeh
138 schema:givenName Jen-Yuan
139 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013047064577.06
140 rdf:type schema:Person
141 sg:person.014374171260.51 schema:affiliation grid-institutes:grid.260539.b
142 schema:familyName Yang
143 schema:givenName Wei-Pang
144 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014374171260.51
145 rdf:type schema:Person
146 sg:person.015237406177.37 schema:affiliation grid-institutes:grid.260539.b
147 schema:familyName Ke
148 schema:givenName Hao-Ren
149 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015237406177.37
150 rdf:type schema:Person
151 grid-institutes:grid.260539.b schema:alternateName Department of Computer & Information Science, National Chiao-Tung University, 1001 Ta Hsueh Rd., 30050, Hsinchu, Taiwan, R.O.C.
152 Digital Library & Information Section of Library, National Chiao-Tung University, 1001 Ta Hsueh Rd., 30050, Hsinchu, Taiwan, R.O.C.
153 schema:name Department of Computer & Information Science, National Chiao-Tung University, 1001 Ta Hsueh Rd., 30050, Hsinchu, Taiwan, R.O.C.
154 Digital Library & Information Section of Library, National Chiao-Tung University, 1001 Ta Hsueh Rd., 30050, Hsinchu, Taiwan, R.O.C.
155 rdf:type schema:Organization
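
In the table above, the author and editor orders are stored as RDF collections: each blank node carries an rdf:first (the item) and an rdf:rest (the next node), and the chain ends at rdf:nil. The sketch below follows the author chain using triples copied from the table; the tuple representation and the function name `walk_rdf_list` are assumptions made for this example.

```python
RDF_NIL = "rdf:nil"

# (subject, predicate, object) triples copied from the table above.
triples = [
    ("N25483458e4a74989bd7a0dbadfebeb99", "rdf:first", "sg:person.013047064577.06"),
    ("N25483458e4a74989bd7a0dbadfebeb99", "rdf:rest", "N1a0baa1a2cf1424fa097706378e1010a"),
    ("N1a0baa1a2cf1424fa097706378e1010a", "rdf:first", "sg:person.015237406177.37"),
    ("N1a0baa1a2cf1424fa097706378e1010a", "rdf:rest", "Nfb1e554230854e95938ee2cb1b7dbf59"),
    ("Nfb1e554230854e95938ee2cb1b7dbf59", "rdf:first", "sg:person.014374171260.51"),
    ("Nfb1e554230854e95938ee2cb1b7dbf59", "rdf:rest", RDF_NIL),
    ("sg:person.013047064577.06", "schema:familyName", "Yeh"),
    ("sg:person.015237406177.37", "schema:familyName", "Ke"),
    ("sg:person.014374171260.51", "schema:familyName", "Yang"),
]


def walk_rdf_list(head, triples):
    """Follow rdf:first / rdf:rest links from head until rdf:nil."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


# Row 3 of the table gives the list head for schema:author.
author_ids = walk_rdf_list("N25483458e4a74989bd7a0dbadfebeb99", triples)
family = {s: o for s, p, o in triples if p == "schema:familyName"}
author_names = [family[a] for a in author_ids]
print(author_names)
```

The same walk, started from the schema:editor head in row 7, would recover the editor order.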
 



