Chinese New Word Finding Using Character-Based Parsing Model


Ontology type: schema:Chapter     


Chapter Info

DATE

2005

AUTHORS

Yao Meng , Hao Yu , Fumihito Nishino

ABSTRACT

The new word finding is a difficult and indispensable task in Chinese segmentation. The traditional methods used the string statistical information to identify the new words in the large-scale corpus. But it is neither convenient nor powerful enough to describe the words’ internal and external structure laws. And it is even the less effective when the occurrence frequency of the new words is very low in the corpus. In this paper, we present a novel method of using parsing information to find the new words. A character level PCFG model is trained by People Daily corpus and Penn Chinese Treebank. The characters are inputted into the character parsing system, and the words are determined by the parsing tree automatically. Our method describes the word-building rules in the full sentences, and takes advantage of rich context to find the new words. This is especially effective in identifying the occasional words or rarely used words, which are usually in low frequency. The preliminary experiments indicate that our method can substantially improve the precision and recall of the new word finding process.

PAGES

733-742

Book

TITLE

Natural Language Processing – IJCNLP 2004

ISBN

978-3-540-24475-2
978-3-540-30211-7

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-540-30211-7_77

DOI

http://dx.doi.org/10.1007/978-3-540-30211-7_77

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1010135566



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1702", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Cognitive Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/17", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Psychology and Cognitive Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "name": [
            "FUJITSU R&D Center Co., Ltd, Room 1003, Eagle Run Plaza No.26 Xiaoyun Road Chaoyang, 100016, District Beijing, P.R.China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Meng", 
        "givenName": "Yao", 
        "id": "sg:person.015016035647.71", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015016035647.71"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "FUJITSU R&D Center Co., Ltd, Room 1003, Eagle Run Plaza No.26 Xiaoyun Road Chaoyang, 100016, District Beijing, P.R.China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Yu", 
        "givenName": "Hao", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "FUJITSU R&D Center Co., Ltd, Room 1003, Eagle Run Plaza No.26 Xiaoyun Road Chaoyang, 100016, District Beijing, P.R.China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Nishino", 
        "givenName": "Fumihito", 
        "id": "sg:person.015634767150.63", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015634767150.63"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2005", 
    "datePublishedReg": "2005-01-01", 
    "description": "The new word finding is a difficult and indispensable task in Chinese segmentation. The traditional methods used the string statistical information to identify the new words in the large-scale corpus. But it is neither convenient nor powerful enough to describe the words\u2019 internal and external structure laws. And it is even the less effective when the occurrence frequency of the new words is very low in the corpus. In this paper, we present a novel method of using parsing information to find the new words. A character level PCFG model is trained by People Daily corpus and Penn Chinese Treebank. The characters are inputted into the character parsing system, and the words are determined by the parsing tree automatically. Our method describes the word-building rules in the full sentences, and takes advantage of rich context to find the new words. This is especially effective in identifying the occasional words or rarely used words, which are usually in low frequency. The preliminary experiments indicate that our method can substantially improve the precision and recall of the new word finding process.", 
    "editor": [
      {
        "familyName": "Su", 
        "givenName": "Keh-Yih", 
        "type": "Person"
      }, 
      {
        "familyName": "Tsujii", 
        "givenName": "Jun\u2019ichi", 
        "type": "Person"
      }, 
      {
        "familyName": "Lee", 
        "givenName": "Jong-Hyeok", 
        "type": "Person"
      }, 
      {
        "familyName": "Kwong", 
        "givenName": "Oi Yee", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-540-30211-7_77", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-540-24475-2", 
        "978-3-540-30211-7"
      ], 
      "name": "Natural Language Processing \u2013 IJCNLP 2004", 
      "type": "Book"
    }, 
    "name": "Chinese New Word Finding Using Character-Based Parsing Model", 
    "pagination": "733-742", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1010135566"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-540-30211-7_77"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "304957815353505802ef2e7b7e26f6b0c4cb938ff628390ddd52739a8d31763a"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-540-30211-7_77", 
      "https://app.dimensions.ai/details/publication/pub.1010135566"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T08:24", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000363_0000000363/records_70043_00000000.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F978-3-540-30211-7_77"
  }
]
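
As a minimal sketch of how the JSON-LD record above can be consumed, the Python snippet below parses an abbreviated copy of it (only a few fields are kept, so it is not the full record) and pulls out the title, author names, page range, and DOI. The names `RECORD_JSON` and `summarize` are hypothetical and exist only for this illustration.

```python
import json

# Abbreviated copy of the record shown above; most fields are omitted.
RECORD_JSON = """
[
  {
    "name": "Chinese New Word Finding Using Character-Based Parsing Model",
    "author": [
      {"familyName": "Meng", "givenName": "Yao", "type": "Person"},
      {"familyName": "Yu", "givenName": "Hao", "type": "Person"},
      {"familyName": "Nishino", "givenName": "Fumihito", "type": "Person"}
    ],
    "pagination": "733-742",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1010135566"]},
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/978-3-540-30211-7_77"]}
    ]
  }
]
"""


def summarize(record_json):
    """Return a small dict with the fields most citation tools need."""
    # The top level is a JSON array holding a single record object.
    record = json.loads(record_json)[0]
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]]
    # Each productId entry is a name plus a one-element value list.
    ids = {p["name"]: p["value"][0] for p in record["productId"]}
    return {
        "title": record["name"],
        "authors": authors,
        "pages": record["pagination"],
        "doi": ids.get("doi"),
    }


summary = summarize(RECORD_JSON)
print(summary)
```

Because JSON-LD is plain JSON, no RDF library is needed for this kind of field lookup; a dedicated JSON-LD processor only becomes necessary when the `@context` has to be expanded.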
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-30211-7_77'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-30211-7_77'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-30211-7_77'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-30211-7_77'
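
The four curl commands above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The short format labels and the helper names `build_request` and `fetch` are assumptions made for this sketch, and `fetch` presumes the endpoint is still reachable and still honours these headers.

```python
from urllib.request import Request, urlopen

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-540-30211-7_77"

# Media types taken from the curl commands above; the keys are
# hypothetical labels chosen for this sketch.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a request asking for the given RDF format."""
    return Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text."""
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Show the header each format would send, without touching the network.
for fmt in ACCEPT:
    print(fmt, "->", build_request(fmt).get_header("Accept"))
```

Calling `fetch("turtle")` would then correspond to the third curl command; the request construction is kept separate so that it can be inspected offline.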


 

This table lists all metadata directly associated with this object as RDF triples.

96 TRIPLES      22 PREDICATES      27 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-540-30211-7_77 schema:about anzsrc-for:17
2 anzsrc-for:1702
3 schema:author N75122dcf28b24fca93f31b9538fe3335
4 schema:datePublished 2005
5 schema:datePublishedReg 2005-01-01
6 schema:description The new word finding is a difficult and indispensable task in Chinese segmentation. The traditional methods used the string statistical information to identify the new words in the large-scale corpus. But it is neither convenient nor powerful enough to describe the words’ internal and external structure laws. And it is even the less effective when the occurrence frequency of the new words is very low in the corpus. In this paper, we present a novel method of using parsing information to find the new words. A character level PCFG model is trained by People Daily corpus and Penn Chinese Treebank. The characters are inputted into the character parsing system, and the words are determined by the parsing tree automatically. Our method describes the word-building rules in the full sentences, and takes advantage of rich context to find the new words. This is especially effective in identifying the occasional words or rarely used words, which are usually in low frequency. The preliminary experiments indicate that our method can substantially improve the precision and recall of the new word finding process.
7 schema:editor Na95da75d45c349ecb7e3ae9b2eb94167
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree false
11 schema:isPartOf N745e21023d1349e49b29fef66bc0fd11
12 schema:name Chinese New Word Finding Using Character-Based Parsing Model
13 schema:pagination 733-742
14 schema:productId Nc426500325554787a3055693740a4e0c
15 Nc67b101e710d4551a75eec437b1e4216
16 Nd757a01d9775468eb772183c91d9560b
17 schema:publisher N03f290f264f34a4580a2f48fdb813de6
18 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010135566
19 https://doi.org/10.1007/978-3-540-30211-7_77
20 schema:sdDatePublished 2019-04-16T08:24
21 schema:sdLicense https://scigraph.springernature.com/explorer/license/
22 schema:sdPublisher N68be9073f7464ca3bce0076faeb6dad5
23 schema:url https://link.springer.com/10.1007%2F978-3-540-30211-7_77
24 sgo:license sg:explorer/license/
25 sgo:sdDataset chapters
26 rdf:type schema:Chapter
27 N03f290f264f34a4580a2f48fdb813de6 schema:location Berlin, Heidelberg
28 schema:name Springer Berlin Heidelberg
29 rdf:type schema:Organisation
30 N14164ffbbedb4849a8447401907412bd schema:familyName Tsujii
31 schema:givenName Jun’ichi
32 rdf:type schema:Person
33 N2ed63e8c308341c78ac1da8dd5440e14 rdf:first N14164ffbbedb4849a8447401907412bd
34 rdf:rest Nd557ed6377de46bea97029b81bb80bdd
35 N3f120e1baf974e3089d75fc47eb99fcb rdf:first N41a0ce36bb04456d86adfa3ff48fff77
36 rdf:rest Nb9863940fdc04be886aeedd97341d8b5
37 N41a0ce36bb04456d86adfa3ff48fff77 schema:affiliation N4232e123844943409c75a69eff965cae
38 schema:familyName Yu
39 schema:givenName Hao
40 rdf:type schema:Person
41 N4232e123844943409c75a69eff965cae schema:name FUJITSU R&D Center Co., Ltd, Room 1003, Eagle Run Plaza No.26 Xiaoyun Road Chaoyang, 100016, District Beijing, P.R.China
42 rdf:type schema:Organization
43 N57f1245e87de4a589a47d9ec934c59be schema:name FUJITSU R&D Center Co., Ltd, Room 1003, Eagle Run Plaza No.26 Xiaoyun Road Chaoyang, 100016, District Beijing, P.R.China
44 rdf:type schema:Organization
45 N628db23573154a47a35435de2633e10a schema:familyName Su
46 schema:givenName Keh-Yih
47 rdf:type schema:Person
48 N68be9073f7464ca3bce0076faeb6dad5 schema:name Springer Nature - SN SciGraph project
49 rdf:type schema:Organization
50 N6f9f0d5185b64ff68b814380ab34d806 schema:familyName Lee
51 schema:givenName Jong-Hyeok
52 rdf:type schema:Person
53 N745e21023d1349e49b29fef66bc0fd11 schema:isbn 978-3-540-24475-2
54 978-3-540-30211-7
55 schema:name Natural Language Processing – IJCNLP 2004
56 rdf:type schema:Book
57 N75122dcf28b24fca93f31b9538fe3335 rdf:first sg:person.015016035647.71
58 rdf:rest N3f120e1baf974e3089d75fc47eb99fcb
59 N815e042b8b124540a98e78707d50272b schema:name FUJITSU R&D Center Co., Ltd, Room 1003, Eagle Run Plaza No.26 Xiaoyun Road Chaoyang, 100016, District Beijing, P.R.China
60 rdf:type schema:Organization
61 Na95da75d45c349ecb7e3ae9b2eb94167 rdf:first N628db23573154a47a35435de2633e10a
62 rdf:rest N2ed63e8c308341c78ac1da8dd5440e14
63 Nb4be143d56ba4ce2adfce4d31dede2d9 schema:familyName Kwong
64 schema:givenName Oi Yee
65 rdf:type schema:Person
66 Nb9863940fdc04be886aeedd97341d8b5 rdf:first sg:person.015634767150.63
67 rdf:rest rdf:nil
68 Nc426500325554787a3055693740a4e0c schema:name doi
69 schema:value 10.1007/978-3-540-30211-7_77
70 rdf:type schema:PropertyValue
71 Nc67b101e710d4551a75eec437b1e4216 schema:name readcube_id
72 schema:value 304957815353505802ef2e7b7e26f6b0c4cb938ff628390ddd52739a8d31763a
73 rdf:type schema:PropertyValue
74 Nd557ed6377de46bea97029b81bb80bdd rdf:first N6f9f0d5185b64ff68b814380ab34d806
75 rdf:rest Ne1ca42a7381d4323ae102af7a543f093
76 Nd757a01d9775468eb772183c91d9560b schema:name dimensions_id
77 schema:value pub.1010135566
78 rdf:type schema:PropertyValue
79 Ne1ca42a7381d4323ae102af7a543f093 rdf:first Nb4be143d56ba4ce2adfce4d31dede2d9
80 rdf:rest rdf:nil
81 anzsrc-for:17 schema:inDefinedTermSet anzsrc-for:
82 schema:name Psychology and Cognitive Sciences
83 rdf:type schema:DefinedTerm
84 anzsrc-for:1702 schema:inDefinedTermSet anzsrc-for:
85 schema:name Cognitive Sciences
86 rdf:type schema:DefinedTerm
87 sg:person.015016035647.71 schema:affiliation N57f1245e87de4a589a47d9ec934c59be
88 schema:familyName Meng
89 schema:givenName Yao
90 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015016035647.71
91 rdf:type schema:Person
92 sg:person.015634767150.63 schema:affiliation N815e042b8b124540a98e78707d50272b
93 schema:familyName Nishino
94 schema:givenName Fumihito
95 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015634767150.63
96 rdf:type schema:Person
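
In the table above, the author order is stored as an RDF collection: each list node carries an rdf:first (the item) and an rdf:rest (the next node), ending at rdf:nil. The Python sketch below copies the author-list rows (35-36, 57-58, 66-67) as plain tuples and walks them to recover the order. The function name `walk_list` and the tuple representation are assumptions for illustration, not part of SciGraph.

```python
RDF_NIL = "rdf:nil"

# Author-list triples copied from rows 35-36, 57-58 and 66-67 of the table.
TRIPLES = [
    ("N75122dcf28b24fca93f31b9538fe3335", "rdf:first", "sg:person.015016035647.71"),
    ("N75122dcf28b24fca93f31b9538fe3335", "rdf:rest", "N3f120e1baf974e3089d75fc47eb99fcb"),
    ("N3f120e1baf974e3089d75fc47eb99fcb", "rdf:first", "N41a0ce36bb04456d86adfa3ff48fff77"),
    ("N3f120e1baf974e3089d75fc47eb99fcb", "rdf:rest", "Nb9863940fdc04be886aeedd97341d8b5"),
    ("Nb9863940fdc04be886aeedd97341d8b5", "rdf:first", "sg:person.015634767150.63"),
    ("Nb9863940fdc04be886aeedd97341d8b5", "rdf:rest", RDF_NIL),
]

# Family names from rows 38, 88 and 93 of the table.
FAMILY_NAME = {
    "sg:person.015016035647.71": "Meng",
    "N41a0ce36bb04456d86adfa3ff48fff77": "Yu",
    "sg:person.015634767150.63": "Nishino",
}


def walk_list(head, triples):
    """Follow rdf:first / rdf:rest links from head until rdf:nil."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


# Row 3 of the table gives the list head as the object of schema:author.
authors = walk_list("N75122dcf28b24fca93f31b9538fe3335", TRIPLES)
print([FAMILY_NAME[a] for a in authors])
```

The editor list (rows 33-34, 61-62, 74-75, 79-80) uses the same structure and can be walked the same way from the node given as the object of schema:editor.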
 



