ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2007

AUTHORS

Yassine Benajiba, Paolo Rosso, José Miguel Benedí Ruiz

ABSTRACT

The task of Named Entity Recognition (NER) allows to identify proper names as well as temporal and numeric expressions, in an open-domain text. NER systems proved to be very important for many tasks in Natural Language Processing (NLP) such as Information Retrieval and Question Answering tasks. Unfortunately, the main efforts to build reliable NER systems for the Arabic language have been made in a commercial frame and the approach used as well as the accuracy of the performance are not known. In this paper, we present ANERsys: a NER system built exclusively for Arabic texts based-on n-grams and maximum entropy. Furthermore, we present both the specific Arabic language dependent heuristic and the gazetteers we used to boost our system. We developed our own training and test corpora (ANERcorp) and gazetteers (ANERgazet) to train, evaluate and boost the implemented technique. A major effort was conducted to make sure all the experiments are carried out in the same framework of the CONLL 2002 conference. We carried out several experiments and the preliminary results showed that this approach allows to tackle successfully the problem of NER for the Arabic language.

PAGES

143-153

Book

TITLE

Computational Linguistics and Intelligent Text Processing

ISBN

978-3-540-70938-1
978-3-540-70939-8

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-540-70939-8_13

DOI

http://dx.doi.org/10.1007/978-3-540-70939-8_13

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1042484911



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/2004", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Linguistics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/20", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Language, Communication and Culture", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Polytechnic University of Valencia", 
          "id": "https://www.grid.ac/institutes/grid.157927.f", 
          "name": [
            "Dpto. Sistemas Inform\u00e1ticos y Computaci\u00f3n (DSIC), Universidad Polit\u00e9cnica de Valencia, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Benajiba", 
        "givenName": "Yassine", 
        "id": "sg:person.015602473173.88", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015602473173.88"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Polytechnic University of Valencia", 
          "id": "https://www.grid.ac/institutes/grid.157927.f", 
          "name": [
            "Dpto. Sistemas Inform\u00e1ticos y Computaci\u00f3n (DSIC), Universidad Polit\u00e9cnica de Valencia, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Rosso", 
        "givenName": "Paolo", 
        "id": "sg:person.013373425233.78", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013373425233.78"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Polytechnic University of Valencia", 
          "id": "https://www.grid.ac/institutes/grid.157927.f", 
          "name": [
            "Dpto. Sistemas Inform\u00e1ticos y Computaci\u00f3n (DSIC), Universidad Polit\u00e9cnica de Valencia, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Bened\u00edRuiz", 
        "givenName": "Jos\u00e9 Miguel", 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.3115/1118853.1118872", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005856625"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1072399.1072402", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1011688253"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1118853.1118857", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012610562"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1119176.1119200", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1014124369"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1119176.1119196", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1014977021"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1006/csla.1996.0011", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1029750983"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1119176.1119199", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1032282037"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1119355.1119362", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1036319334"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1119176.1119204", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1038467014"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.21236/ada460245", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1091592349"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1073012.1073015", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1099239524"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1075096.1075147", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1099239709"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1621753.1621756", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1099256331"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2007", 
    "datePublishedReg": "2007-01-01", 
    "description": "The task of Named Entity Recognition (NER) allows to identify proper names as well as temporal and numeric expressions, in an open-domain text. NER systems proved to be very important for many tasks in Natural Language Processing (NLP) such as Information Retrieval and Question Answering tasks. Unfortunately, the main efforts to build reliable NER systems for the Arabic language have been made in a commercial frame and the approach used as well as the accuracy of the performance are not known. In this paper, we present ANERsys: a NER system built exclusively for Arabic texts based-on n-grams and maximum entropy. Furthermore, we present both the specific Arabic language dependent heuristic and the gazetteers we used to boost our system. We developed our own training and test corpora (ANERcorp) and gazetteers (ANERgazet) to train, evaluate and boost the implemented technique. A major effort was conducted to make sure all the experiments are carried out in the same framework of the CONLL 2002 conference. We carried out several experiments and the preliminary results showed that this approach allows to tackle successfully the problem of NER for the Arabic language.", 
    "editor": [
      {
        "familyName": "Gelbukh", 
        "givenName": "Alexander", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-540-70939-8_13", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-540-70938-1", 
        "978-3-540-70939-8"
      ], 
      "name": "Computational Linguistics and Intelligent Text Processing", 
      "type": "Book"
    }, 
    "name": "ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy", 
    "pagination": "143-153", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1042484911"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-540-70939-8_13"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "ca5f7fa71fef952ff284638d03b471c3aa06671c71ed51efa297e68c172ffa3f"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-540-70939-8_13", 
      "https://app.dimensions.ai/details/publication/pub.1042484911"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T07:15", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000353_0000000353/records_45372_00000000.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F978-3-540-70939-8_13"
  }
]
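As a minimal sketch of how a record shaped like the one above could be read, the following Python snippet extracts the title, the author names, and the cited DOIs, dropping any repeated citation entries. The embedded `RECORD_JSON` string is a trimmed, illustrative excerpt (only the fields used here are kept), and the function name `summarize` is an assumption, not part of SciGraph.

```python
import json

# Trimmed, illustrative excerpt of a SciGraph-style record (assumption:
# only the fields read below are kept; a real record has many more).
RECORD_JSON = """
[
  {
    "name": "ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy",
    "author": [
      {"familyName": "Benajiba", "givenName": "Yassine", "type": "Person"},
      {"familyName": "Rosso", "givenName": "Paolo", "type": "Person"}
    ],
    "citation": [
      {"id": "https://doi.org/10.3115/1073012.1073015", "type": "CreativeWork"},
      {"id": "https://doi.org/10.3115/1073012.1073015", "type": "CreativeWork"},
      {"id": "https://doi.org/10.3115/1621753.1621756", "type": "CreativeWork"}
    ]
  }
]
"""


def summarize(record_json):
    """Return the title, author names, and unique citation ids of the first record."""
    record = json.loads(record_json)[0]  # the payload is a one-element JSON array

    authors = [
        f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])
    ]

    # Keep the first occurrence of each citation id, preserving order.
    seen = set()
    citations = []
    for entry in record.get("citation", []):
        if entry["id"] not in seen:
            seen.add(entry["id"])
            citations.append(entry["id"])

    return {"title": record["name"], "authors": authors, "citations": citations}


if __name__ == "__main__":
    summary = summarize(RECORD_JSON)
    print(summary["title"])
    print(", ".join(summary["authors"]))
    print(len(summary["citations"]), "unique citations")
```

Because JSON-LD is plain JSON, the standard `json` module is enough for this kind of field access; a dedicated JSON-LD processor would only be needed to expand the `@context`.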
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-70939-8_13'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-70939-8_13'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-70939-8_13'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-70939-8_13'
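The four curl commands above differ only in the Accept header. A rough Python equivalent, using only the standard library, is sketched below. The helper names `build_request` and `fetch` and the short format keys are assumptions made for illustration, and whether the endpoint still answers these requests has not been verified here.

```python
import urllib.request

# Record URL taken from the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-540-70939-8_13"

# Media types from the curl commands; the short keys are illustrative.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for the given RDF format."""
    # Raises KeyError for a format that is not listed in ACCEPT_TYPES.
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(fetch("turtle"))
```

Separating `build_request` from `fetch` keeps the content-negotiation logic inspectable without any network access.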


 

This table displays all metadata directly associated with this object as RDF triples.

117 TRIPLES      23 PREDICATES      40 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-540-70939-8_13 schema:about anzsrc-for:20
2 anzsrc-for:2004
3 schema:author N34dccf071c5f4e6496a3dce152892418
4 schema:citation https://doi.org/10.1006/csla.1996.0011
5 https://doi.org/10.21236/ada460245
6 https://doi.org/10.3115/1072399.1072402
7 https://doi.org/10.3115/1073012.1073015
8 https://doi.org/10.3115/1075096.1075147
9 https://doi.org/10.3115/1118853.1118857
10 https://doi.org/10.3115/1118853.1118872
11 https://doi.org/10.3115/1119176.1119196
12 https://doi.org/10.3115/1119176.1119199
13 https://doi.org/10.3115/1119176.1119200
14 https://doi.org/10.3115/1119176.1119204
15 https://doi.org/10.3115/1119355.1119362
16 https://doi.org/10.3115/1621753.1621756
17 schema:datePublished 2007
18 schema:datePublishedReg 2007-01-01
19 schema:description The task of Named Entity Recognition (NER) allows to identify proper names as well as temporal and numeric expressions, in an open-domain text. NER systems proved to be very important for many tasks in Natural Language Processing (NLP) such as Information Retrieval and Question Answering tasks. Unfortunately, the main efforts to build reliable NER systems for the Arabic language have been made in a commercial frame and the approach used as well as the accuracy of the performance are not known. In this paper, we present ANERsys: a NER system built exclusively for Arabic texts based-on n-grams and maximum entropy. Furthermore, we present both the specific Arabic language dependent heuristic and the gazetteers we used to boost our system. We developed our own training and test corpora (ANERcorp) and gazetteers (ANERgazet) to train, evaluate and boost the implemented technique. A major effort was conducted to make sure all the experiments are carried out in the same framework of the CONLL 2002 conference. We carried out several experiments and the preliminary results showed that this approach allows to tackle successfully the problem of NER for the Arabic language.
20 schema:editor N95072c649e504501b2228ca9a1fba0a3
21 schema:genre chapter
22 schema:inLanguage en
23 schema:isAccessibleForFree true
24 schema:isPartOf N710b45f772cd4b83b1cfbaf307282de4
25 schema:name ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy
26 schema:pagination 143-153
27 schema:productId N1469ac02486a4924953f77dedbbfa561
28 N57c3c188d6104a1e9e07a202c2056efa
29 N62299053d84f4606a79c31bf09ceab42
30 schema:publisher Ne418bc2209454b008d06fdcd8e219b5d
31 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042484911
32 https://doi.org/10.1007/978-3-540-70939-8_13
33 schema:sdDatePublished 2019-04-16T07:15
34 schema:sdLicense https://scigraph.springernature.com/explorer/license/
35 schema:sdPublisher N0520e5d54dab4bfdb33727044019d0a0
36 schema:url https://link.springer.com/10.1007%2F978-3-540-70939-8_13
37 sgo:license sg:explorer/license/
38 sgo:sdDataset chapters
39 rdf:type schema:Chapter
40 N021fd4b5e45f4b8e8d88e330481db279 rdf:first sg:person.013373425233.78
41 rdf:rest N3b81725222e74d2787ea65238044551d
42 N0520e5d54dab4bfdb33727044019d0a0 schema:name Springer Nature - SN SciGraph project
43 rdf:type schema:Organization
44 N1469ac02486a4924953f77dedbbfa561 schema:name readcube_id
45 schema:value ca5f7fa71fef952ff284638d03b471c3aa06671c71ed51efa297e68c172ffa3f
46 rdf:type schema:PropertyValue
47 N2bd97919e3b64407bd300d580ff9087f schema:familyName Gelbukh
48 schema:givenName Alexander
49 rdf:type schema:Person
50 N34dccf071c5f4e6496a3dce152892418 rdf:first sg:person.015602473173.88
51 rdf:rest N021fd4b5e45f4b8e8d88e330481db279
52 N3b81725222e74d2787ea65238044551d rdf:first N585bc25d377f4b0cbbf755f0163325b4
53 rdf:rest rdf:nil
54 N57c3c188d6104a1e9e07a202c2056efa schema:name dimensions_id
55 schema:value pub.1042484911
56 rdf:type schema:PropertyValue
57 N585bc25d377f4b0cbbf755f0163325b4 schema:affiliation https://www.grid.ac/institutes/grid.157927.f
58 schema:familyName BenedíRuiz
59 schema:givenName José Miguel
60 rdf:type schema:Person
61 N62299053d84f4606a79c31bf09ceab42 schema:name doi
62 schema:value 10.1007/978-3-540-70939-8_13
63 rdf:type schema:PropertyValue
64 N710b45f772cd4b83b1cfbaf307282de4 schema:isbn 978-3-540-70938-1
65 978-3-540-70939-8
66 schema:name Computational Linguistics and Intelligent Text Processing
67 rdf:type schema:Book
68 N95072c649e504501b2228ca9a1fba0a3 rdf:first N2bd97919e3b64407bd300d580ff9087f
69 rdf:rest rdf:nil
70 Ne418bc2209454b008d06fdcd8e219b5d schema:location Berlin, Heidelberg
71 schema:name Springer Berlin Heidelberg
72 rdf:type schema:Organisation
73 anzsrc-for:20 schema:inDefinedTermSet anzsrc-for:
74 schema:name Language, Communication and Culture
75 rdf:type schema:DefinedTerm
76 anzsrc-for:2004 schema:inDefinedTermSet anzsrc-for:
77 schema:name Linguistics
78 rdf:type schema:DefinedTerm
79 sg:person.013373425233.78 schema:affiliation https://www.grid.ac/institutes/grid.157927.f
80 schema:familyName Rosso
81 schema:givenName Paolo
82 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013373425233.78
83 rdf:type schema:Person
84 sg:person.015602473173.88 schema:affiliation https://www.grid.ac/institutes/grid.157927.f
85 schema:familyName Benajiba
86 schema:givenName Yassine
87 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015602473173.88
88 rdf:type schema:Person
89 https://doi.org/10.1006/csla.1996.0011 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029750983
90 rdf:type schema:CreativeWork
91 https://doi.org/10.21236/ada460245 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091592349
92 rdf:type schema:CreativeWork
93 https://doi.org/10.3115/1072399.1072402 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011688253
94 rdf:type schema:CreativeWork
95 https://doi.org/10.3115/1073012.1073015 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099239524
96 rdf:type schema:CreativeWork
97 https://doi.org/10.3115/1075096.1075147 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099239709
98 rdf:type schema:CreativeWork
99 https://doi.org/10.3115/1118853.1118857 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012610562
100 rdf:type schema:CreativeWork
101 https://doi.org/10.3115/1118853.1118872 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005856625
102 rdf:type schema:CreativeWork
103 https://doi.org/10.3115/1119176.1119196 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014977021
104 rdf:type schema:CreativeWork
105 https://doi.org/10.3115/1119176.1119199 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032282037
106 rdf:type schema:CreativeWork
107 https://doi.org/10.3115/1119176.1119200 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014124369
108 rdf:type schema:CreativeWork
109 https://doi.org/10.3115/1119176.1119204 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038467014
110 rdf:type schema:CreativeWork
111 https://doi.org/10.3115/1119355.1119362 schema:sameAs https://app.dimensions.ai/details/publication/pub.1036319334
112 rdf:type schema:CreativeWork
113 https://doi.org/10.3115/1621753.1621756 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099256331
114 rdf:type schema:CreativeWork
115 https://www.grid.ac/institutes/grid.157927.f schema:alternateName Polytechnic University of Valencia
116 schema:name Dpto. Sistemas Informáticos y Computación (DSIC), Universidad Politécnica de Valencia, Spain
117 rdf:type schema:Organization
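
Summary counts like those shown above the table (triples, predicates, literals, blank nodes) can be approximated from an N-Triples serialization, since each non-empty line holds exactly one triple. The sketch below is a simplified line splitter, not a full N-Triples parser, and the sample data and the name `ntriples_stats` are hypothetical. How the page derives its own counts is not stated, so the numbers may not match exactly.

```python
def ntriples_stats(text):
    """Count triples, distinct predicates, distinct literals, and distinct blank nodes."""
    triples = 0
    predicates = set()
    literals = set()
    blank_nodes = set()

    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments

        # Subject and predicate never contain whitespace; the object may.
        subject, predicate, rest = line.split(None, 2)
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the statement terminator

        triples += 1
        predicates.add(predicate)
        for term in (subject, obj):
            if term.startswith("_:"):
                blank_nodes.add(term)
        if obj.startswith('"'):
            literals.add(obj)

    return {
        "triples": triples,
        "predicates": len(predicates),
        "literals": len(literals),
        "blank_nodes": len(blank_nodes),
    }


# Hypothetical four-triple sample in N-Triples syntax.
SAMPLE = """
<http://example.org/pub> <http://schema.org/name> "ANERsys" .
<http://example.org/pub> <http://schema.org/publisher> _:b0 .
_:b0 <http://schema.org/name> "Springer Berlin Heidelberg" .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Organisation> .
"""

if __name__ == "__main__":
    print(ntriples_stats(SAMPLE))
```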
 



