ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2007

AUTHORS

Yassine Benajiba, Paolo Rosso, José Miguel Benedí Ruiz

ABSTRACT

The task of Named Entity Recognition (NER) allows to identify proper names as well as temporal and numeric expressions, in an open-domain text. NER systems proved to be very important for many tasks in Natural Language Processing (NLP) such as Information Retrieval and Question Answering tasks. Unfortunately, the main efforts to build reliable NER systems for the Arabic language have been made in a commercial frame and the approach used as well as the accuracy of the performance are not known. In this paper, we present ANERsys: a NER system built exclusively for Arabic texts based-on n-grams and maximum entropy. Furthermore, we present both the specific Arabic language dependent heuristic and the gazetteers we used to boost our system. We developed our own training and test corpora (ANERcorp) and gazetteers (ANERgazet) to train, evaluate and boost the implemented technique. A major effort was conducted to make sure all the experiments are carried out in the same framework of the CONLL 2002 conference. We carried out several experiments and the preliminary results showed that this approach allows to tackle successfully the problem of NER for the Arabic language.

PAGES

143-153

Book

TITLE

Computational Linguistics and Intelligent Text Processing

ISBN

978-3-540-70938-1
978-3-540-70939-8

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-540-70939-8_13

DOI

http://dx.doi.org/10.1007/978-3-540-70939-8_13

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1042484911



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/2004", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Linguistics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/20", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Language, Communication and Culture", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Polytechnic University of Valencia", 
          "id": "https://www.grid.ac/institutes/grid.157927.f", 
          "name": [
            "Dpto. Sistemas Inform\u00e1ticos y Computaci\u00f3n (DSIC), Universidad Polit\u00e9cnica de Valencia, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Benajiba", 
        "givenName": "Yassine", 
        "id": "sg:person.015602473173.88", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015602473173.88"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Polytechnic University of Valencia", 
          "id": "https://www.grid.ac/institutes/grid.157927.f", 
          "name": [
            "Dpto. Sistemas Inform\u00e1ticos y Computaci\u00f3n (DSIC), Universidad Polit\u00e9cnica de Valencia, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Rosso", 
        "givenName": "Paolo", 
        "id": "sg:person.013373425233.78", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013373425233.78"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Polytechnic University of Valencia", 
          "id": "https://www.grid.ac/institutes/grid.157927.f", 
          "name": [
            "Dpto. Sistemas Inform\u00e1ticos y Computaci\u00f3n (DSIC), Universidad Polit\u00e9cnica de Valencia, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Bened\u00edRuiz", 
        "givenName": "Jos\u00e9 Miguel", 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.3115/1118853.1118872", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005856625"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1072399.1072402", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1011688253"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1118853.1118857", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012610562"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1119176.1119200", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1014124369"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1119176.1119196", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1014977021"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1006/csla.1996.0011", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1029750983"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1119176.1119199", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1032282037"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1119355.1119362", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1036319334"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1119176.1119204", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1038467014"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.21236/ada460245", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1091592349"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1073012.1073015", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1099239524"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1075096.1075147", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1099239709"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1621753.1621756", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1099256331"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2007", 
    "datePublishedReg": "2007-01-01", 
    "description": "The task of Named Entity Recognition (NER) allows to identify proper names as well as temporal and numeric expressions, in an open-domain text. NER systems proved to be very important for many tasks in Natural Language Processing (NLP) such as Information Retrieval and Question Answering tasks. Unfortunately, the main efforts to build reliable NER systems for the Arabic language have been made in a commercial frame and the approach used as well as the accuracy of the performance are not known. In this paper, we present ANERsys: a NER system built exclusively for Arabic texts based-on n-grams and maximum entropy. Furthermore, we present both the specific Arabic language dependent heuristic and the gazetteers we used to boost our system. We developed our own training and test corpora (ANERcorp) and gazetteers (ANERgazet) to train, evaluate and boost the implemented technique. A major effort was conducted to make sure all the experiments are carried out in the same framework of the CONLL 2002 conference. We carried out several experiments and the preliminary results showed that this approach allows to tackle successfully the problem of NER for the Arabic language.", 
    "editor": [
      {
        "familyName": "Gelbukh", 
        "givenName": "Alexander", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-540-70939-8_13", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-540-70938-1", 
        "978-3-540-70939-8"
      ], 
      "name": "Computational Linguistics and Intelligent Text Processing", 
      "type": "Book"
    }, 
    "name": "ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy", 
    "pagination": "143-153", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1042484911"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-540-70939-8_13"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "ca5f7fa71fef952ff284638d03b471c3aa06671c71ed51efa297e68c172ffa3f"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-540-70939-8_13", 
      "https://app.dimensions.ai/details/publication/pub.1042484911"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T07:15", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000353_0000000353/records_45372_00000000.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F978-3-540-70939-8_13"
  }
]
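
Because the record is plain JSON, it can be read with any JSON parser. The following is a minimal sketch, not part of the SciGraph tooling: it parses an abbreviated excerpt of the record above and pulls out the title, the author names, and the cited DOIs, dropping any id that appears more than once. The names EXCERPT and summarize are hypothetical and chosen only for this illustration.

```python
import json

# Abbreviated excerpt of the record above; only the fields used below are kept.
EXCERPT = """
[
  {
    "name": "ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy",
    "author": [
      {"familyName": "Benajiba", "givenName": "Yassine", "type": "Person"},
      {"familyName": "Rosso", "givenName": "Paolo", "type": "Person"}
    ],
    "citation": [
      {"id": "https://doi.org/10.3115/1073012.1073015", "type": "CreativeWork"},
      {"id": "https://doi.org/10.3115/1073012.1073015", "type": "CreativeWork"},
      {"id": "https://doi.org/10.3115/1075096.1075147", "type": "CreativeWork"}
    ],
    "pagination": "143-153"
  }
]
"""


def summarize(jsonld_text):
    """Return title, author names, and unique citation ids of the first record."""
    record = json.loads(jsonld_text)[0]  # the document is a one-element array
    authors = [
        f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])
    ]
    # dict.fromkeys keeps first-seen order while dropping repeated ids.
    citations = list(dict.fromkeys(c["id"] for c in record.get("citation", [])))
    return {"title": record["name"], "authors": authors, "citations": citations}


summary = summarize(EXCERPT)
print(summary["title"])
print(summary["authors"])
print(len(summary["citations"]))
```

This treats the record as ordinary JSON and ignores the "@context"; a full JSON-LD processor would be needed to expand the keys into complete IRIs.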
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-70939-8_13'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-70939-8_13'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-70939-8_13'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-70939-8_13'
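
The four curl commands differ only in the Accept header, which is ordinary HTTP content negotiation. As a rough sketch, the same requests could be prepared from Python's standard library as shown below. The short format labels used as dictionary keys and the helper name build_request are assumptions made for this example; the URL and media types are copied from the commands above. The request is only constructed, not sent.

```python
from urllib.request import Request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-540-70939-8_13"

# Media types taken from the curl commands above; the keys are illustrative labels.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request for the given RDF serialization."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


req = build_request("turtle")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual fetch; whether the endpoint still answers is not something this page confirms.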


 

This table displays all metadata directly associated with this object as RDF triples.

117 TRIPLES      23 PREDICATES      40 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-540-70939-8_13 schema:about anzsrc-for:20
2 anzsrc-for:2004
3 schema:author Na53bd812a58b46369382988bd7dcdf65
4 schema:citation https://doi.org/10.1006/csla.1996.0011
5 https://doi.org/10.21236/ada460245
6 https://doi.org/10.3115/1072399.1072402
7 https://doi.org/10.3115/1073012.1073015
8 https://doi.org/10.3115/1075096.1075147
9 https://doi.org/10.3115/1118853.1118857
10 https://doi.org/10.3115/1118853.1118872
11 https://doi.org/10.3115/1119176.1119196
12 https://doi.org/10.3115/1119176.1119199
13 https://doi.org/10.3115/1119176.1119200
14 https://doi.org/10.3115/1119176.1119204
15 https://doi.org/10.3115/1119355.1119362
16 https://doi.org/10.3115/1621753.1621756
17 schema:datePublished 2007
18 schema:datePublishedReg 2007-01-01
19 schema:description The task of Named Entity Recognition (NER) allows to identify proper names as well as temporal and numeric expressions, in an open-domain text. NER systems proved to be very important for many tasks in Natural Language Processing (NLP) such as Information Retrieval and Question Answering tasks. Unfortunately, the main efforts to build reliable NER systems for the Arabic language have been made in a commercial frame and the approach used as well as the accuracy of the performance are not known. In this paper, we present ANERsys: a NER system built exclusively for Arabic texts based-on n-grams and maximum entropy. Furthermore, we present both the specific Arabic language dependent heuristic and the gazetteers we used to boost our system. We developed our own training and test corpora (ANERcorp) and gazetteers (ANERgazet) to train, evaluate and boost the implemented technique. A major effort was conducted to make sure all the experiments are carried out in the same framework of the CONLL 2002 conference. We carried out several experiments and the preliminary results showed that this approach allows to tackle successfully the problem of NER for the Arabic language.
20 schema:editor Ne7628b63218c4220bd6cebef30592965
21 schema:genre chapter
22 schema:inLanguage en
23 schema:isAccessibleForFree true
24 schema:isPartOf N43653111e655479cb5d962985668e244
25 schema:name ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy
26 schema:pagination 143-153
27 schema:productId N82fa8ddaf33c47ca96164067a059067c
28 Na4639cc58fe24d3bbc709153ad6b3d8d
29 Nc5a486fae433428fb6c0fae891667db4
30 schema:publisher Nc774d72019f1412fb52074967adc6578
31 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042484911
32 https://doi.org/10.1007/978-3-540-70939-8_13
33 schema:sdDatePublished 2019-04-16T07:15
34 schema:sdLicense https://scigraph.springernature.com/explorer/license/
35 schema:sdPublisher N6ce5b762bcf444b793a66d355d8783ab
36 schema:url https://link.springer.com/10.1007%2F978-3-540-70939-8_13
37 sgo:license sg:explorer/license/
38 sgo:sdDataset chapters
39 rdf:type schema:Chapter
40 N11055c12f4da429cb33986f579fe4bf1 schema:affiliation https://www.grid.ac/institutes/grid.157927.f
41 schema:familyName BenedíRuiz
42 schema:givenName José Miguel
43 rdf:type schema:Person
44 N1ddf62c123e640309d9893ff171e115d rdf:first sg:person.013373425233.78
45 rdf:rest Nd0653061c29b4e2eaf594fbcc1d336bf
46 N43653111e655479cb5d962985668e244 schema:isbn 978-3-540-70938-1
47 978-3-540-70939-8
48 schema:name Computational Linguistics and Intelligent Text Processing
49 rdf:type schema:Book
50 N6ce5b762bcf444b793a66d355d8783ab schema:name Springer Nature - SN SciGraph project
51 rdf:type schema:Organization
52 N82fa8ddaf33c47ca96164067a059067c schema:name doi
53 schema:value 10.1007/978-3-540-70939-8_13
54 rdf:type schema:PropertyValue
55 Na4639cc58fe24d3bbc709153ad6b3d8d schema:name readcube_id
56 schema:value ca5f7fa71fef952ff284638d03b471c3aa06671c71ed51efa297e68c172ffa3f
57 rdf:type schema:PropertyValue
58 Na53bd812a58b46369382988bd7dcdf65 rdf:first sg:person.015602473173.88
59 rdf:rest N1ddf62c123e640309d9893ff171e115d
60 Nc0a597c0560c4f37a9eac17628154421 schema:familyName Gelbukh
61 schema:givenName Alexander
62 rdf:type schema:Person
63 Nc5a486fae433428fb6c0fae891667db4 schema:name dimensions_id
64 schema:value pub.1042484911
65 rdf:type schema:PropertyValue
66 Nc774d72019f1412fb52074967adc6578 schema:location Berlin, Heidelberg
67 schema:name Springer Berlin Heidelberg
68 rdf:type schema:Organisation
69 Nd0653061c29b4e2eaf594fbcc1d336bf rdf:first N11055c12f4da429cb33986f579fe4bf1
70 rdf:rest rdf:nil
71 Ne7628b63218c4220bd6cebef30592965 rdf:first Nc0a597c0560c4f37a9eac17628154421
72 rdf:rest rdf:nil
73 anzsrc-for:20 schema:inDefinedTermSet anzsrc-for:
74 schema:name Language, Communication and Culture
75 rdf:type schema:DefinedTerm
76 anzsrc-for:2004 schema:inDefinedTermSet anzsrc-for:
77 schema:name Linguistics
78 rdf:type schema:DefinedTerm
79 sg:person.013373425233.78 schema:affiliation https://www.grid.ac/institutes/grid.157927.f
80 schema:familyName Rosso
81 schema:givenName Paolo
82 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013373425233.78
83 rdf:type schema:Person
84 sg:person.015602473173.88 schema:affiliation https://www.grid.ac/institutes/grid.157927.f
85 schema:familyName Benajiba
86 schema:givenName Yassine
87 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015602473173.88
88 rdf:type schema:Person
89 https://doi.org/10.1006/csla.1996.0011 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029750983
90 rdf:type schema:CreativeWork
91 https://doi.org/10.21236/ada460245 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091592349
92 rdf:type schema:CreativeWork
93 https://doi.org/10.3115/1072399.1072402 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011688253
94 rdf:type schema:CreativeWork
95 https://doi.org/10.3115/1073012.1073015 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099239524
96 rdf:type schema:CreativeWork
97 https://doi.org/10.3115/1075096.1075147 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099239709
98 rdf:type schema:CreativeWork
99 https://doi.org/10.3115/1118853.1118857 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012610562
100 rdf:type schema:CreativeWork
101 https://doi.org/10.3115/1118853.1118872 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005856625
102 rdf:type schema:CreativeWork
103 https://doi.org/10.3115/1119176.1119196 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014977021
104 rdf:type schema:CreativeWork
105 https://doi.org/10.3115/1119176.1119199 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032282037
106 rdf:type schema:CreativeWork
107 https://doi.org/10.3115/1119176.1119200 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014124369
108 rdf:type schema:CreativeWork
109 https://doi.org/10.3115/1119176.1119204 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038467014
110 rdf:type schema:CreativeWork
111 https://doi.org/10.3115/1119355.1119362 schema:sameAs https://app.dimensions.ai/details/publication/pub.1036319334
112 rdf:type schema:CreativeWork
113 https://doi.org/10.3115/1621753.1621756 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099256331
114 rdf:type schema:CreativeWork
115 https://www.grid.ac/institutes/grid.157927.f schema:alternateName Polytechnic University of Valencia
116 schema:name Dpto. Sistemas Informáticos y Computación (DSIC), Universidad Politécnica de Valencia, Spain
117 rdf:type schema:Organization
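
In the table, schema:author does not point at the authors directly but at a blank node that heads an RDF list: each list node carries rdf:first (the item) and rdf:rest (the next node), ending at rdf:nil. The sketch below, offered only as an illustration, walks that chain to recover the author order. The triples are copied from rows 44-45, 58-59 and 69-70; the names TRIPLES and walk_list are hypothetical.

```python
RDF_NIL = "rdf:nil"

# (subject, predicate) -> object, copied from table rows 44-45, 58-59 and 69-70.
TRIPLES = {
    ("Na53bd812a58b46369382988bd7dcdf65", "rdf:first"): "sg:person.015602473173.88",
    ("Na53bd812a58b46369382988bd7dcdf65", "rdf:rest"): "N1ddf62c123e640309d9893ff171e115d",
    ("N1ddf62c123e640309d9893ff171e115d", "rdf:first"): "sg:person.013373425233.78",
    ("N1ddf62c123e640309d9893ff171e115d", "rdf:rest"): "Nd0653061c29b4e2eaf594fbcc1d336bf",
    ("Nd0653061c29b4e2eaf594fbcc1d336bf", "rdf:first"): "N11055c12f4da429cb33986f579fe4bf1",
    ("Nd0653061c29b4e2eaf594fbcc1d336bf", "rdf:rest"): RDF_NIL,
}


def walk_list(head, triples):
    """Follow rdf:first / rdf:rest from head until rdf:nil; return items in order."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(triples[(node, "rdf:first")])
        node = triples[(node, "rdf:rest")]
    return items


# Row 3 gives the list head as the object of schema:author.
authors = walk_list("Na53bd812a58b46369382988bd7dcdf65", TRIPLES)
print(authors)
```

The third item is itself a blank node (rows 40-43) because that author has no sg:person identifier in this record.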
 



