Expanding N-grams for Code-Switch Language Models


Ontology type: schema:Chapter     


Chapter Info

DATE

2019

AUTHORS

Injy Hamed, Mohamed Elmahdy, Slim Abdennadher

ABSTRACT

It has become common, especially among urban youth, for people to use more than one language in their everyday conversations - a phenomenon referred to by linguists as “code-switching”. With the rise in globalization and the widespread use of code-switching among multilingual societies, a great demand has been placed on Natural Language Processing (NLP) applications to be able to handle such mixed data. In this paper, we present our efforts in language modeling for code-switch Arabic-English. In order to train a language model (LM), huge amounts of text data are required in the respective language. However, the main challenge faced in language modeling for code-switch languages is the lack of available data. We propose an approach to artificially generate code-switch Arabic-English n-grams and thus improve the language model. This was done by expanding the relatively small available corpus and its corresponding n-grams using translation-based approaches. The final LM achieved relative improvements in perplexity and OOV rate of 1.97% and 16.36%, respectively.

PAGES

221-229

Book

TITLE

Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2018

ISBN

978-3-319-99009-5
978-3-319-99010-1

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-319-99010-1_20

DOI

http://dx.doi.org/10.1007/978-3-319-99010-1_20

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1106409157



JSON-LD is the canonical representation for SciGraph data.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/2004", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Linguistics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/20", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Language, Communication and Culture", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "German University in Cairo", 
          "id": "https://www.grid.ac/institutes/grid.187323.c", 
          "name": [
            "The German University in Cairo"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Hamed", 
        "givenName": "Injy", 
        "id": "sg:person.016542541344.73", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016542541344.73"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "German University in Cairo", 
          "id": "https://www.grid.ac/institutes/grid.187323.c", 
          "name": [
            "The German University in Cairo"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Elmahdy", 
        "givenName": "Mohamed", 
        "id": "sg:person.013322724557.53", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013322724557.53"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "German University in Cairo", 
          "id": "https://www.grid.ac/institutes/grid.187323.c", 
          "name": [
            "The German University in Cairo"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Abdennadher", 
        "givenName": "Slim", 
        "id": "sg:person.010445445574.13", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010445445574.13"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1016/j.pragma.2004.10.010", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1002477324"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1111/1467-971x.00181", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008993237"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.procs.2016.04.039", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017048649"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0167-6393(00)00095-9", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1038142378"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1177/0739986304272358", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1043943858"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.procs.2016.04.044", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1044573307"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-540-70939-8_7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052256105", 
          "https://doi.org/10.1007/978-3-540-70939-8_7"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/msp.2008.918417", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061423090"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.5923/j.ajsp.20120205.02", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1073508165"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.procs.2017.10.111", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1092616592"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icassp.2014.6854536", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1093350674"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/fskd.2009.434", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1093463963"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/iscslp.2010.5684900", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1093814750"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icassp.2012.6289015", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1095336192"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/v1/w14-3904", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1099117564"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2019", 
    "datePublishedReg": "2019-01-01", 
    "description": "It has become common, especially among urban youth, for people to use more than one language in their everyday conversations - a phenomenon referred to by linguists as \u201ccode-switching\u201d. With the rise in globalization and the widespread of code-switching among multilingual societies, a great demand has been placed on Natural Language Processing (NLP) applications to be able to handle such mixed data. In this paper, we present our efforts in language modeling for code-switch Arabic-English. In order to train a language model (LM), huge amounts of text data is required in the respective language. However, the main challenge faced in language modeling for code-switch languages, is the lack of available data. In this paper, we propose an approach to artificially generate code-switch Arabic-English n-grams and thus improve the language model. This was done by expanding the relatively-small available corpus and its corresponding n-grams using translation-based approaches. The final LM achieved relative improvements in both perplexity and OOV rates of 1.97% and 16.36% respectively.", 
    "editor": [
      {
        "familyName": "Hassanien", 
        "givenName": "Aboul Ella", 
        "type": "Person"
      }, 
      {
        "familyName": "Tolba", 
        "givenName": "Mohamed F.", 
        "type": "Person"
      }, 
      {
        "familyName": "Shaalan", 
        "givenName": "Khaled", 
        "type": "Person"
      }, 
      {
        "familyName": "Azar", 
        "givenName": "Ahmad Taher", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-319-99010-1_20", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-319-99009-5", 
        "978-3-319-99010-1"
      ], 
      "name": "Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2018", 
      "type": "Book"
    }, 
    "name": "Expanding N-grams for Code-Switch Language Models", 
    "pagination": "221-229", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-319-99010-1_20"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "3262ecd6e5e62b28b9adb4e2180fc3b8a1ae883139e275330b703f62323d5557"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1106409157"
        ]
      }
    ], 
    "publisher": {
      "location": "Cham", 
      "name": "Springer International Publishing", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-319-99010-1_20", 
      "https://app.dimensions.ai/details/publication/pub.1106409157"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T22:23", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8693_00000485.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/978-3-319-99010-1_20"
  }
]
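
A record of this shape can be read with any JSON parser. The following is a minimal sketch using only Python's standard library; it pulls the title, the author names, and the distinct cited-work ids out of a small hand-trimmed sample in the shape of the record above. The helper name `summarize_record` and the sample itself are illustrative assumptions rather than anything SciGraph provides, and the repeated citation id in the sample is deliberate, to show de-duplication.

```python
import json

# Hand-trimmed sample in the shape of the record above (assumption: only the
# fields used below are kept; one citation id is repeated on purpose).
SAMPLE = """
[
  {
    "name": "Expanding N-grams for Code-Switch Language Models",
    "author": [
      {"familyName": "Hamed", "givenName": "Injy"},
      {"familyName": "Elmahdy", "givenName": "Mohamed"},
      {"familyName": "Abdennadher", "givenName": "Slim"}
    ],
    "citation": [
      {"id": "https://doi.org/10.1177/0739986304272358"},
      {"id": "https://doi.org/10.1177/0739986304272358"},
      {"id": "sg:pub.10.1007/978-3-540-70939-8_7"}
    ]
  }
]
"""


def summarize_record(text):
    """Return (title, author names, distinct citation ids) for the first record."""
    record = json.loads(text)[0]  # the payload is a one-element list
    authors = [
        f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])
    ]
    # dict.fromkeys drops repeated ids while keeping first-seen order
    citations = list(dict.fromkeys(c["id"] for c in record.get("citation", [])))
    return record["name"], authors, citations


title, authors, citations = summarize_record(SAMPLE)
print(title)
print(authors)
print(len(citations), "distinct citations")
```

The same function should work on the full payload, since it only touches the `name`, `author`, and `citation` keys.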
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-99010-1_20'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-99010-1_20'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-99010-1_20'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-99010-1_20'
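
The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. This is a rough equivalent, not an official client: the names `ACCEPT`, `build_request`, and `fetch` are made up for illustration, and `fetch` assumes the endpoint is still being served at the URL shown above.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-319-99010-1_20"

# Media types taken from the curl commands above; the short keys are
# illustrative labels chosen for this sketch.
ACCEPT = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for one serialization."""
    if fmt not in ACCEPT:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=10):
    """Send the request and return the body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Inspect the request without touching the network.
req = build_request("turtle")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Keeping request construction separate from sending makes the header choice easy to check offline; calling `fetch("json-ld")` would then perform the actual download.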


 

This table displays all metadata directly associated with this object as RDF triples.

140 TRIPLES      23 PREDICATES      42 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-319-99010-1_20 schema:about anzsrc-for:20
2 anzsrc-for:2004
3 schema:author Nad1dd08690bb4343a2f2ece72f767f08
4 schema:citation sg:pub.10.1007/978-3-540-70939-8_7
5 https://doi.org/10.1016/j.pragma.2004.10.010
6 https://doi.org/10.1016/j.procs.2016.04.039
7 https://doi.org/10.1016/j.procs.2016.04.044
8 https://doi.org/10.1016/j.procs.2017.10.111
9 https://doi.org/10.1016/s0167-6393(00)00095-9
10 https://doi.org/10.1109/fskd.2009.434
11 https://doi.org/10.1109/icassp.2012.6289015
12 https://doi.org/10.1109/icassp.2014.6854536
13 https://doi.org/10.1109/iscslp.2010.5684900
14 https://doi.org/10.1109/msp.2008.918417
15 https://doi.org/10.1111/1467-971x.00181
16 https://doi.org/10.1177/0739986304272358
17 https://doi.org/10.3115/v1/w14-3904
18 https://doi.org/10.5923/j.ajsp.20120205.02
19 schema:datePublished 2019
20 schema:datePublishedReg 2019-01-01
21 schema:description It has become common, especially among urban youth, for people to use more than one language in their everyday conversations - a phenomenon referred to by linguists as “code-switching”. With the rise in globalization and the widespread of code-switching among multilingual societies, a great demand has been placed on Natural Language Processing (NLP) applications to be able to handle such mixed data. In this paper, we present our efforts in language modeling for code-switch Arabic-English. In order to train a language model (LM), huge amounts of text data is required in the respective language. However, the main challenge faced in language modeling for code-switch languages, is the lack of available data. In this paper, we propose an approach to artificially generate code-switch Arabic-English n-grams and thus improve the language model. This was done by expanding the relatively-small available corpus and its corresponding n-grams using translation-based approaches. The final LM achieved relative improvements in both perplexity and OOV rates of 1.97% and 16.36% respectively.
22 schema:editor Nff224bfd2c7b4df9a55331b29f391396
23 schema:genre chapter
24 schema:inLanguage en
25 schema:isAccessibleForFree false
26 schema:isPartOf Ndb2fb74ff18b4b8b9952e50c109ae76e
27 schema:name Expanding N-grams for Code-Switch Language Models
28 schema:pagination 221-229
29 schema:productId N360f57e4a2664f9c8d91b577f1246992
30 Ne3b8c6096ddb4b15b6d1e9541620216d
31 Nf1544194179e4469ac043d4d13183ddc
32 schema:publisher N78432835b23a4e0688944f6bcf39a6b4
33 schema:sameAs https://app.dimensions.ai/details/publication/pub.1106409157
34 https://doi.org/10.1007/978-3-319-99010-1_20
35 schema:sdDatePublished 2019-04-15T22:23
36 schema:sdLicense https://scigraph.springernature.com/explorer/license/
37 schema:sdPublisher Nffcd874aadda4cbebfe3a37d807513f4
38 schema:url http://link.springer.com/10.1007/978-3-319-99010-1_20
39 sgo:license sg:explorer/license/
40 sgo:sdDataset chapters
41 rdf:type schema:Chapter
42 N1808ee49aee14188a9c4a654003708a5 schema:familyName Hassanien
43 schema:givenName Aboul Ella
44 rdf:type schema:Person
45 N24023447c8db42febd522a7120b57251 rdf:first N2548277e824f4d999922424443639221
46 rdf:rest rdf:nil
47 N2548277e824f4d999922424443639221 schema:familyName Azar
48 schema:givenName Ahmad Taher
49 rdf:type schema:Person
50 N360f57e4a2664f9c8d91b577f1246992 schema:name readcube_id
51 schema:value 3262ecd6e5e62b28b9adb4e2180fc3b8a1ae883139e275330b703f62323d5557
52 rdf:type schema:PropertyValue
53 N4169e06e2c5641549aae0608faf4359e rdf:first sg:person.010445445574.13
54 rdf:rest rdf:nil
55 N58defce56c664213974b6b00e0b65481 rdf:first sg:person.013322724557.53
56 rdf:rest N4169e06e2c5641549aae0608faf4359e
57 N78432835b23a4e0688944f6bcf39a6b4 schema:location Cham
58 schema:name Springer International Publishing
59 rdf:type schema:Organisation
60 N8742849ffc104b6fab321b4f6300cbbd rdf:first Ne1a36145f4204e37bcf4a6fb002f1e25
61 rdf:rest Ne51c6cfc244242ebac33fde37dc504cc
62 Nad1dd08690bb4343a2f2ece72f767f08 rdf:first sg:person.016542541344.73
63 rdf:rest N58defce56c664213974b6b00e0b65481
64 Ndb2fb74ff18b4b8b9952e50c109ae76e schema:isbn 978-3-319-99009-5
65 978-3-319-99010-1
66 schema:name Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2018
67 rdf:type schema:Book
68 Ne1a36145f4204e37bcf4a6fb002f1e25 schema:familyName Tolba
69 schema:givenName Mohamed F.
70 rdf:type schema:Person
71 Ne3b8c6096ddb4b15b6d1e9541620216d schema:name dimensions_id
72 schema:value pub.1106409157
73 rdf:type schema:PropertyValue
74 Ne51c6cfc244242ebac33fde37dc504cc rdf:first Ne662a084b8e44c8cbb24f974148847e9
75 rdf:rest N24023447c8db42febd522a7120b57251
76 Ne662a084b8e44c8cbb24f974148847e9 schema:familyName Shaalan
77 schema:givenName Khaled
78 rdf:type schema:Person
79 Nf1544194179e4469ac043d4d13183ddc schema:name doi
80 schema:value 10.1007/978-3-319-99010-1_20
81 rdf:type schema:PropertyValue
82 Nff224bfd2c7b4df9a55331b29f391396 rdf:first N1808ee49aee14188a9c4a654003708a5
83 rdf:rest N8742849ffc104b6fab321b4f6300cbbd
84 Nffcd874aadda4cbebfe3a37d807513f4 schema:name Springer Nature - SN SciGraph project
85 rdf:type schema:Organization
86 anzsrc-for:20 schema:inDefinedTermSet anzsrc-for:
87 schema:name Language, Communication and Culture
88 rdf:type schema:DefinedTerm
89 anzsrc-for:2004 schema:inDefinedTermSet anzsrc-for:
90 schema:name Linguistics
91 rdf:type schema:DefinedTerm
92 sg:person.010445445574.13 schema:affiliation https://www.grid.ac/institutes/grid.187323.c
93 schema:familyName Abdennadher
94 schema:givenName Slim
95 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010445445574.13
96 rdf:type schema:Person
97 sg:person.013322724557.53 schema:affiliation https://www.grid.ac/institutes/grid.187323.c
98 schema:familyName Elmahdy
99 schema:givenName Mohamed
100 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013322724557.53
101 rdf:type schema:Person
102 sg:person.016542541344.73 schema:affiliation https://www.grid.ac/institutes/grid.187323.c
103 schema:familyName Hamed
104 schema:givenName Injy
105 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016542541344.73
106 rdf:type schema:Person
107 sg:pub.10.1007/978-3-540-70939-8_7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052256105
108 https://doi.org/10.1007/978-3-540-70939-8_7
109 rdf:type schema:CreativeWork
110 https://doi.org/10.1016/j.pragma.2004.10.010 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002477324
111 rdf:type schema:CreativeWork
112 https://doi.org/10.1016/j.procs.2016.04.039 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017048649
113 rdf:type schema:CreativeWork
114 https://doi.org/10.1016/j.procs.2016.04.044 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044573307
115 rdf:type schema:CreativeWork
116 https://doi.org/10.1016/j.procs.2017.10.111 schema:sameAs https://app.dimensions.ai/details/publication/pub.1092616592
117 rdf:type schema:CreativeWork
118 https://doi.org/10.1016/s0167-6393(00)00095-9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038142378
119 rdf:type schema:CreativeWork
120 https://doi.org/10.1109/fskd.2009.434 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093463963
121 rdf:type schema:CreativeWork
122 https://doi.org/10.1109/icassp.2012.6289015 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095336192
123 rdf:type schema:CreativeWork
124 https://doi.org/10.1109/icassp.2014.6854536 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093350674
125 rdf:type schema:CreativeWork
126 https://doi.org/10.1109/iscslp.2010.5684900 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093814750
127 rdf:type schema:CreativeWork
128 https://doi.org/10.1109/msp.2008.918417 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061423090
129 rdf:type schema:CreativeWork
130 https://doi.org/10.1111/1467-971x.00181 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008993237
131 rdf:type schema:CreativeWork
132 https://doi.org/10.1177/0739986304272358 schema:sameAs https://app.dimensions.ai/details/publication/pub.1043943858
133 rdf:type schema:CreativeWork
134 https://doi.org/10.3115/v1/w14-3904 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099117564
135 rdf:type schema:CreativeWork
136 https://doi.org/10.5923/j.ajsp.20120205.02 schema:sameAs https://app.dimensions.ai/details/publication/pub.1073508165
137 rdf:type schema:CreativeWork
138 https://www.grid.ac/institutes/grid.187323.c schema:alternateName German University in Cairo
139 schema:name The German University in Cairo
140 rdf:type schema:Organization
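
In the table above, the ordered author and editor lists appear as RDF collections: each blank node carries an `rdf:first` member and an `rdf:rest` pointer to the next node, ending at `rdf:nil`. The sketch below walks the editor chain to recover the editors in order. It is an illustration only: the triples are typed in by hand from the table, the blank-node labels are shortened to their first five characters for readability, and `walk_rdf_list` is a hypothetical helper name.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# Editor-list triples from the table above as (subject, predicate, object).
# Assumption: blank-node labels are abbreviated to their first five characters.
TRIPLES = [
    ("Nff22", RDF_FIRST, "N1808"),
    ("Nff22", RDF_REST, "N8742"),
    ("N8742", RDF_FIRST, "Ne1a3"),
    ("N8742", RDF_REST, "Ne51c"),
    ("Ne51c", RDF_FIRST, "Ne662"),
    ("Ne51c", RDF_REST, "N2402"),
    ("N2402", RDF_FIRST, "N2548"),
    ("N2402", RDF_REST, RDF_NIL),
    ("N1808", "schema:familyName", "Hassanien"),
    ("Ne1a3", "schema:familyName", "Tolba"),
    ("Ne662", "schema:familyName", "Shaalan"),
    ("N2548", "schema:familyName", "Azar"),
]


def walk_rdf_list(head, triples):
    """Return the members of the RDF collection starting at `head`, in order."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    members, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic chain
            raise ValueError("cycle in rdf:rest chain")
        seen.add(node)
        members.append(first[node])
        node = rest[node]
    return members


family_name = {s: o for s, p, o in TRIPLES if p == "schema:familyName"}
editors = [family_name[n] for n in walk_rdf_list("Nff22", TRIPLES)]
print(editors)
```

The recovered order matches the `editor` array in the JSON-LD record, which is the point of encoding the list as a chain rather than as unordered triples.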
 



