A Finite-State Morphological Analyzer for Wolaytta


Ontology type: schema:Chapter     


Chapter Info

DATE

2018

AUTHORS

Tewodros A. Gebreselassie, Jonathan N. Washington, Michael Gasser, Baye Yimam

ABSTRACT

This paper presents the development of a free/open-source finite-state morphological transducer for Wolaytta, an Omotic language of Ethiopia, using the Helsinki Finite-State Transducer toolkit (HFST). Developing a full-fledged morphological analysis tool for an under-resourced language like Wolaytta is an important step towards developing further NLP (Natural Language Processing) applications. Morphological analyzers for highly inflectional languages are most efficiently developed using finite-state transducers. To develop the transducer, a lexicon of root words was obtained semi-automatically. The morphotactics of the language were implemented by hand in the lexc formalism, and morphophonological rules were implemented in the twol formalism. Evaluation of the transducer shows that it has decent coverage (over 80%) of forms in a large corpus and exhibits high precision (94.85%) and recall (94.11%) over a manually verified test set. To the best of our knowledge, this work is the first systematic and exhaustive implementation of the morphology of Wolaytta in a morphological transducer.
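
The abstract reports precision and recall separately but no combined score. As a reading aid, the harmonic mean of the two (F1) can be computed from the reported figures. The resulting value is derived here from those two numbers and is not a figure stated by the authors; the function name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Figures taken from the abstract: precision 94.85%, recall 94.11%.
PRECISION = 0.9485
RECALL = 0.9411

f1 = f1_score(PRECISION, RECALL)
print(f"F1 = {f1:.2%}")  # prints "F1 = 94.48%"
```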

PAGES

14-23

References to SciGraph publications

  • 2009. HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers in STATE OF THE ART IN COMPUTATIONAL MORPHOLOGY
  • 2011. HFST—Framework for Compiling and Applying Morphologies in SYSTEMS AND FRAMEWORKS FOR COMPUTATIONAL MORPHOLOGY
  • Book

    TITLE

    Information and Communication Technology for Development for Africa

    ISBN

    978-3-319-95152-2
    978-3-319-95153-9

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/978-3-319-95153-9_2

    DOI

    http://dx.doi.org/10.1007/978-3-319-95153-9_2

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1105309030



    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service such as the JSON-LD Playground or the Google SDTT.

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/2004", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Linguistics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/20", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Language, Communication and Culture", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Addis Ababa University", 
              "id": "https://www.grid.ac/institutes/grid.7123.7", 
              "name": [
                "Addis Ababa University"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Gebreselassie", 
            "givenName": "Tewodros A.", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Swarthmore College", 
              "id": "https://www.grid.ac/institutes/grid.264430.7", 
              "name": [
                "Swarthmore College"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Washington", 
            "givenName": "Jonathan N.", 
            "id": "sg:person.016403025362.76", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016403025362.76"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Indiana University System", 
              "id": "https://www.grid.ac/institutes/grid.257410.5", 
              "name": [
                "Indiana University"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Gasser", 
            "givenName": "Michael", 
            "id": "sg:person.01212513340.93", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01212513340.93"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Addis Ababa University", 
              "id": "https://www.grid.ac/institutes/grid.7123.7", 
              "name": [
                "Addis Ababa University"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Yimam", 
            "givenName": "Baye", 
            "id": "sg:person.015320124630.02", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015320124630.02"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.3115/980431.980529", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1000949830"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-642-23138-4_5", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1002403691", 
              "https://doi.org/10.1007/978-3-642-23138-4_5"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1017/s1351324906004384", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1003610278"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3115/1220575.1220660", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1006680528"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3115/974358.974391", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1009360352"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-642-04131-0_3", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1029378268", 
              "https://doi.org/10.1007/978-3-642-04131-0_3"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3115/976744.976810", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1039295138"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3115/1075218.1075243", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1099236150"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2018", 
        "datePublishedReg": "2018-01-01", 
        "description": "This paper presents the development of a free/open-source finite-state morphological transducer for Wolaytta, an Omotic language of Ethiopia, using the Helsinki Finite-State Transducer toolkit (HFST). Developing a full-fledged morphological analysis tool for an under-resourced language like Wolaytta is an important step towards developing further NLP (Natural Language Processing) applications. Morphological analyzers for highly inflectional languages are most efficiently developed using finite-state transducers. To develop the transducer, a lexicon of root words was obtained semi-automatically. The morphotactics of the language were implemented by hand in the lexc formalism, and morphophonological rules were implemented in the twol formalism. Evaluation of the transducer shows as it has decent coverage (over 80%) of forms in a large corpus and exhibits high precision (94.85%) and recall (94.11%) over a manually verified test set. To the best of our knowledge, this work is the first systematic and exhaustive implementation of the morphology of Wolaytta in a morphological transducer.", 
        "editor": [
          {
            "familyName": "Mekuria", 
            "givenName": "Fisseha", 
            "type": "Person"
          }, 
          {
            "familyName": "Nigussie", 
            "givenName": "Ethiopia Enideg", 
            "type": "Person"
          }, 
          {
            "familyName": "Dargie", 
            "givenName": "Waltenegus", 
            "type": "Person"
          }, 
          {
            "familyName": "Edward", 
            "givenName": "Mutafugwa", 
            "type": "Person"
          }, 
          {
            "familyName": "Tegegne", 
            "givenName": "Tesfa", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/978-3-319-95153-9_2", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": false, 
        "isPartOf": {
          "isbn": [
            "978-3-319-95152-2", 
            "978-3-319-95153-9"
          ], 
          "name": "Information and Communication Technology for Development for Africa", 
          "type": "Book"
        }, 
        "name": "A Finite-State Morphological Analyzer for Wolaytta", 
        "pagination": "14-23", 
        "productId": [
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/978-3-319-95153-9_2"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "c3c32fe8691f3cdc6d18474b81d2a2127b234b37d15069d3e215d2fe4fbaa5dc"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1105309030"
            ]
          }
        ], 
        "publisher": {
          "location": "Cham", 
          "name": "Springer International Publishing", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/978-3-319-95153-9_2", 
          "https://app.dimensions.ai/details/publication/pub.1105309030"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-15T12:13", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8660_00000604.jsonl", 
        "type": "Chapter", 
        "url": "http://link.springer.com/10.1007/978-3-319-95153-9_2"
      }
    ]
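
A record like the one above can be read with any JSON parser, since JSON-LD is plain JSON. The following is a minimal sketch, assuming a trimmed excerpt of the record is embedded as a string; the helper names (`author_names`, `unique_citation_ids`, `product_id`) are illustrative and not part of any SciGraph API. The citation helper drops repeated ids in case an entry is listed more than once.

```python
import json

# Trimmed excerpt of a SciGraph chapter record; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "name": "A Finite-State Morphological Analyzer for Wolaytta",
    "pagination": "14-23",
    "author": [
      {"familyName": "Gebreselassie", "givenName": "Tewodros A.", "type": "Person"},
      {"familyName": "Washington", "givenName": "Jonathan N.", "type": "Person"}
    ],
    "citation": [
      {"id": "https://doi.org/10.3115/980431.980529", "type": "CreativeWork"},
      {"id": "https://doi.org/10.1017/s1351324906004384", "type": "CreativeWork"},
      {"id": "https://doi.org/10.1017/s1351324906004384", "type": "CreativeWork"}
    ],
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-319-95153-9_2"]}
    ]
  }
]
"""


def author_names(record):
    """Return 'given family' strings in the order the authors are listed."""
    return [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]


def unique_citation_ids(record):
    """Return citation ids in first-seen order, dropping repeated entries."""
    seen = set()
    ordered = []
    for citation in record.get("citation", []):
        cid = citation["id"]
        if cid not in seen:
            seen.add(cid)
            ordered.append(cid)
    return ordered


def product_id(record, name):
    """Look up a productId entry (e.g. 'doi') and return its first value, or None."""
    for entry in record.get("productId", []):
        if entry.get("name") == name:
            return entry["value"][0]
    return None


record = json.loads(RECORD_JSON)[0]  # the top level is a one-element array
print(author_names(record))
print(unique_citation_ids(record))
print(product_id(record, "doi"))
```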
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-95153-9_2'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-95153-9_2'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-95153-9_2'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-95153-9_2'
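
The four curl calls above differ only in the `Accept` header, which is ordinary HTTP content negotiation. A minimal Python sketch of the same idea, using only the standard library, is shown here. It builds the request objects without sending them; the format labels in `ACCEPT_TYPES` and the helper name `build_request` are illustrative, and whether the endpoint still responds is not something this sketch checks.

```python
from urllib.request import Request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-319-95153-9_2"

# Media types copied from the curl examples; the dictionary keys are illustrative labels.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for the given serialization."""
    try:
        media_type = ACCEPT_TYPES[fmt]
    except KeyError:
        raise ValueError(f"unknown format: {fmt!r}") from None
    return Request(url, headers={"Accept": media_type})


req = build_request("turtle")
print(req.full_url)
print(req.get_header("Accept"))
```

To actually fetch the data, the request object could be passed to `urllib.request.urlopen(req)` and the response body read from the returned object.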


     

    This table displays all metadata directly associated to this object as RDF triples.

    137 TRIPLES      23 PREDICATES      35 URIs      20 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/978-3-319-95153-9_2 schema:about anzsrc-for:20
    2 anzsrc-for:2004
    3 schema:author N0e7b8014a2ba4b8d9e1c633683c594b5
    4 schema:citation sg:pub.10.1007/978-3-642-04131-0_3
    5 sg:pub.10.1007/978-3-642-23138-4_5
    6 https://doi.org/10.1017/s1351324906004384
    7 https://doi.org/10.3115/1075218.1075243
    8 https://doi.org/10.3115/1220575.1220660
    9 https://doi.org/10.3115/974358.974391
    10 https://doi.org/10.3115/976744.976810
    11 https://doi.org/10.3115/980431.980529
    12 schema:datePublished 2018
    13 schema:datePublishedReg 2018-01-01
    14 schema:description This paper presents the development of a free/open-source finite-state morphological transducer for Wolaytta, an Omotic language of Ethiopia, using the Helsinki Finite-State Transducer toolkit (HFST). Developing a full-fledged morphological analysis tool for an under-resourced language like Wolaytta is an important step towards developing further NLP (Natural Language Processing) applications. Morphological analyzers for highly inflectional languages are most efficiently developed using finite-state transducers. To develop the transducer, a lexicon of root words was obtained semi-automatically. The morphotactics of the language were implemented by hand in the lexc formalism, and morphophonological rules were implemented in the twol formalism. Evaluation of the transducer shows as it has decent coverage (over 80%) of forms in a large corpus and exhibits high precision (94.85%) and recall (94.11%) over a manually verified test set. To the best of our knowledge, this work is the first systematic and exhaustive implementation of the morphology of Wolaytta in a morphological transducer.
    15 schema:editor Nee814b50e37b48ae804b109494d2d730
    16 schema:genre chapter
    17 schema:inLanguage en
    18 schema:isAccessibleForFree false
    19 schema:isPartOf Ne6ec5dc9cdc840fda0c39672704f062b
    20 schema:name A Finite-State Morphological Analyzer for Wolaytta
    21 schema:pagination 14-23
    22 schema:productId N3d1c962216834efc8fa83a0c97f05929
    23 N553b4d3c364845b59aeac58a99f4cba3
    24 Na202232699404b04bea722edd64314d1
    25 schema:publisher N7cc44d746c7943ecabb856c0e91e1db0
    26 schema:sameAs https://app.dimensions.ai/details/publication/pub.1105309030
    27 https://doi.org/10.1007/978-3-319-95153-9_2
    28 schema:sdDatePublished 2019-04-15T12:13
    29 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    30 schema:sdPublisher N4480efe0668e4657abd8e1ceb927c9f3
    31 schema:url http://link.springer.com/10.1007/978-3-319-95153-9_2
    32 sgo:license sg:explorer/license/
    33 sgo:sdDataset chapters
    34 rdf:type schema:Chapter
    35 N084881d28a674716a74e8d7daa37a279 rdf:first N4268f39cdfb5418d91f9e3ca9a17a6d7
    36 rdf:rest rdf:nil
    37 N0b2f4671e6974041975ec540891e1587 schema:affiliation https://www.grid.ac/institutes/grid.7123.7
    38 schema:familyName Gebreselassie
    39 schema:givenName Tewodros A.
    40 rdf:type schema:Person
    41 N0e7b8014a2ba4b8d9e1c633683c594b5 rdf:first N0b2f4671e6974041975ec540891e1587
    42 rdf:rest N6935a833920849af89af4a840c59bb12
    43 N39bf324c5c9048b5923786a3e74083e8 schema:familyName Edward
    44 schema:givenName Mutafugwa
    45 rdf:type schema:Person
    46 N3d1c962216834efc8fa83a0c97f05929 schema:name doi
    47 schema:value 10.1007/978-3-319-95153-9_2
    48 rdf:type schema:PropertyValue
    49 N4268f39cdfb5418d91f9e3ca9a17a6d7 schema:familyName Tegegne
    50 schema:givenName Tesfa
    51 rdf:type schema:Person
    52 N4480efe0668e4657abd8e1ceb927c9f3 schema:name Springer Nature - SN SciGraph project
    53 rdf:type schema:Organization
    54 N553b4d3c364845b59aeac58a99f4cba3 schema:name readcube_id
    55 schema:value c3c32fe8691f3cdc6d18474b81d2a2127b234b37d15069d3e215d2fe4fbaa5dc
    56 rdf:type schema:PropertyValue
    57 N5a98cef3c76140a09e5021849c43f6ba schema:familyName Mekuria
    58 schema:givenName Fisseha
    59 rdf:type schema:Person
    60 N5ce0c533d9b44ac68eab5c54e1b1c99e schema:familyName Nigussie
    61 schema:givenName Ethiopia Enideg
    62 rdf:type schema:Person
    63 N6935a833920849af89af4a840c59bb12 rdf:first sg:person.016403025362.76
    64 rdf:rest N721ffc23f8cb4bf0952fb3d7d4df90eb
    65 N721ffc23f8cb4bf0952fb3d7d4df90eb rdf:first sg:person.01212513340.93
    66 rdf:rest Nf9cf926521aa41468075e05ea590b42f
    67 N7cc44d746c7943ecabb856c0e91e1db0 schema:location Cham
    68 schema:name Springer International Publishing
    69 rdf:type schema:Organisation
    70 Na202232699404b04bea722edd64314d1 schema:name dimensions_id
    71 schema:value pub.1105309030
    72 rdf:type schema:PropertyValue
    73 Nd0f597cf8d3b482b97de4dc00c22c4f3 schema:familyName Dargie
    74 schema:givenName Waltenegus
    75 rdf:type schema:Person
    76 Nd785083ce1fd43e08c9623ba828e76ce rdf:first N5ce0c533d9b44ac68eab5c54e1b1c99e
    77 rdf:rest Nebfe3fed5bc8445a863d0a7c419b93c5
    78 Nde61d4a0a32e4543b82eda6b172eaef4 rdf:first N39bf324c5c9048b5923786a3e74083e8
    79 rdf:rest N084881d28a674716a74e8d7daa37a279
    80 Ne6ec5dc9cdc840fda0c39672704f062b schema:isbn 978-3-319-95152-2
    81 978-3-319-95153-9
    82 schema:name Information and Communication Technology for Development for Africa
    83 rdf:type schema:Book
    84 Nebfe3fed5bc8445a863d0a7c419b93c5 rdf:first Nd0f597cf8d3b482b97de4dc00c22c4f3
    85 rdf:rest Nde61d4a0a32e4543b82eda6b172eaef4
    86 Nee814b50e37b48ae804b109494d2d730 rdf:first N5a98cef3c76140a09e5021849c43f6ba
    87 rdf:rest Nd785083ce1fd43e08c9623ba828e76ce
    88 Nf9cf926521aa41468075e05ea590b42f rdf:first sg:person.015320124630.02
    89 rdf:rest rdf:nil
    90 anzsrc-for:20 schema:inDefinedTermSet anzsrc-for:
    91 schema:name Language, Communication and Culture
    92 rdf:type schema:DefinedTerm
    93 anzsrc-for:2004 schema:inDefinedTermSet anzsrc-for:
    94 schema:name Linguistics
    95 rdf:type schema:DefinedTerm
    96 sg:person.01212513340.93 schema:affiliation https://www.grid.ac/institutes/grid.257410.5
    97 schema:familyName Gasser
    98 schema:givenName Michael
    99 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01212513340.93
    100 rdf:type schema:Person
    101 sg:person.015320124630.02 schema:affiliation https://www.grid.ac/institutes/grid.7123.7
    102 schema:familyName Yimam
    103 schema:givenName Baye
    104 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015320124630.02
    105 rdf:type schema:Person
    106 sg:person.016403025362.76 schema:affiliation https://www.grid.ac/institutes/grid.264430.7
    107 schema:familyName Washington
    108 schema:givenName Jonathan N.
    109 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016403025362.76
    110 rdf:type schema:Person
    111 sg:pub.10.1007/978-3-642-04131-0_3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029378268
    112 https://doi.org/10.1007/978-3-642-04131-0_3
    113 rdf:type schema:CreativeWork
    114 sg:pub.10.1007/978-3-642-23138-4_5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002403691
    115 https://doi.org/10.1007/978-3-642-23138-4_5
    116 rdf:type schema:CreativeWork
    117 https://doi.org/10.1017/s1351324906004384 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003610278
    118 rdf:type schema:CreativeWork
    119 https://doi.org/10.3115/1075218.1075243 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099236150
    120 rdf:type schema:CreativeWork
    121 https://doi.org/10.3115/1220575.1220660 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006680528
    122 rdf:type schema:CreativeWork
    123 https://doi.org/10.3115/974358.974391 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009360352
    124 rdf:type schema:CreativeWork
    125 https://doi.org/10.3115/976744.976810 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039295138
    126 rdf:type schema:CreativeWork
    127 https://doi.org/10.3115/980431.980529 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000949830
    128 rdf:type schema:CreativeWork
    129 https://www.grid.ac/institutes/grid.257410.5 schema:alternateName Indiana University System
    130 schema:name Indiana University
    131 rdf:type schema:Organization
    132 https://www.grid.ac/institutes/grid.264430.7 schema:alternateName Swarthmore College
    133 schema:name Swarthmore College
    134 rdf:type schema:Organization
    135 https://www.grid.ac/institutes/grid.7123.7 schema:alternateName Addis Ababa University
    136 schema:name Addis Ababa University
    137 rdf:type schema:Organization
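
In the triples above, the ordered author list is encoded as an RDF collection: each blank node points to one item through `rdf:first` and to the next node through `rdf:rest`, ending at `rdf:nil`. The sketch below walks that chain for the `schema:author` list, with the relevant triples copied from the table into plain tuples. The function name `walk_rdf_list` is illustrative, and a real application would more likely use an RDF library than hand-built tuples.

```python
# (subject, predicate, object) tuples copied from the table for the author list.
TRIPLES = [
    ("N0e7b8014a2ba4b8d9e1c633683c594b5", "rdf:first", "N0b2f4671e6974041975ec540891e1587"),
    ("N0e7b8014a2ba4b8d9e1c633683c594b5", "rdf:rest", "N6935a833920849af89af4a840c59bb12"),
    ("N6935a833920849af89af4a840c59bb12", "rdf:first", "sg:person.016403025362.76"),
    ("N6935a833920849af89af4a840c59bb12", "rdf:rest", "N721ffc23f8cb4bf0952fb3d7d4df90eb"),
    ("N721ffc23f8cb4bf0952fb3d7d4df90eb", "rdf:first", "sg:person.01212513340.93"),
    ("N721ffc23f8cb4bf0952fb3d7d4df90eb", "rdf:rest", "Nf9cf926521aa41468075e05ea590b42f"),
    ("Nf9cf926521aa41468075e05ea590b42f", "rdf:first", "sg:person.015320124630.02"),
    ("Nf9cf926521aa41468075e05ea590b42f", "rdf:rest", "rdf:nil"),
    ("N0b2f4671e6974041975ec540891e1587", "schema:familyName", "Gebreselassie"),
    ("sg:person.016403025362.76", "schema:familyName", "Washington"),
    ("sg:person.01212513340.93", "schema:familyName", "Gasser"),
    ("sg:person.015320124630.02", "schema:familyName", "Yimam"),
]


def walk_rdf_list(triples, head):
    """Follow rdf:first / rdf:rest from `head` until rdf:nil and return the items in order."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items = []
    visited = set()
    node = head
    while node != "rdf:nil":
        if node in visited:
            raise ValueError(f"cycle detected at {node}")
        visited.add(node)
        items.append(first[node])
        node = rest[node]
    return items


# The head node is the object of the schema:author triple in the table.
author_nodes = walk_rdf_list(TRIPLES, "N0e7b8014a2ba4b8d9e1c633683c594b5")
family_name = {s: o for s, p, o in TRIPLES if p == "schema:familyName"}
print([family_name[node] for node in author_nodes])
```

The recovered order matches the author order given in the chapter info, which is the reason the collection encoding is used: a plain set of `schema:author` triples would carry no ordering.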
     



