A Finite-State Morphological Analyzer for Wolaytta


Ontology type: schema:Chapter     


Chapter Info

DATE

2018

AUTHORS

Tewodros A. Gebreselassie , Jonathan N. Washington , Michael Gasser , Baye Yimam

ABSTRACT

This paper presents the development of a free/open-source finite-state morphological transducer for Wolaytta, an Omotic language of Ethiopia, using the Helsinki Finite-State Transducer toolkit (HFST). Developing a full-fledged morphological analysis tool for an under-resourced language like Wolaytta is an important step towards developing further NLP (Natural Language Processing) applications. Morphological analyzers for highly inflectional languages are most efficiently developed using finite-state transducers. To develop the transducer, a lexicon of root words was obtained semi-automatically. The morphotactics of the language were implemented by hand in the lexc formalism, and morphophonological rules were implemented in the twol formalism. Evaluation of the transducer shows that it has decent coverage (over 80%) of forms in a large corpus and exhibits high precision (94.85%) and recall (94.11%) over a manually verified test set. To the best of our knowledge, this work is the first systematic and exhaustive implementation of the morphology of Wolaytta in a morphological transducer.

PAGES

14-23

References to SciGraph publications

  • 2009. HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers in STATE OF THE ART IN COMPUTATIONAL MORPHOLOGY
  • 2011. HFST—Framework for Compiling and Applying Morphologies in SYSTEMS AND FRAMEWORKS FOR COMPUTATIONAL MORPHOLOGY
Book

    TITLE

    Information and Communication Technology for Development for Africa

    ISBN

    978-3-319-95152-2
    978-3-319-95153-9

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/978-3-319-95153-9_2

    DOI

    http://dx.doi.org/10.1007/978-3-319-95153-9_2

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1105309030



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/2004", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Linguistics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/20", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Language, Communication and Culture", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Addis Ababa University", 
              "id": "https://www.grid.ac/institutes/grid.7123.7", 
              "name": [
                "Addis Ababa University"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Gebreselassie", 
            "givenName": "Tewodros A.", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Swarthmore College", 
              "id": "https://www.grid.ac/institutes/grid.264430.7", 
              "name": [
                "Swarthmore College"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Washington", 
            "givenName": "Jonathan N.", 
            "id": "sg:person.016403025362.76", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016403025362.76"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Indiana University System", 
              "id": "https://www.grid.ac/institutes/grid.257410.5", 
              "name": [
                "Indiana University"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Gasser", 
            "givenName": "Michael", 
            "id": "sg:person.01212513340.93", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01212513340.93"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Addis Ababa University", 
              "id": "https://www.grid.ac/institutes/grid.7123.7", 
              "name": [
                "Addis Ababa University"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Yimam", 
            "givenName": "Baye", 
            "id": "sg:person.015320124630.02", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015320124630.02"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.3115/980431.980529", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1000949830"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-642-23138-4_5", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1002403691", 
              "https://doi.org/10.1007/978-3-642-23138-4_5"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1017/s1351324906004384", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1003610278"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3115/1220575.1220660", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1006680528"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3115/974358.974391", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1009360352"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-642-04131-0_3", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1029378268", 
              "https://doi.org/10.1007/978-3-642-04131-0_3"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3115/976744.976810", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1039295138"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3115/1075218.1075243", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1099236150"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2018", 
        "datePublishedReg": "2018-01-01", 
        "description": "This paper presents the development of a free/open-source finite-state morphological transducer for Wolaytta, an Omotic language of Ethiopia, using the Helsinki Finite-State Transducer toolkit (HFST). Developing a full-fledged morphological analysis tool for an under-resourced language like Wolaytta is an important step towards developing further NLP (Natural Language Processing) applications. Morphological analyzers for highly inflectional languages are most efficiently developed using finite-state transducers. To develop the transducer, a lexicon of root words was obtained semi-automatically. The morphotactics of the language were implemented by hand in the lexc formalism, and morphophonological rules were implemented in the twol formalism. Evaluation of the transducer shows that it has decent coverage (over 80%) of forms in a large corpus and exhibits high precision (94.85%) and recall (94.11%) over a manually verified test set. To the best of our knowledge, this work is the first systematic and exhaustive implementation of the morphology of Wolaytta in a morphological transducer.", 
        "editor": [
          {
            "familyName": "Mekuria", 
            "givenName": "Fisseha", 
            "type": "Person"
          }, 
          {
            "familyName": "Nigussie", 
            "givenName": "Ethiopia Enideg", 
            "type": "Person"
          }, 
          {
            "familyName": "Dargie", 
            "givenName": "Waltenegus", 
            "type": "Person"
          }, 
          {
            "familyName": "Edward", 
            "givenName": "Mutafugwa", 
            "type": "Person"
          }, 
          {
            "familyName": "Tegegne", 
            "givenName": "Tesfa", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/978-3-319-95153-9_2", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": false, 
        "isPartOf": {
          "isbn": [
            "978-3-319-95152-2", 
            "978-3-319-95153-9"
          ], 
          "name": "Information and Communication Technology for Development for Africa", 
          "type": "Book"
        }, 
        "name": "A Finite-State Morphological Analyzer for Wolaytta", 
        "pagination": "14-23", 
        "productId": [
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/978-3-319-95153-9_2"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "c3c32fe8691f3cdc6d18474b81d2a2127b234b37d15069d3e215d2fe4fbaa5dc"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1105309030"
            ]
          }
        ], 
        "publisher": {
          "location": "Cham", 
          "name": "Springer International Publishing", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/978-3-319-95153-9_2", 
          "https://app.dimensions.ai/details/publication/pub.1105309030"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-15T12:13", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8660_00000604.jsonl", 
        "type": "Chapter", 
        "url": "http://link.springer.com/10.1007/978-3-319-95153-9_2"
      }
    ]
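
As a minimal sketch of how a record shaped like the one above could be read, the following Python snippet pulls out the title, the author names, the DOI, and the citation ids, dropping any repeated citation ids. `SAMPLE_RECORD` is a trimmed excerpt written for this illustration (only the fields used are kept), and `summarize` is a hypothetical helper name, not part of any SciGraph tooling.

```python
import json

# Trimmed excerpt of the record above, kept only for illustration.
SAMPLE_RECORD = """
[
  {
    "author": [
      {"familyName": "Gebreselassie", "givenName": "Tewodros A.", "type": "Person"},
      {"familyName": "Washington", "givenName": "Jonathan N.", "type": "Person"}
    ],
    "citation": [
      {"id": "https://doi.org/10.3115/980431.980529", "type": "CreativeWork"},
      {"id": "https://doi.org/10.1017/s1351324906004384", "type": "CreativeWork"},
      {"id": "https://doi.org/10.1017/s1351324906004384", "type": "CreativeWork"}
    ],
    "name": "A Finite-State Morphological Analyzer for Wolaytta",
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-319-95153-9_2"]}
    ]
  }
]
"""


def summarize(record_text):
    """Return title, author names, DOI and de-duplicated citation ids."""
    record = json.loads(record_text)[0]  # the payload is a one-element list
    authors = [
        f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])
    ]
    # The DOI is stored as one of several productId entries.
    doi = next(
        (p["value"][0] for p in record.get("productId", []) if p.get("name") == "doi"),
        None,
    )
    # dict.fromkeys keeps first-seen order while dropping repeated ids.
    citations = list(dict.fromkeys(c["id"] for c in record.get("citation", [])))
    return {
        "title": record.get("name"),
        "authors": authors,
        "doi": doi,
        "citations": citations,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_RECORD))
```

Because JSON-LD is plain JSON, the standard `json` module is enough here; a dedicated JSON-LD processor would only be needed to expand the `@context` into full IRIs.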
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-95153-9_2'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-95153-9_2'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-95153-9_2'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-95153-9_2'
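
The four curl commands above differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The URL and media types are copied from those commands; the short format keys (`json-ld`, `nt`, `turtle`, `xml`) and the helper names `build_request` and `fetch` are assumptions made for this illustration, and whether the endpoint still answers is not something this sketch can guarantee.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-319-95153-9_2"

# Media types taken from the curl commands; the dictionary keys are our own labels.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for the given RDF format."""
    if fmt not in ACCEPT:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    request = build_request("turtle")
    print(request.full_url, request.get_header("Accept"))
```

Splitting request construction from sending keeps the header logic checkable offline; calling `fetch("json-ld")` would perform the equivalent of the first curl command.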


     

    This table displays all metadata directly associated with this object as RDF triples.

    137 TRIPLES      23 PREDICATES      35 URIs      20 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/978-3-319-95153-9_2 schema:about anzsrc-for:20
    2 anzsrc-for:2004
    3 schema:author N1fdd0b135792485cad75d9eb99afae82
    4 schema:citation sg:pub.10.1007/978-3-642-04131-0_3
    5 sg:pub.10.1007/978-3-642-23138-4_5
    6 https://doi.org/10.1017/s1351324906004384
    7 https://doi.org/10.3115/1075218.1075243
    8 https://doi.org/10.3115/1220575.1220660
    9 https://doi.org/10.3115/974358.974391
    10 https://doi.org/10.3115/976744.976810
    11 https://doi.org/10.3115/980431.980529
    12 schema:datePublished 2018
    13 schema:datePublishedReg 2018-01-01
    14 schema:description This paper presents the development of a free/open-source finite-state morphological transducer for Wolaytta, an Omotic language of Ethiopia, using the Helsinki Finite-State Transducer toolkit (HFST). Developing a full-fledged morphological analysis tool for an under-resourced language like Wolaytta is an important step towards developing further NLP (Natural Language Processing) applications. Morphological analyzers for highly inflectional languages are most efficiently developed using finite-state transducers. To develop the transducer, a lexicon of root words was obtained semi-automatically. The morphotactics of the language were implemented by hand in the lexc formalism, and morphophonological rules were implemented in the twol formalism. Evaluation of the transducer shows that it has decent coverage (over 80%) of forms in a large corpus and exhibits high precision (94.85%) and recall (94.11%) over a manually verified test set. To the best of our knowledge, this work is the first systematic and exhaustive implementation of the morphology of Wolaytta in a morphological transducer.
    15 schema:editor N0f932d4748934c939a52a4a4a013ede1
    16 schema:genre chapter
    17 schema:inLanguage en
    18 schema:isAccessibleForFree false
    19 schema:isPartOf Nadfb6513183d4b0888ea616c2989678f
    20 schema:name A Finite-State Morphological Analyzer for Wolaytta
    21 schema:pagination 14-23
    22 schema:productId N17c04b3b85954b8aba5e775730e4184a
    23 N2ab826213d2f421292d504a7b6852be0
    24 Nb4f30f50849a469b89433804e4ba1908
    25 schema:publisher Na34641d47d7947a6afa835d37cd448ca
    26 schema:sameAs https://app.dimensions.ai/details/publication/pub.1105309030
    27 https://doi.org/10.1007/978-3-319-95153-9_2
    28 schema:sdDatePublished 2019-04-15T12:13
    29 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    30 schema:sdPublisher N9733cc3801bf4c88adf963b188d5a9d9
    31 schema:url http://link.springer.com/10.1007/978-3-319-95153-9_2
    32 sgo:license sg:explorer/license/
    33 sgo:sdDataset chapters
    34 rdf:type schema:Chapter
    35 N0f932d4748934c939a52a4a4a013ede1 rdf:first N0f9ff3730ce54ad1848fb341189b5673
    36 rdf:rest N8273b5d0f15a4361889637d74269f56a
    37 N0f9ff3730ce54ad1848fb341189b5673 schema:familyName Mekuria
    38 schema:givenName Fisseha
    39 rdf:type schema:Person
    40 N1629d9015ae14d84b1469616dd02c296 schema:familyName Edward
    41 schema:givenName Mutafugwa
    42 rdf:type schema:Person
    43 N17c04b3b85954b8aba5e775730e4184a schema:name readcube_id
    44 schema:value c3c32fe8691f3cdc6d18474b81d2a2127b234b37d15069d3e215d2fe4fbaa5dc
    45 rdf:type schema:PropertyValue
    46 N1fdd0b135792485cad75d9eb99afae82 rdf:first Nde052903b45245f3850c69810b60259c
    47 rdf:rest Ndf31743c17ac4f778b69b73035f511f3
    48 N2ab826213d2f421292d504a7b6852be0 schema:name doi
    49 schema:value 10.1007/978-3-319-95153-9_2
    50 rdf:type schema:PropertyValue
    51 N45fd4958bd5949d1ac8d6e5f6668f6c7 rdf:first sg:person.01212513340.93
    52 rdf:rest N786cba7fe6dd44b2a8f0901178b6f19b
    53 N4cef64a766ba44049ee576f94ab3a0e2 rdf:first Nf8f3b39358834ef9b7475c5984d1487d
    54 rdf:rest N7b88e4b92a4d471d965b036ea40951a1
    55 N7262dedbcd0f4b14aa06d4304ec38130 schema:familyName Nigussie
    56 schema:givenName Ethiopia Enideg
    57 rdf:type schema:Person
    58 N786cba7fe6dd44b2a8f0901178b6f19b rdf:first sg:person.015320124630.02
    59 rdf:rest rdf:nil
    60 N7a0199b324d34813ad93e84dd0d2a5aa rdf:first Nbc30a40d3c484f47af694850d341f948
    61 rdf:rest rdf:nil
    62 N7b88e4b92a4d471d965b036ea40951a1 rdf:first N1629d9015ae14d84b1469616dd02c296
    63 rdf:rest N7a0199b324d34813ad93e84dd0d2a5aa
    64 N8273b5d0f15a4361889637d74269f56a rdf:first N7262dedbcd0f4b14aa06d4304ec38130
    65 rdf:rest N4cef64a766ba44049ee576f94ab3a0e2
    66 N9733cc3801bf4c88adf963b188d5a9d9 schema:name Springer Nature - SN SciGraph project
    67 rdf:type schema:Organization
    68 Na34641d47d7947a6afa835d37cd448ca schema:location Cham
    69 schema:name Springer International Publishing
    70 rdf:type schema:Organisation
    71 Nadfb6513183d4b0888ea616c2989678f schema:isbn 978-3-319-95152-2
    72 978-3-319-95153-9
    73 schema:name Information and Communication Technology for Development for Africa
    74 rdf:type schema:Book
    75 Nb4f30f50849a469b89433804e4ba1908 schema:name dimensions_id
    76 schema:value pub.1105309030
    77 rdf:type schema:PropertyValue
    78 Nbc30a40d3c484f47af694850d341f948 schema:familyName Tegegne
    79 schema:givenName Tesfa
    80 rdf:type schema:Person
    81 Nde052903b45245f3850c69810b60259c schema:affiliation https://www.grid.ac/institutes/grid.7123.7
    82 schema:familyName Gebreselassie
    83 schema:givenName Tewodros A.
    84 rdf:type schema:Person
    85 Ndf31743c17ac4f778b69b73035f511f3 rdf:first sg:person.016403025362.76
    86 rdf:rest N45fd4958bd5949d1ac8d6e5f6668f6c7
    87 Nf8f3b39358834ef9b7475c5984d1487d schema:familyName Dargie
    88 schema:givenName Waltenegus
    89 rdf:type schema:Person
    90 anzsrc-for:20 schema:inDefinedTermSet anzsrc-for:
    91 schema:name Language, Communication and Culture
    92 rdf:type schema:DefinedTerm
    93 anzsrc-for:2004 schema:inDefinedTermSet anzsrc-for:
    94 schema:name Linguistics
    95 rdf:type schema:DefinedTerm
    96 sg:person.01212513340.93 schema:affiliation https://www.grid.ac/institutes/grid.257410.5
    97 schema:familyName Gasser
    98 schema:givenName Michael
    99 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01212513340.93
    100 rdf:type schema:Person
    101 sg:person.015320124630.02 schema:affiliation https://www.grid.ac/institutes/grid.7123.7
    102 schema:familyName Yimam
    103 schema:givenName Baye
    104 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015320124630.02
    105 rdf:type schema:Person
    106 sg:person.016403025362.76 schema:affiliation https://www.grid.ac/institutes/grid.264430.7
    107 schema:familyName Washington
    108 schema:givenName Jonathan N.
    109 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016403025362.76
    110 rdf:type schema:Person
    111 sg:pub.10.1007/978-3-642-04131-0_3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029378268
    112 https://doi.org/10.1007/978-3-642-04131-0_3
    113 rdf:type schema:CreativeWork
    114 sg:pub.10.1007/978-3-642-23138-4_5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002403691
    115 https://doi.org/10.1007/978-3-642-23138-4_5
    116 rdf:type schema:CreativeWork
    117 https://doi.org/10.1017/s1351324906004384 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003610278
    118 rdf:type schema:CreativeWork
    119 https://doi.org/10.3115/1075218.1075243 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099236150
    120 rdf:type schema:CreativeWork
    121 https://doi.org/10.3115/1220575.1220660 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006680528
    122 rdf:type schema:CreativeWork
    123 https://doi.org/10.3115/974358.974391 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009360352
    124 rdf:type schema:CreativeWork
    125 https://doi.org/10.3115/976744.976810 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039295138
    126 rdf:type schema:CreativeWork
    127 https://doi.org/10.3115/980431.980529 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000949830
    128 rdf:type schema:CreativeWork
    129 https://www.grid.ac/institutes/grid.257410.5 schema:alternateName Indiana University System
    130 schema:name Indiana University
    131 rdf:type schema:Organization
    132 https://www.grid.ac/institutes/grid.264430.7 schema:alternateName Swarthmore College
    133 schema:name Swarthmore College
    134 rdf:type schema:Organization
    135 https://www.grid.ac/institutes/grid.7123.7 schema:alternateName Addis Ababa University
    136 schema:name Addis Ababa University
    137 rdf:type schema:Organization
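
In the table above, the ordered editor list is stored as an RDF collection: each blank node carries an `rdf:first` (the item) and an `rdf:rest` (the next node), ending at `rdf:nil`. The sketch below walks that chain to recover the editors in order. The triples are transcribed from the editor rows of the table, with blank-node ids abbreviated to their first five characters for readability (an assumption of this example, as is the helper name `walk_list`).

```python
RDF_NIL = "rdf:nil"

# (subject, predicate, object) rows for the schema:editor list, ids abbreviated.
TRIPLES = [
    ("N0f93", "rdf:first", "N0f9f"),
    ("N0f93", "rdf:rest", "N8273"),
    ("N8273", "rdf:first", "N7262"),
    ("N8273", "rdf:rest", "N4cef"),
    ("N4cef", "rdf:first", "Nf8f3"),
    ("N4cef", "rdf:rest", "N7b88"),
    ("N7b88", "rdf:first", "N1629"),
    ("N7b88", "rdf:rest", "N7a01"),
    ("N7a01", "rdf:first", "Nbc30"),
    ("N7a01", "rdf:rest", RDF_NIL),
    ("N0f9f", "schema:familyName", "Mekuria"),
    ("N7262", "schema:familyName", "Nigussie"),
    ("Nf8f3", "schema:familyName", "Dargie"),
    ("N1629", "schema:familyName", "Edward"),
    ("Nbc30", "schema:familyName", "Tegegne"),
]


def walk_list(head, triples):
    """Follow rdf:first / rdf:rest links from head until rdf:nil is reached."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:  # guard against malformed, cyclic lists
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


if __name__ == "__main__":
    family = {s: o for s, p, o in TRIPLES if p == "schema:familyName"}
    # N0f93 is the (abbreviated) object of schema:editor in the table.
    print([family[node] for node in walk_list("N0f93", TRIPLES)])
```

The recovered order matches the `editor` array in the JSON-LD view of the same record, which is what the linked-list encoding is meant to preserve.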
     



