HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2009

AUTHORS

Krister Lindén, Miikka Silfverberg, Tommi Pirinen

ABSTRACT

Morphological analysis of a wide range of languages can be implemented efficiently using finite-state transducer technologies. Over the last 30 years, a number of attempts have been made to create tools for computational morphologies. The two main competing approaches have been parallel vs. cascaded rule application. The parallel rule application was originally introduced by Koskenniemi [1] and implemented in tools like TwolC and LexC. Currently many applications of morphologies could use dictionaries encoding the a priori likelihoods of words and expressions as well as the likelihood of relations to other representations or languages. We have made the choice to create open-source tools and language descriptions in order to let as many as possible participate in the effort. The current article presents some of the main tools that we have created such as HFST-LexC, HFST-TwolC and HFST-Compose-Intersect. We evaluate their efficiency in comparison to some similar tools and libraries. In particular, we evaluate them using several full-fledged morphological descriptions. Our tools compare well with similar open source tools, even if we still have some challenges ahead before we can catch up with the commercial tools. We demonstrate that for various reasons a parallel rule approach still seems to be more efficient than a cascaded rule approach when developing finite-state morphologies.

PAGES

28-47

References to SciGraph publications

  • 2006. Compiling Generalized Two-Level Rules and Grammars in ADVANCES IN NATURAL LANGUAGE PROCESSING
  • 2007. OpenFst: A General and Efficient Weighted Finite-State Transducer Library in IMPLEMENTATION AND APPLICATION OF AUTOMATA
  • 2006. A Programming Language for Finite State Transducers in FINITE-STATE METHODS AND NATURAL LANGUAGE PROCESSING

Book

    TITLE

    State of the Art in Computational Morphology

    ISBN

    978-3-642-04130-3
    978-3-642-04131-0

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/978-3-642-04131-0_3

    DOI

    http://dx.doi.org/10.1007/978-3-642-04131-0_3

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1029378268


    JSON-LD is the canonical representation for SciGraph data.

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/2004", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Linguistics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/20", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Language, Communication and Culture", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "University of Helsinki", 
              "id": "https://www.grid.ac/institutes/grid.7737.4", 
              "name": [
                "Department of General Linguistics, University of Helsinki, Finland"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Lind\u00e9n", 
            "givenName": "Krister", 
            "id": "sg:person.012142452767.27", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012142452767.27"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Helsinki", 
              "id": "https://www.grid.ac/institutes/grid.7737.4", 
              "name": [
                "Department of General Linguistics, University of Helsinki, Finland"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Silfverberg", 
            "givenName": "Miikka", 
            "id": "sg:person.012661243541.52", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012661243541.52"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Helsinki", 
              "id": "https://www.grid.ac/institutes/grid.7737.4", 
              "name": [
                "Department of General Linguistics, University of Helsinki, Finland"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Pirinen", 
            "givenName": "Tommi", 
            "id": "sg:person.015446606513.39", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015446606513.39"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.3115/980431.980529", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1000949830"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3115/991886.991957", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1013623943"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/11780885_38", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1032990849", 
              "https://doi.org/10.1007/11780885_38"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-540-76336-9_3", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1035845266", 
              "https://doi.org/10.1007/978-3-540-76336-9_3"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/11816508_19", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1039928418", 
              "https://doi.org/10.1007/11816508_19"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/j.tcs.2004.07.007", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1052889113"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2009", 
        "datePublishedReg": "2009-01-01", 
        "description": "Morphological analysis of a wide range of languages can be implemented efficiently using finite-state transducer technologies. Over the last 30 years, a number of attempts have been made to create tools for computational morphologies. The two main competing approaches have been parallel vs. cascaded rule application. The parallel rule application was originally introduced by Koskenniemi [1] and implemented in tools like TwolC and LexC. Currently many applications of morphologies could use dictionaries encoding the a\u00a0priori likelihoods of words and expressions as well as the likelihood of relations to other representations or languages. We have made the choice to create open-source tools and language descriptions in order to let as many as possible participate in the effort. The current article presents some of the main tools that we have created such as HFST-LexC, HFST-TwolC and HFST-Compose-Intersect. We evaluate their efficiency in comparison to some similar tools and libraries. In particular, we evaluate them using several full-fledged morphological descriptions. Our tools compare well with similar open source tools, even if we still have some challenges ahead before we can catch up with the commercial tools. We demonstrate that for various reasons a parallel rule approach still seems to be more efficient than a cascaded rule approach when developing finite-state morphologies.", 
        "editor": [
          {
            "familyName": "Mahlow", 
            "givenName": "Cerstin", 
            "type": "Person"
          }, 
          {
            "familyName": "Piotrowski", 
            "givenName": "Michael", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/978-3-642-04131-0_3", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": {
          "isbn": [
            "978-3-642-04130-3", 
            "978-3-642-04131-0"
          ], 
          "name": "State of the Art in Computational Morphology", 
          "type": "Book"
        }, 
        "name": "HFST Tools for Morphology \u2013 An Efficient Open-Source Package for Construction of Morphological Analyzers", 
        "pagination": "28-47", 
        "productId": [
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/978-3-642-04131-0_3"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "8bf41b94b6f5a68798dfd007fe22f0d348edeb6dd50d0076fd8fd001b2c0ea32"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1029378268"
            ]
          }
        ], 
        "publisher": {
          "location": "Berlin, Heidelberg", 
          "name": "Springer Berlin Heidelberg", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/978-3-642-04131-0_3", 
          "https://app.dimensions.ai/details/publication/pub.1029378268"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-15T20:06", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8687_00000261.jsonl", 
        "type": "Chapter", 
        "url": "http://link.springer.com/10.1007/978-3-642-04131-0_3"
      }
    ]
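    A JSON-LD record like the one above can be read with any ordinary JSON parser. The following is a minimal sketch, not part of the SciGraph tooling: the function name `summarize` and the trimmed `RECORD_JSON` sample are illustrative assumptions, and only fields that appear in the record above are used. Citation ids are de-duplicated defensively in case an id occurs more than once.

```python
import json

# Trimmed, hand-copied excerpt of the record above (an assumption for
# illustration; a real caller would load the full JSON-LD document).
RECORD_JSON = r"""
[
  {
    "name": "HFST Tools for Morphology \u2013 An Efficient Open-Source Package for Construction of Morphological Analyzers",
    "author": [
      {"familyName": "Lind\u00e9n", "givenName": "Krister"},
      {"familyName": "Silfverberg", "givenName": "Miikka"},
      {"familyName": "Pirinen", "givenName": "Tommi"}
    ],
    "citation": [
      {"id": "sg:pub.10.1007/11780885_38"},
      {"id": "sg:pub.10.1007/11780885_38"},
      {"id": "https://doi.org/10.1016/j.tcs.2004.07.007"}
    ],
    "pagination": "28-47"
  }
]
"""


def summarize(record: dict) -> dict:
    """Pull a few human-readable fields out of one SciGraph-style record."""
    authors = [
        f"{person['givenName']} {person['familyName']}"
        for person in record.get("author", [])
    ]
    # dict.fromkeys keeps first-seen order while dropping repeated ids.
    citations = list(dict.fromkeys(c["id"] for c in record.get("citation", [])))
    first_page, last_page = (int(p) for p in record["pagination"].split("-"))
    return {
        "title": record["name"],
        "authors": authors,
        "citations": citations,
        "page_count": last_page - first_page + 1,
    }


# The top-level JSON value is a list holding a single record.
summary = summarize(json.loads(RECORD_JSON)[0])
print(summary["authors"])
print(summary["citations"])
```

    The sketch treats the record as plain JSON and does not resolve the `@context`; a JSON-LD processor would be needed to expand the keys into full IRIs.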
     

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-04131-0_3'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-04131-0_3'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-04131-0_3'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-04131-0_3'
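
    The four curl calls above differ only in the Accept header. As a rough Python equivalent using only the standard library, the sketch below maps a short format name to the media type and builds the request. The helper names `build_request` and `fetch` and the keys of `MEDIA_TYPES` are assumptions made for this example, and it presumes the endpoint still answers as the page describes.

```python
import urllib.request

# URL taken from the curl examples above.
BASE = "https://scigraph.springernature.com/pub.10.1007/978-3-642-04131-0_3"

# Short names (hypothetical) mapped to the media types used in the curl calls.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = BASE) -> urllib.request.Request:
    """Create a GET request whose Accept header selects the RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt: str, url: str = BASE, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

    Calling `fetch("turtle")` would then correspond to the Turtle curl command; the network call is kept out of module level so that importing the sketch has no side effects.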


     

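    This table displays all metadata directly associated with this object as RDF triples. Rows are numbered; a row that omits the subject, or both the subject and the predicate, reuses the value from the row above.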

    105 TRIPLES      23 PREDICATES      33 URIs      20 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/978-3-642-04131-0_3 schema:about anzsrc-for:20
    2 anzsrc-for:2004
    3 schema:author Nf2b66200bfb04115954ba2cfcdad287f
    4 schema:citation sg:pub.10.1007/11780885_38
    5 sg:pub.10.1007/11816508_19
    6 sg:pub.10.1007/978-3-540-76336-9_3
    7 https://doi.org/10.1016/j.tcs.2004.07.007
    8 https://doi.org/10.3115/980431.980529
    9 https://doi.org/10.3115/991886.991957
    10 schema:datePublished 2009
    11 schema:datePublishedReg 2009-01-01
    12 schema:description Morphological analysis of a wide range of languages can be implemented efficiently using finite-state transducer technologies. Over the last 30 years, a number of attempts have been made to create tools for computational morphologies. The two main competing approaches have been parallel vs. cascaded rule application. The parallel rule application was originally introduced by Koskenniemi [1] and implemented in tools like TwolC and LexC. Currently many applications of morphologies could use dictionaries encoding the a priori likelihoods of words and expressions as well as the likelihood of relations to other representations or languages. We have made the choice to create open-source tools and language descriptions in order to let as many as possible participate in the effort. The current article presents some of the main tools that we have created such as HFST-LexC, HFST-TwolC and HFST-Compose-Intersect. We evaluate their efficiency in comparison to some similar tools and libraries. In particular, we evaluate them using several full-fledged morphological descriptions. Our tools compare well with similar open source tools, even if we still have some challenges ahead before we can catch up with the commercial tools. We demonstrate that for various reasons a parallel rule approach still seems to be more efficient than a cascaded rule approach when developing finite-state morphologies.
    13 schema:editor N67a6c2317b5142b3af4b5eb0e52c8eae
    14 schema:genre chapter
    15 schema:inLanguage en
    16 schema:isAccessibleForFree true
    17 schema:isPartOf N7402ec09a2524b929e28f0a1bec432b8
    18 schema:name HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers
    19 schema:pagination 28-47
    20 schema:productId N2d009dcbcd2b45afb7f4cbb1916bbf1b
    21 N7cdea448e3e04f409d9593a1f3e31e53
    22 Nb5a9970294f64ccdaeea96246346c486
    23 schema:publisher Nf62603806dd349ca8a50af40ade61cab
    24 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029378268
    25 https://doi.org/10.1007/978-3-642-04131-0_3
    26 schema:sdDatePublished 2019-04-15T20:06
    27 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    28 schema:sdPublisher N4b87f4794e414a74b79f4f3c7e0f93e1
    29 schema:url http://link.springer.com/10.1007/978-3-642-04131-0_3
    30 sgo:license sg:explorer/license/
    31 sgo:sdDataset chapters
    32 rdf:type schema:Chapter
    33 N06de51ba6be4485d9f5ba98602d858fe schema:familyName Piotrowski
    34 schema:givenName Michael
    35 rdf:type schema:Person
    36 N2d009dcbcd2b45afb7f4cbb1916bbf1b schema:name readcube_id
    37 schema:value 8bf41b94b6f5a68798dfd007fe22f0d348edeb6dd50d0076fd8fd001b2c0ea32
    38 rdf:type schema:PropertyValue
    39 N3359b2ab00524f16bb5b39ccfd046f1e rdf:first N06de51ba6be4485d9f5ba98602d858fe
    40 rdf:rest rdf:nil
    41 N4b87f4794e414a74b79f4f3c7e0f93e1 schema:name Springer Nature - SN SciGraph project
    42 rdf:type schema:Organization
    43 N67a6c2317b5142b3af4b5eb0e52c8eae rdf:first N836422e8a4244410bcd4674078bbe36d
    44 rdf:rest N3359b2ab00524f16bb5b39ccfd046f1e
    45 N7402ec09a2524b929e28f0a1bec432b8 schema:isbn 978-3-642-04130-3
    46 978-3-642-04131-0
    47 schema:name State of the Art in Computational Morphology
    48 rdf:type schema:Book
    49 N7cdea448e3e04f409d9593a1f3e31e53 schema:name doi
    50 schema:value 10.1007/978-3-642-04131-0_3
    51 rdf:type schema:PropertyValue
    52 N836422e8a4244410bcd4674078bbe36d schema:familyName Mahlow
    53 schema:givenName Cerstin
    54 rdf:type schema:Person
    55 N8e1b395027f34c60ae82628b73f61388 rdf:first sg:person.012661243541.52
    56 rdf:rest Nf452d30d67944f09bc27e3ea2b55dcb9
    57 Nb5a9970294f64ccdaeea96246346c486 schema:name dimensions_id
    58 schema:value pub.1029378268
    59 rdf:type schema:PropertyValue
    60 Nf2b66200bfb04115954ba2cfcdad287f rdf:first sg:person.012142452767.27
    61 rdf:rest N8e1b395027f34c60ae82628b73f61388
    62 Nf452d30d67944f09bc27e3ea2b55dcb9 rdf:first sg:person.015446606513.39
    63 rdf:rest rdf:nil
    64 Nf62603806dd349ca8a50af40ade61cab schema:location Berlin, Heidelberg
    65 schema:name Springer Berlin Heidelberg
    66 rdf:type schema:Organisation
    67 anzsrc-for:20 schema:inDefinedTermSet anzsrc-for:
    68 schema:name Language, Communication and Culture
    69 rdf:type schema:DefinedTerm
    70 anzsrc-for:2004 schema:inDefinedTermSet anzsrc-for:
    71 schema:name Linguistics
    72 rdf:type schema:DefinedTerm
    73 sg:person.012142452767.27 schema:affiliation https://www.grid.ac/institutes/grid.7737.4
    74 schema:familyName Lindén
    75 schema:givenName Krister
    76 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012142452767.27
    77 rdf:type schema:Person
    78 sg:person.012661243541.52 schema:affiliation https://www.grid.ac/institutes/grid.7737.4
    79 schema:familyName Silfverberg
    80 schema:givenName Miikka
    81 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012661243541.52
    82 rdf:type schema:Person
    83 sg:person.015446606513.39 schema:affiliation https://www.grid.ac/institutes/grid.7737.4
    84 schema:familyName Pirinen
    85 schema:givenName Tommi
    86 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015446606513.39
    87 rdf:type schema:Person
    88 sg:pub.10.1007/11780885_38 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032990849
    89 https://doi.org/10.1007/11780885_38
    90 rdf:type schema:CreativeWork
    91 sg:pub.10.1007/11816508_19 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039928418
    92 https://doi.org/10.1007/11816508_19
    93 rdf:type schema:CreativeWork
    94 sg:pub.10.1007/978-3-540-76336-9_3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035845266
    95 https://doi.org/10.1007/978-3-540-76336-9_3
    96 rdf:type schema:CreativeWork
    97 https://doi.org/10.1016/j.tcs.2004.07.007 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052889113
    98 rdf:type schema:CreativeWork
    99 https://doi.org/10.3115/980431.980529 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000949830
    100 rdf:type schema:CreativeWork
    101 https://doi.org/10.3115/991886.991957 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013623943
    102 rdf:type schema:CreativeWork
    103 https://www.grid.ac/institutes/grid.7737.4 schema:alternateName University of Helsinki
    104 schema:name Department of General Linguistics, University of Helsinki, Finland
    105 rdf:type schema:Organization
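
    Summary counts of the kind shown above the table (triples, predicates, URIs, literals, blank nodes) can be recomputed from an N-Triples download. The sketch below is one way to do it under stated assumptions: the page does not say how it counts URIs, so this version counts distinct IRIs in subject or object position only, and the regular expression covers the common N-Triples term forms rather than the full grammar. `SAMPLE` is made-up data with example.org subjects, not the real record.

```python
import re

# One RDF term: <iri>, _:blank, or "literal" with optional datatype / language tag.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'
LINE = re.compile(rf"^{TERM}\s+{TERM}\s+{TERM}\s*\.$")

# Tiny illustrative graph (hypothetical, not the SciGraph record).
SAMPLE = """
<http://example.org/chapter> <http://schema.org/name> "HFST Tools for Morphology" .
<http://example.org/chapter> <http://schema.org/pagination> "28-47" .
<http://example.org/chapter> <http://schema.org/publisher> _:b0 .
_:b0 <http://schema.org/name> "Springer Berlin Heidelberg" .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Organisation> .
"""


def graph_stats(ntriples: str) -> dict:
    """Count distinct triples, predicates, and node kinds in N-Triples text."""
    triples = set()
    for raw in ntriples.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"not a simple N-Triples line: {line!r}")
        triples.add(match.groups())

    predicates = {p for _, p, _ in triples}
    # Nodes are the terms that occur as subject or object.
    nodes = {term for s, _, o in triples for term in (s, o)}
    return {
        "triples": len(triples),
        "predicates": len(predicates),
        "uris": sum(1 for n in nodes if n.startswith("<")),
        "literals": sum(1 for n in nodes if n.startswith('"')),
        "blank_nodes": sum(1 for n in nodes if n.startswith("_:")),
    }


stats = graph_stats(SAMPLE)
print(stats)
```

    Because the URI-counting rule is a guess, the numbers produced for the real record may not match the page's summary row exactly.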
     



