HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2009

AUTHORS

Krister Lindén, Miikka Silfverberg, Tommi Pirinen

ABSTRACT

Morphological analysis of a wide range of languages can be implemented efficiently using finite-state transducer technologies. Over the last 30 years, a number of attempts have been made to create tools for computational morphologies. The two main competing approaches have been parallel vs. cascaded rule application. The parallel rule application was originally introduced by Koskenniemi [1] and implemented in tools like TwolC and LexC. Currently many applications of morphologies could use dictionaries encoding the a priori likelihoods of words and expressions as well as the likelihood of relations to other representations or languages. We have made the choice to create open-source tools and language descriptions in order to let as many as possible participate in the effort. The current article presents some of the main tools that we have created such as HFST-LexC, HFST-TwolC and HFST-Compose-Intersect. We evaluate their efficiency in comparison to some similar tools and libraries. In particular, we evaluate them using several full-fledged morphological descriptions. Our tools compare well with similar open source tools, even if we still have some challenges ahead before we can catch up with the commercial tools. We demonstrate that for various reasons a parallel rule approach still seems to be more efficient than a cascaded rule approach when developing finite-state morphologies.

PAGES

28-47

Book

TITLE

State of the Art in Computational Morphology

ISBN

978-3-642-04130-3
978-3-642-04131-0

Author Affiliations

Department of General Linguistics, University of Helsinki, Finland

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-04131-0_3

DOI

http://dx.doi.org/10.1007/978-3-642-04131-0_3

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1029378268


JSON-LD is the canonical representation for SciGraph data.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/2004", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Linguistics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/20", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Language, Communication and Culture", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Helsinki", 
          "id": "https://www.grid.ac/institutes/grid.7737.4", 
          "name": [
            "Department of General Linguistics, University of Helsinki, Finland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lind\u00e9n", 
        "givenName": "Krister", 
        "id": "sg:person.012142452767.27", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012142452767.27"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Helsinki", 
          "id": "https://www.grid.ac/institutes/grid.7737.4", 
          "name": [
            "Department of General Linguistics, University of Helsinki, Finland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Silfverberg", 
        "givenName": "Miikka", 
        "id": "sg:person.012661243541.52", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012661243541.52"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Helsinki", 
          "id": "https://www.grid.ac/institutes/grid.7737.4", 
          "name": [
            "Department of General Linguistics, University of Helsinki, Finland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Pirinen", 
        "givenName": "Tommi", 
        "id": "sg:person.015446606513.39", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015446606513.39"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.3115/980431.980529", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1000949830"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/991886.991957", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013623943"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/11780885_38", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1032990849", 
          "https://doi.org/10.1007/11780885_38"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-540-76336-9_3", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035845266", 
          "https://doi.org/10.1007/978-3-540-76336-9_3"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/11816508_19", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039928418", 
          "https://doi.org/10.1007/11816508_19"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.tcs.2004.07.007", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052889113"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2009", 
    "datePublishedReg": "2009-01-01", 
    "description": "Morphological analysis of a wide range of languages can be implemented efficiently using finite-state transducer technologies. Over the last 30 years, a number of attempts have been made to create tools for computational morphologies. The two main competing approaches have been parallel vs. cascaded rule application. The parallel rule application was originally introduced by Koskenniemi [1] and implemented in tools like TwolC and LexC. Currently many applications of morphologies could use dictionaries encoding the a\u00a0priori likelihoods of words and expressions as well as the likelihood of relations to other representations or languages. We have made the choice to create open-source tools and language descriptions in order to let as many as possible participate in the effort. The current article presents some of the main tools that we have created such as HFST-LexC, HFST-TwolC and HFST-Compose-Intersect. We evaluate their efficiency in comparison to some similar tools and libraries. In particular, we evaluate them using several full-fledged morphological descriptions. Our tools compare well with similar open source tools, even if we still have some challenges ahead before we can catch up with the commercial tools. We demonstrate that for various reasons a parallel rule approach still seems to be more efficient than a cascaded rule approach when developing finite-state morphologies.", 
    "editor": [
      {
        "familyName": "Mahlow", 
        "givenName": "Cerstin", 
        "type": "Person"
      }, 
      {
        "familyName": "Piotrowski", 
        "givenName": "Michael", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-04131-0_3", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-642-04130-3", 
        "978-3-642-04131-0"
      ], 
      "name": "State of the Art in Computational Morphology", 
      "type": "Book"
    }, 
    "name": "HFST Tools for Morphology \u2013 An Efficient Open-Source Package for Construction of Morphological Analyzers", 
    "pagination": "28-47", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-04131-0_3"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "8bf41b94b6f5a68798dfd007fe22f0d348edeb6dd50d0076fd8fd001b2c0ea32"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1029378268"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-04131-0_3", 
      "https://app.dimensions.ai/details/publication/pub.1029378268"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T20:06", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8687_00000261.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/978-3-642-04131-0_3"
  }
]
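As a minimal sketch of how a record like the one above might be consumed, the following Python snippet parses an abbreviated copy of the JSON-LD and pulls out the title, author names, page range and DOI. The helper name `summarize` and the trimmed `RECORD_JSON` excerpt are illustrative assumptions; only field names that appear in the record itself are used.

```python
import json

# Abbreviated copy of the record above; only the fields read below are kept.
# A raw string is used so the JSON \u escapes reach the JSON parser untouched.
RECORD_JSON = r"""
[
  {
    "author": [
      {"familyName": "Lind\u00e9n", "givenName": "Krister"},
      {"familyName": "Silfverberg", "givenName": "Miikka"},
      {"familyName": "Pirinen", "givenName": "Tommi"}
    ],
    "name": "HFST Tools for Morphology \u2013 An Efficient Open-Source Package for Construction of Morphological Analyzers",
    "pagination": "28-47",
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/978-3-642-04131-0_3"]},
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1029378268"]}
    ]
  }
]
"""


def summarize(record):
    """Hypothetical helper: reduce one SciGraph record to a few key fields."""
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]]
    # Each productId entry carries a name and a one-element list of values.
    ids = {p["name"]: p["value"][0] for p in record["productId"]}
    return {
        "title": record["name"],
        "authors": authors,
        "pages": record["pagination"],
        "doi": ids.get("doi"),
    }


# The top-level JSON value is a list holding a single record.
record = json.loads(RECORD_JSON)[0]
summary = summarize(record)
print(summary)
```

The sketch treats the document as plain JSON, which JSON-LD permits; resolving the `@context` would need a dedicated JSON-LD library and is left out here.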
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-04131-0_3'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-04131-0_3'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-04131-0_3'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-04131-0_3'
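The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The names `ACCEPT`, `build_request` and `fetch` are hypothetical, and whether the endpoint still answers is an assumption, so the example only builds a request and leaves the network call to the reader.

```python
import urllib.request

# URL taken from the curl commands above.
BASE = "https://scigraph.springernature.com/pub.10.1007/978-3-642-04131-0_3"

# Media types exactly as passed to curl via -H 'Accept: ...'.
ACCEPT = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=BASE):
    """Build a GET request asking for the given serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt, url=BASE, timeout=30):
    """Send the request and return the body as text (performs network I/O)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request is side-effect free, so it is safe to inspect offline.
req = build_request("turtle")
print(req.full_url, req.get_header("Accept"))
```

Calling `fetch("turtle")` would issue the same request as the Turtle curl command; it is not run here because the service may be unavailable.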


 

This table displays all metadata directly associated with this object as RDF triples. A row that omits the subject or predicate inherits it from the row above.

105 TRIPLES      23 PREDICATES      33 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-04131-0_3 schema:about anzsrc-for:20
2 anzsrc-for:2004
3 schema:author N2a33e2076acf4397a32c5cc9f4f0c1a0
4 schema:citation sg:pub.10.1007/11780885_38
5 sg:pub.10.1007/11816508_19
6 sg:pub.10.1007/978-3-540-76336-9_3
7 https://doi.org/10.1016/j.tcs.2004.07.007
8 https://doi.org/10.3115/980431.980529
9 https://doi.org/10.3115/991886.991957
10 schema:datePublished 2009
11 schema:datePublishedReg 2009-01-01
12 schema:description Morphological analysis of a wide range of languages can be implemented efficiently using finite-state transducer technologies. Over the last 30 years, a number of attempts have been made to create tools for computational morphologies. The two main competing approaches have been parallel vs. cascaded rule application. The parallel rule application was originally introduced by Koskenniemi [1] and implemented in tools like TwolC and LexC. Currently many applications of morphologies could use dictionaries encoding the a priori likelihoods of words and expressions as well as the likelihood of relations to other representations or languages. We have made the choice to create open-source tools and language descriptions in order to let as many as possible participate in the effort. The current article presents some of the main tools that we have created such as HFST-LexC, HFST-TwolC and HFST-Compose-Intersect. We evaluate their efficiency in comparison to some similar tools and libraries. In particular, we evaluate them using several full-fledged morphological descriptions. Our tools compare well with similar open source tools, even if we still have some challenges ahead before we can catch up with the commercial tools. We demonstrate that for various reasons a parallel rule approach still seems to be more efficient than a cascaded rule approach when developing finite-state morphologies.
13 schema:editor Nf49d0a667e5f469e95d1bae96da5fa39
14 schema:genre chapter
15 schema:inLanguage en
16 schema:isAccessibleForFree true
17 schema:isPartOf N9f532198d89b498994143b161a282f06
18 schema:name HFST Tools for Morphology – An Efficient Open-Source Package for Construction of Morphological Analyzers
19 schema:pagination 28-47
20 schema:productId N1e653f08aa964d9691fffbdf9711aacf
21 N33869ba2e09c49c49401c90874721cba
22 N7f481d892d8548d5b604bb73801f808d
23 schema:publisher Ne7d31c1d9a4241ec88fdf7a968c6f2c1
24 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029378268
25 https://doi.org/10.1007/978-3-642-04131-0_3
26 schema:sdDatePublished 2019-04-15T20:06
27 schema:sdLicense https://scigraph.springernature.com/explorer/license/
28 schema:sdPublisher Nd1ad1f0a85464e43b3a96eb73dbf8b27
29 schema:url http://link.springer.com/10.1007/978-3-642-04131-0_3
30 sgo:license sg:explorer/license/
31 sgo:sdDataset chapters
32 rdf:type schema:Chapter
33 N18012c368ef442de99a17754bcfd2586 rdf:first sg:person.012661243541.52
34 rdf:rest N534ad0567f31425686a7d3a3c487cdb3
35 N1e653f08aa964d9691fffbdf9711aacf schema:name doi
36 schema:value 10.1007/978-3-642-04131-0_3
37 rdf:type schema:PropertyValue
38 N2a33e2076acf4397a32c5cc9f4f0c1a0 rdf:first sg:person.012142452767.27
39 rdf:rest N18012c368ef442de99a17754bcfd2586
40 N33869ba2e09c49c49401c90874721cba schema:name dimensions_id
41 schema:value pub.1029378268
42 rdf:type schema:PropertyValue
43 N534ad0567f31425686a7d3a3c487cdb3 rdf:first sg:person.015446606513.39
44 rdf:rest rdf:nil
45 N6e0e61aa0d02477e8195270e65dd87b1 schema:familyName Piotrowski
46 schema:givenName Michael
47 rdf:type schema:Person
48 N7f481d892d8548d5b604bb73801f808d schema:name readcube_id
49 schema:value 8bf41b94b6f5a68798dfd007fe22f0d348edeb6dd50d0076fd8fd001b2c0ea32
50 rdf:type schema:PropertyValue
51 N9f532198d89b498994143b161a282f06 schema:isbn 978-3-642-04130-3
52 978-3-642-04131-0
53 schema:name State of the Art in Computational Morphology
54 rdf:type schema:Book
55 Nd1ad1f0a85464e43b3a96eb73dbf8b27 schema:name Springer Nature - SN SciGraph project
56 rdf:type schema:Organization
57 Nd46245c77ea74ee3bd69bff30d65154f rdf:first N6e0e61aa0d02477e8195270e65dd87b1
58 rdf:rest rdf:nil
59 Ne7d31c1d9a4241ec88fdf7a968c6f2c1 schema:location Berlin, Heidelberg
60 schema:name Springer Berlin Heidelberg
61 rdf:type schema:Organisation
62 Nf329917b686548fead57d348d62c2808 schema:familyName Mahlow
63 schema:givenName Cerstin
64 rdf:type schema:Person
65 Nf49d0a667e5f469e95d1bae96da5fa39 rdf:first Nf329917b686548fead57d348d62c2808
66 rdf:rest Nd46245c77ea74ee3bd69bff30d65154f
67 anzsrc-for:20 schema:inDefinedTermSet anzsrc-for:
68 schema:name Language, Communication and Culture
69 rdf:type schema:DefinedTerm
70 anzsrc-for:2004 schema:inDefinedTermSet anzsrc-for:
71 schema:name Linguistics
72 rdf:type schema:DefinedTerm
73 sg:person.012142452767.27 schema:affiliation https://www.grid.ac/institutes/grid.7737.4
74 schema:familyName Lindén
75 schema:givenName Krister
76 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012142452767.27
77 rdf:type schema:Person
78 sg:person.012661243541.52 schema:affiliation https://www.grid.ac/institutes/grid.7737.4
79 schema:familyName Silfverberg
80 schema:givenName Miikka
81 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012661243541.52
82 rdf:type schema:Person
83 sg:person.015446606513.39 schema:affiliation https://www.grid.ac/institutes/grid.7737.4
84 schema:familyName Pirinen
85 schema:givenName Tommi
86 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015446606513.39
87 rdf:type schema:Person
88 sg:pub.10.1007/11780885_38 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032990849
89 https://doi.org/10.1007/11780885_38
90 rdf:type schema:CreativeWork
91 sg:pub.10.1007/11816508_19 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039928418
92 https://doi.org/10.1007/11816508_19
93 rdf:type schema:CreativeWork
94 sg:pub.10.1007/978-3-540-76336-9_3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035845266
95 https://doi.org/10.1007/978-3-540-76336-9_3
96 rdf:type schema:CreativeWork
97 https://doi.org/10.1016/j.tcs.2004.07.007 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052889113
98 rdf:type schema:CreativeWork
99 https://doi.org/10.3115/980431.980529 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000949830
100 rdf:type schema:CreativeWork
101 https://doi.org/10.3115/991886.991957 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013623943
102 rdf:type schema:CreativeWork
103 https://www.grid.ac/institutes/grid.7737.4 schema:alternateName University of Helsinki
104 schema:name Department of General Linguistics, University of Helsinki, Finland
105 rdf:type schema:Organization
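In the table, the author order is encoded as an RDF collection: `schema:author` points at a blank node, and each node carries one `rdf:first` item and an `rdf:rest` link until `rdf:nil` is reached. A minimal sketch of walking that chain, using the six rows copied from the table (rows 33-34, 38-39 and 43-44), could look as follows; the function name `walk_rdf_list` is an illustrative assumption.

```python
RDF_NIL = "rdf:nil"

# (subject, predicate, object) rows copied from the table above.
TRIPLES = [
    ("N18012c368ef442de99a17754bcfd2586", "rdf:first", "sg:person.012661243541.52"),
    ("N18012c368ef442de99a17754bcfd2586", "rdf:rest", "N534ad0567f31425686a7d3a3c487cdb3"),
    ("N2a33e2076acf4397a32c5cc9f4f0c1a0", "rdf:first", "sg:person.012142452767.27"),
    ("N2a33e2076acf4397a32c5cc9f4f0c1a0", "rdf:rest", "N18012c368ef442de99a17754bcfd2586"),
    ("N534ad0567f31425686a7d3a3c487cdb3", "rdf:first", "sg:person.015446606513.39"),
    ("N534ad0567f31425686a7d3a3c487cdb3", "rdf:rest", "rdf:nil"),
]


def walk_rdf_list(head, triples):
    """Follow rdf:first / rdf:rest links from `head` and return the items in order."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:
            # Guard against malformed data that loops back on itself.
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


# Row 3 of the table gives the head of the author list.
authors = walk_rdf_list("N2a33e2076acf4397a32c5cc9f4f0c1a0", TRIPLES)
print(authors)
```

The resulting order matches the author order in the JSON-LD record: Lindén, Silfverberg, Pirinen.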
 



