Class-Based Language Model Adaptation


Ontology type: schema:Chapter     


Chapter Info

DATE

2006

AUTHORS

Martin C. Emele , Zica Valsan , Yin Hay Lam , Silke Goronzy

ABSTRACT

In this paper we introduce and evaluate two class-based language model adaptation techniques for adapting general n-gram-based background language models to a specific spoken dialogue task. The required background language models are derived from available newspaper corpora and Internet newsgroup collections. We followed a standard mixture-based approach for language model adaptation by generating several clusters of topic-specific language models and combined them into a specific target language model using different weights depending on the chosen application domain. In addition, we developed a novel word n-gram pruning technique for domain adaptation and proposed a new approach for thematic text clustering. This method relies on a new discriminative n-gram-based key term selection process for document clustering. These key terms are then used to automatically cluster the whole document collection. By selecting only relevant text clusters for language model training, we addressed the problem of generating task-specific language models. Different key term selection methods are investigated using perplexity as the evaluation measure. Automatically computed clusters are compared with manually labeled genre clusters, and the results provide a significant performance improvement depending on the chosen key term selection method.
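
The mixture-based combination and the perplexity measure named in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes plain linear interpolation, uses unigram tables instead of n-gram models for brevity, and the two toy models (`news`, `tv_guide`), their probabilities, and the weights are invented for illustration.

```python
import math

def interpolate(models, weights):
    """Linear interpolation: P(w) = sum over k of weight_k * P_k(w)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    vocab = set().union(*models)  # union of all words known to any model
    return {w: sum(lam * m.get(w, 0.0) for lam, m in zip(weights, models))
            for w in vocab}

def perplexity(model, words):
    """Perplexity: exp of the average negative log-probability per word."""
    log_sum = sum(math.log(model[w]) for w in words)
    return math.exp(-log_sum / len(words))

# Toy unigram models (hypothetical numbers, not taken from the chapter).
news = {"show": 0.2, "me": 0.2, "the": 0.5, "film": 0.1}
tv_guide = {"show": 0.3, "me": 0.2, "the": 0.2, "film": 0.3}

# Weight the in-domain model more heavily (weights are an assumption).
mixed = interpolate([news, tv_guide], [0.3, 0.7])
query = ["show", "me", "the", "film"]
print(perplexity(news, query), perplexity(mixed, query))
```

On this toy query the interpolated model yields a lower perplexity than the newspaper-only model, which is the kind of comparison the abstract describes at a much larger scale.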

PAGES

109-121

Book

TITLE

SmartKom: Foundations of Multimodal Dialogue Systems

ISBN

978-3-540-23732-7

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/3-540-36678-4_7

DOI

http://dx.doi.org/10.1007/3-540-36678-4_7

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1040245624



JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service such as JSON-LD Playground or Google SDTT.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "name": [
            "Sony Corporate Laboratories Europe, Advanced Software Laboratory, Sony International (Europe) GmbH, Stuttgart, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Emele", 
        "givenName": "Martin C.", 
        "id": "sg:person.014111464603.16", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014111464603.16"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "Sony Corporate Laboratories Europe, Advanced Software Laboratory, Sony International (Europe) GmbH, Stuttgart, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Valsan", 
        "givenName": "Zica", 
        "id": "sg:person.014567244510.17", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014567244510.17"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "Sony Corporate Laboratories Europe, Advanced Software Laboratory, Sony International (Europe) GmbH, Stuttgart, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lam", 
        "givenName": "Yin Hay", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "Sony Corporate Laboratories Europe, Advanced Software Laboratory, Sony International (Europe) GmbH, Stuttgart, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Goronzy", 
        "givenName": "Silke", 
        "id": "sg:person.011154245677.42", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011154245677.42"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1006/csla.2001.0174", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1042402249"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/34.56193", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061156505"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/72.846729", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061219419"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/89.736328", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061242489"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2006", 
    "datePublishedReg": "2006-01-01", 
    "description": "In this paper we introduce and evaluate two class-based language model adaptation techniques for adapting general n-gram-based background language models to a specific spoken dialogue task. The required background language models are derived from available newspaper corpora and Internet newsgroup collections. We followed a standard mixture-based approach for language model adaptation by generating several clusters of topic-specific language models and combined them into a specific target language model using different weights depending on the chosen application domain. In addition, we developed a novel word n-gram pruning technique for domain adaptation and proposed a new approach for thematic text clustering. This method relies on a new discriminative n-gram-based key term selection process for document clustering. These key terms are then used to automatically cluster the whole document collection. By selecting only relevant text clusters for language model training, we addressed the problem of generating task-specific language models. Different key term selection methods are investigated using perplexity as the evaluation measure. Automatically computed clusters are compared with manually labeled genre clusters, and the results provide a significant performance improvement depending on the chosen key term selection method.", 
    "editor": [
      {
        "familyName": "Wahlster", 
        "givenName": "Wolfgang", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/3-540-36678-4_7", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-540-23732-7"
      ], 
      "name": "SmartKom: Foundations of Multimodal Dialogue Systems", 
      "type": "Book"
    }, 
    "name": "Class-Based Language Model Adaptation", 
    "pagination": "109-121", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/3-540-36678-4_7"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "bfb71658f8f0aa4a6e109a91bb31eda58bc669efd4f1a4a0342822d717fc8bea"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1040245624"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/3-540-36678-4_7", 
      "https://app.dimensions.ai/details/publication/pub.1040245624"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T15:23", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8672_00000268.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/3-540-36678-4_7"
  }
]
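
Because JSON-LD is plain JSON, the record above can be read with any JSON parser. The following is a minimal sketch, assuming a trimmed copy of the record that keeps only the fields used here; `summarize` and `RECORD` are hypothetical names, not part of any SciGraph tooling.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD = """
[
  {
    "author": [
      {"familyName": "Emele", "givenName": "Martin C.", "type": "Person"},
      {"familyName": "Valsan", "givenName": "Zica", "type": "Person"},
      {"familyName": "Lam", "givenName": "Yin Hay", "type": "Person"},
      {"familyName": "Goronzy", "givenName": "Silke", "type": "Person"}
    ],
    "name": "Class-Based Language Model Adaptation",
    "pagination": "109-121",
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/3-540-36678-4_7"]},
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1040245624"]}
    ]
  }
]
"""

def summarize(record_json):
    """Pull title, author names, DOI and page range out of a SciGraph record."""
    chapter = json.loads(record_json)[0]  # the record is a one-element array
    # productId is a list of name/value pairs; index it by name.
    ids = {p["name"]: p["value"][0] for p in chapter["productId"]}
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in chapter["author"]]
    return {
        "title": chapter["name"],
        "authors": authors,
        "doi": ids.get("doi"),
        "pages": chapter["pagination"],
    }

print(summarize(RECORD))
```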
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/3-540-36678-4_7'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/3-540-36678-4_7'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/3-540-36678-4_7'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/3-540-36678-4_7'
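
The four curl calls differ only in their Accept header, so the same content negotiation can be sketched with Python's standard library. This is an illustration under assumptions: the short format keys in `ACCEPT` and the helper `build_request` are hypothetical names, and the sketch only builds the request; whether the endpoint still answers is not checked here.

```python
import urllib.request

# MIME types taken from the curl commands above; the short keys are our own labels.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}
BASE = "https://scigraph.springernature.com/"

def build_request(record_id, fmt):
    """Build a GET request that asks for the record in the given RDF format."""
    return urllib.request.Request(BASE + record_id,
                                  headers={"Accept": ACCEPT[fmt]})

req = build_request("pub.10.1007/3-540-36678-4_7", "turtle")
print(req.full_url, req.get_header("Accept"))
# To actually fetch the data (needs network access):
#   body = urllib.request.urlopen(req).read()
```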


 

This table displays all metadata directly associated with this object as RDF triples.

100 TRIPLES      23 PREDICATES      31 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/3-540-36678-4_7 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author N13b078ef99a342ad8e730d090c12d869
4 schema:citation https://doi.org/10.1006/csla.2001.0174
5 https://doi.org/10.1109/34.56193
6 https://doi.org/10.1109/72.846729
7 https://doi.org/10.1109/89.736328
8 schema:datePublished 2006
9 schema:datePublishedReg 2006-01-01
10 schema:description In this paper we introduce and evaluate two class-based language model adaptation techniques for adapting general n-gram-based background language models to a specific spoken dialogue task. The required background language models are derived from available newspaper corpora and Internet newsgroup collections. We followed a standard mixture-based approach for language model adaptation by generating several clusters of topic-specific language models and combined them into a specific target language model using different weights depending on the chosen application domain. In addition, we developed a novel word n-gram pruning technique for domain adaptation and proposed a new approach for thematic text clustering. This method relies on a new discriminative n-gram-based key term selection process for document clustering. These key terms are then used to automatically cluster the whole document collection. By selecting only relevant text clusters for language model training, we addressed the problem of generating task-specific language models. Different key term selection methods are investigated using perplexity as the evaluation measure. Automatically computed clusters are compared with manually labeled genre clusters, and the results provide a significant performance improvement depending on the chosen key term selection method.
11 schema:editor N23fc6f5b6cd04768a2d036a9ba3af085
12 schema:genre chapter
13 schema:inLanguage en
14 schema:isAccessibleForFree false
15 schema:isPartOf N7cdf8c9e4496497e8416dcd655e6f125
16 schema:name Class-Based Language Model Adaptation
17 schema:pagination 109-121
18 schema:productId N281b68f2a3624a14bb93907e0644b90e
19 N484eb2923749437e9750139bd0ba18c3
20 Nb2edf7b22abf4934967bbdfcf5c26917
21 schema:publisher Naf0c58a38e654fb4a51348d0e7174069
22 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040245624
23 https://doi.org/10.1007/3-540-36678-4_7
24 schema:sdDatePublished 2019-04-15T15:23
25 schema:sdLicense https://scigraph.springernature.com/explorer/license/
26 schema:sdPublisher N1d226a360c3a469688e1aac93a4da64a
27 schema:url http://link.springer.com/10.1007/3-540-36678-4_7
28 sgo:license sg:explorer/license/
29 sgo:sdDataset chapters
30 rdf:type schema:Chapter
31 N0b7fa77be6f94055819bf00e1816fdcc rdf:first sg:person.011154245677.42
32 rdf:rest rdf:nil
33 N13b078ef99a342ad8e730d090c12d869 rdf:first sg:person.014111464603.16
34 rdf:rest Na314679b4da0433f9580c582d5d73541
35 N1d226a360c3a469688e1aac93a4da64a schema:name Springer Nature - SN SciGraph project
36 rdf:type schema:Organization
37 N23fc6f5b6cd04768a2d036a9ba3af085 rdf:first Nb02db9e19cf94f4caae1800a2ec64ae4
38 rdf:rest rdf:nil
39 N281b68f2a3624a14bb93907e0644b90e schema:name doi
40 schema:value 10.1007/3-540-36678-4_7
41 rdf:type schema:PropertyValue
42 N41aab85aa1d84526b4cb8a07420374c4 schema:affiliation N9b62e40931f744328872524c25d738b9
43 schema:familyName Lam
44 schema:givenName Yin Hay
45 rdf:type schema:Person
46 N45dd3cb4f7b5483a8f5d2b579742bcbb schema:name Sony Corporate Laboratories Europe, Advanced Software Laboratory, Sony International (Europe) GmbH, Stuttgart, Germany
47 rdf:type schema:Organization
48 N484eb2923749437e9750139bd0ba18c3 schema:name dimensions_id
49 schema:value pub.1040245624
50 rdf:type schema:PropertyValue
51 N4c33098bd7604e24b4f4ea61d33d9016 rdf:first N41aab85aa1d84526b4cb8a07420374c4
52 rdf:rest N0b7fa77be6f94055819bf00e1816fdcc
53 N6555f957f76e4793acee947f7d65dc34 schema:name Sony Corporate Laboratories Europe, Advanced Software Laboratory, Sony International (Europe) GmbH, Stuttgart, Germany
54 rdf:type schema:Organization
55 N7cdf8c9e4496497e8416dcd655e6f125 schema:isbn 978-3-540-23732-7
56 schema:name SmartKom: Foundations of Multimodal Dialogue Systems
57 rdf:type schema:Book
58 N9b62e40931f744328872524c25d738b9 schema:name Sony Corporate Laboratories Europe, Advanced Software Laboratory, Sony International (Europe) GmbH, Stuttgart, Germany
59 rdf:type schema:Organization
60 Na314679b4da0433f9580c582d5d73541 rdf:first sg:person.014567244510.17
61 rdf:rest N4c33098bd7604e24b4f4ea61d33d9016
62 Naf0c58a38e654fb4a51348d0e7174069 schema:name Springer Berlin Heidelberg
63 rdf:type schema:Organisation
64 Nb02db9e19cf94f4caae1800a2ec64ae4 schema:familyName Wahlster
65 schema:givenName Wolfgang
66 rdf:type schema:Person
67 Nb2edf7b22abf4934967bbdfcf5c26917 schema:name readcube_id
68 schema:value bfb71658f8f0aa4a6e109a91bb31eda58bc669efd4f1a4a0342822d717fc8bea
69 rdf:type schema:PropertyValue
70 Nf36ca7b87de949ac9f94664a2c26407f schema:name Sony Corporate Laboratories Europe, Advanced Software Laboratory, Sony International (Europe) GmbH, Stuttgart, Germany
71 rdf:type schema:Organization
72 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
73 schema:name Information and Computing Sciences
74 rdf:type schema:DefinedTerm
75 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
76 schema:name Artificial Intelligence and Image Processing
77 rdf:type schema:DefinedTerm
78 sg:person.011154245677.42 schema:affiliation Nf36ca7b87de949ac9f94664a2c26407f
79 schema:familyName Goronzy
80 schema:givenName Silke
81 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011154245677.42
82 rdf:type schema:Person
83 sg:person.014111464603.16 schema:affiliation N45dd3cb4f7b5483a8f5d2b579742bcbb
84 schema:familyName Emele
85 schema:givenName Martin C.
86 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014111464603.16
87 rdf:type schema:Person
88 sg:person.014567244510.17 schema:affiliation N6555f957f76e4793acee947f7d65dc34
89 schema:familyName Valsan
90 schema:givenName Zica
91 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014567244510.17
92 rdf:type schema:Person
93 https://doi.org/10.1006/csla.2001.0174 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042402249
94 rdf:type schema:CreativeWork
95 https://doi.org/10.1109/34.56193 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061156505
96 rdf:type schema:CreativeWork
97 https://doi.org/10.1109/72.846729 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061219419
98 rdf:type schema:CreativeWork
99 https://doi.org/10.1109/89.736328 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061242489
100 rdf:type schema:CreativeWork
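
In the table, the author order is stored as an RDF collection: schema:author points to a blank node, and each node carries an rdf:first item and an rdf:rest pointer until rdf:nil is reached. A minimal sketch of walking that chain follows; the `CELLS` dictionary is a hand-copied subset of the rows above, and `rdf_list` is a hypothetical helper, not a library function.

```python
RDF_NIL = "rdf:nil"

# node -> (rdf:first, rdf:rest), copied from the author-list rows of the table.
CELLS = {
    "N13b078ef99a342ad8e730d090c12d869":
        ("sg:person.014111464603.16", "Na314679b4da0433f9580c582d5d73541"),
    "Na314679b4da0433f9580c582d5d73541":
        ("sg:person.014567244510.17", "N4c33098bd7604e24b4f4ea61d33d9016"),
    "N4c33098bd7604e24b4f4ea61d33d9016":
        ("N41aab85aa1d84526b4cb8a07420374c4", "N0b7fa77be6f94055819bf00e1816fdcc"),
    "N0b7fa77be6f94055819bf00e1816fdcc":
        ("sg:person.011154245677.42", RDF_NIL),
}

def rdf_list(head, cells):
    """Follow rdf:rest from the head node and collect each rdf:first item."""
    items, seen = [], set()
    node = head
    while node != RDF_NIL:
        if node in seen:  # guard against malformed, cyclic lists
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = cells[node]
        items.append(first)
        node = rest
    return items

# schema:author of the chapter points at the head node (row 3 of the table).
authors = rdf_list("N13b078ef99a342ad8e730d090c12d869", CELLS)
print(authors)
```

The third item is a blank node rather than an sg:person identifier; its rows in the table give the name Yin Hay Lam, so the recovered order matches the author list of the chapter.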
 



