Sequence-Based Pronunciation Modeling Using a Noisy-Channel Approach


Ontology type: schema:Chapter     


Chapter Info

DATE

2010

AUTHORS

Hansjörg Hofmann, Sakriani Sakti, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura, Wolfgang Minker

ABSTRACT

Previous approaches to spontaneous speech recognition address the multiple pronunciation problem by modeling the alteration of the pronunciation on a phoneme to phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence are not considered yet. In this paper we attempt to model the sequence-based pronunciation variation using a noisy-channel approach where the spontaneous phoneme sequence is considered as a “noisy” string and the goal is to recover the “clean” string of the word sequence. Hereby, the whole word sequence and its effect on the alternation of the phonemes will be taken into consideration. Moreover, the system not only learns the phoneme transformation but also the mapping from the phoneme to the word directly. In this preliminary study, first the phonemes will be recognized with the present recognition system and afterwards the pronunciation variation model based on the noisy-channel approach will map from the phoneme to the word level. Our experiments use Switchboard as spontaneous speech corpus. The results show that the proposed method improves the word accuracy consistently over the conventional recognition system. The best system achieves up to 38.9% relative improvement to the baseline speech recognition.
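
The noisy-channel idea described in the abstract is conventionally written as a Bayes decision rule. The form below is the generic textbook formulation, not necessarily the exact model used in the chapter. The symbols are notation assumed here: $W$ is a candidate "clean" word sequence, $P$ is the recognized "noisy" phoneme sequence, and $\hat{W}$ is the recovered word sequence.

```latex
\hat{W} \;=\; \arg\max_{W} \Pr(W \mid P)
        \;=\; \arg\max_{W} \frac{\Pr(P \mid W)\,\Pr(W)}{\Pr(P)}
        \;=\; \arg\max_{W} \Pr(P \mid W)\,\Pr(W)
```

The denominator $\Pr(P)$ can be dropped because it does not depend on $W$. In this reading, $\Pr(P \mid W)$ plays the role of the channel (pronunciation variation) model and $\Pr(W)$ the role of a language model over word sequences.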

PAGES

156-162

Book

TITLE

Spoken Dialogue Systems for Ambient Environments

ISBN

978-3-642-16201-5
978-3-642-16202-2

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-16202-2_15

DOI

http://dx.doi.org/10.1007/978-3-642-16202-2_15

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1019352511



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/17", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Psychology and Cognitive Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1702", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Cognitive Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Ulm, Germany", 
          "id": "http://www.grid.ac/institutes/grid.6582.9", 
          "name": [
            "National Institute of Information and Communications Technology, Japan", 
            "University of Ulm, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Hofmann", 
        "givenName": "Hansj\u00f6rg", 
        "id": "sg:person.012652041467.12", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012652041467.12"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National Institute of Information and Communications Technology, Japan", 
          "id": "http://www.grid.ac/institutes/grid.28312.3a", 
          "name": [
            "National Institute of Information and Communications Technology, Japan"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Sakti", 
        "givenName": "Sakriani", 
        "id": "sg:person.016361461676.33", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016361461676.33"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National Institute of Information and Communications Technology, Japan", 
          "id": "http://www.grid.ac/institutes/grid.28312.3a", 
          "name": [
            "National Institute of Information and Communications Technology, Japan"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Isotani", 
        "givenName": "Ryosuke", 
        "id": "sg:person.016672306663.73", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016672306663.73"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National Institute of Information and Communications Technology, Japan", 
          "id": "http://www.grid.ac/institutes/grid.28312.3a", 
          "name": [
            "National Institute of Information and Communications Technology, Japan"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Kawai", 
        "givenName": "Hisashi", 
        "id": "sg:person.013737440203.78", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013737440203.78"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National Institute of Information and Communications Technology, Japan", 
          "id": "http://www.grid.ac/institutes/grid.28312.3a", 
          "name": [
            "National Institute of Information and Communications Technology, Japan"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Nakamura", 
        "givenName": "Satoshi", 
        "id": "sg:person.014433216155.64", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014433216155.64"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Ulm, Germany", 
          "id": "http://www.grid.ac/institutes/grid.6582.9", 
          "name": [
            "University of Ulm, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Minker", 
        "givenName": "Wolfgang", 
        "id": "sg:person.013704564607.67", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013704564607.67"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2010", 
    "datePublishedReg": "2010-01-01", 
    "description": "Previous approaches to spontaneous speech recognition address the multiple pronunciation problem by modeling the alteration of the pronunciation on a phoneme to phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence are not considered yet. In this paper we attempt to model the sequence-based pronunciation variation using a noisy-channel approach where the spontaneous phoneme sequence is considered as a \u201cnoisy\u201d string and the goal is to recover the \u201cclean\u201d string of the word sequence. Hereby, the whole word sequence and its effect on the alternation of the phonemes will be taken into consideration. Moreover, the system not only learns the phoneme transformation but also the mapping from the phoneme to the word directly. In this preliminary study, first the phonemes will be recognized with the present recognition system and afterwards the pronunciation variation model based on the noisy-channel approach will map from the phoneme to the word level. Our experiments use Switchboard as spontaneous speech corpus. The results show that the proposed method improves the word accuracy consistently over the conventional recognition system. The best system achieves up to 38.9% relative improvement to the baseline speech recognition.", 
    "editor": [
      {
        "familyName": "Lee", 
        "givenName": "Gary Geunbae", 
        "type": "Person"
      }, 
      {
        "familyName": "Mariani", 
        "givenName": "Joseph", 
        "type": "Person"
      }, 
      {
        "familyName": "Minker", 
        "givenName": "Wolfgang", 
        "type": "Person"
      }, 
      {
        "familyName": "Nakamura", 
        "givenName": "Satoshi", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-16202-2_15", 
    "inLanguage": "en", 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-642-16201-5", 
        "978-3-642-16202-2"
      ], 
      "name": "Spoken Dialogue Systems for Ambient Environments", 
      "type": "Book"
    }, 
    "keywords": [
      "noisy-channel approach", 
      "word sequences", 
      "recognition system", 
      "conventional recognition system", 
      "phoneme level", 
      "phoneme sequences", 
      "pronunciation variation", 
      "spontaneous speech", 
      "word accuracy", 
      "phonemes", 
      "whole sentence", 
      "word level", 
      "pronunciation modeling", 
      "pronunciation problems", 
      "speech recognition", 
      "phoneme transformation", 
      "pronunciation", 
      "transformation effect", 
      "relative improvement", 
      "speech", 
      "words", 
      "sentences", 
      "recognition", 
      "preliminary study", 
      "strings", 
      "goal", 
      "switchboard", 
      "approach", 
      "alternation", 
      "effect", 
      "levels", 
      "best system", 
      "previous approaches", 
      "problem", 
      "study", 
      "modeling", 
      "model", 
      "improvement", 
      "consideration", 
      "experiments", 
      "results", 
      "accuracy", 
      "address", 
      "system", 
      "mapping", 
      "sequence", 
      "paper", 
      "hereby", 
      "alterations", 
      "variation", 
      "method", 
      "transformation", 
      "variation model", 
      "present recognition systems", 
      "whole word sequence", 
      "baseline speech recognition", 
      "spontaneous speech recognition address", 
      "speech recognition address", 
      "recognition address", 
      "multiple pronunciation problem", 
      "phonetic transformation effects", 
      "sequence-based pronunciation variation", 
      "spontaneous phoneme sequence", 
      "pronunciation variation model", 
      "Sequence-Based Pronunciation Modeling"
    ], 
    "name": "Sequence-Based Pronunciation Modeling Using a Noisy-Channel Approach", 
    "pagination": "156-162", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1019352511"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-16202-2_15"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-16202-2_15", 
      "https://app.dimensions.ai/details/publication/pub.1019352511"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2021-11-01T18:47", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20211101/entities/gbq_results/chapter/chapter_145.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-642-16202-2_15"
  }
]
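
As a rough illustration of how a record of this shape might be consumed, the following Python sketch reads author names and the DOI from a JSON-LD array like the one above. The helper names `author_names` and `doi` are hypothetical, and `SAMPLE` is a trimmed copy of the record that keeps only the fields the sketch reads. It treats the data as plain JSON and does no JSON-LD context processing.

```python
import json

# Trimmed copy of the record above: only the fields this sketch reads.
# Raw string so that the JSON escape \u00f6 reaches the JSON parser unchanged.
SAMPLE = r"""
[
  {
    "name": "Sequence-Based Pronunciation Modeling Using a Noisy-Channel Approach",
    "author": [
      {"familyName": "Hofmann", "givenName": "Hansj\u00f6rg", "type": "Person"},
      {"familyName": "Sakti", "givenName": "Sakriani", "type": "Person"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1019352511"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-642-16202-2_15"]}
    ]
  }
]
"""


def author_names(records):
    """Return 'given family' strings for the first record, in listed order."""
    record = records[0]
    return [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]


def doi(records):
    """Return the first DOI found in productId, or None if there is none."""
    for product in records[0].get("productId", []):
        if product.get("name") == "doi" and product.get("value"):
            return product["value"][0]
    return None


records = json.loads(SAMPLE)
print(author_names(records))
print(doi(records))
```

The same two functions should work on the full record, since they only rely on the `author` and `productId` keys shown in it.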
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-16202-2_15'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-16202-2_15'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-16202-2_15'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-16202-2_15'
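
The four curl commands above differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The function names `build_request` and `fetch` and the short format keys are illustrative choices, not part of any SciGraph client. Whether the endpoint still responds is not verified here.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-16202-2_15"

# Accept header values exactly as used in the curl commands above.
# The dictionary keys are short labels chosen for this sketch.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Return a GET request that asks for the given serialization."""
    if fmt not in ACCEPT:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(ACCEPT)}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=10.0):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Calling `fetch("turtle")` would then correspond to the Turtle curl command. Building the request is kept separate from sending it so the header logic can be checked without network access.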


 

This table displays all metadata directly associated with this object as RDF triples.

179 TRIPLES      23 PREDICATES      91 URIs      84 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-16202-2_15 schema:about anzsrc-for:17
2 anzsrc-for:1702
3 schema:author Nbb620934b5774dea9f87dd57e119b976
4 schema:datePublished 2010
5 schema:datePublishedReg 2010-01-01
6 schema:description Previous approaches to spontaneous speech recognition address the multiple pronunciation problem by modeling the alteration of the pronunciation on a phoneme to phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence are not considered yet. In this paper we attempt to model the sequence-based pronunciation variation using a noisy-channel approach where the spontaneous phoneme sequence is considered as a “noisy” string and the goal is to recover the “clean” string of the word sequence. Hereby, the whole word sequence and its effect on the alternation of the phonemes will be taken into consideration. Moreover, the system not only learns the phoneme transformation but also the mapping from the phoneme to the word directly. In this preliminary study, first the phonemes will be recognized with the present recognition system and afterwards the pronunciation variation model based on the noisy-channel approach will map from the phoneme to the word level. Our experiments use Switchboard as spontaneous speech corpus. The results show that the proposed method improves the word accuracy consistently over the conventional recognition system. The best system achieves up to 38.9% relative improvement to the baseline speech recognition.
7 schema:editor N0d25d2dcf96346a8ac8cbb98159a60d9
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree false
11 schema:isPartOf N2f7127bf6dac4b9588a117f80008db74
12 schema:keywords Sequence-Based Pronunciation Modeling
13 accuracy
14 address
15 alterations
16 alternation
17 approach
18 baseline speech recognition
19 best system
20 consideration
21 conventional recognition system
22 effect
23 experiments
24 goal
25 hereby
26 improvement
27 levels
28 mapping
29 method
30 model
31 modeling
32 multiple pronunciation problem
33 noisy-channel approach
34 paper
35 phoneme level
36 phoneme sequences
37 phoneme transformation
38 phonemes
39 phonetic transformation effects
40 preliminary study
41 present recognition systems
42 previous approaches
43 problem
44 pronunciation
45 pronunciation modeling
46 pronunciation problems
47 pronunciation variation
48 pronunciation variation model
49 recognition
50 recognition address
51 recognition system
52 relative improvement
53 results
54 sentences
55 sequence
56 sequence-based pronunciation variation
57 speech
58 speech recognition
59 speech recognition address
60 spontaneous phoneme sequence
61 spontaneous speech
62 spontaneous speech recognition address
63 strings
64 study
65 switchboard
66 system
67 transformation
68 transformation effect
69 variation
70 variation model
71 whole sentence
72 whole word sequence
73 word accuracy
74 word level
75 word sequences
76 words
77 schema:name Sequence-Based Pronunciation Modeling Using a Noisy-Channel Approach
78 schema:pagination 156-162
79 schema:productId N08fde8bd11514287b7eff18b111bf098
80 N53ccf12e92a34ec8ac733101d94d55d3
81 schema:publisher N76baa2d108764b06b5fcc5613a28431d
82 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019352511
83 https://doi.org/10.1007/978-3-642-16202-2_15
84 schema:sdDatePublished 2021-11-01T18:47
85 schema:sdLicense https://scigraph.springernature.com/explorer/license/
86 schema:sdPublisher N407328de3be047e1b0fd758c2f4e72e5
87 schema:url https://doi.org/10.1007/978-3-642-16202-2_15
88 sgo:license sg:explorer/license/
89 sgo:sdDataset chapters
90 rdf:type schema:Chapter
91 N08fde8bd11514287b7eff18b111bf098 schema:name doi
92 schema:value 10.1007/978-3-642-16202-2_15
93 rdf:type schema:PropertyValue
94 N0d25d2dcf96346a8ac8cbb98159a60d9 rdf:first N5cefb20bd86c491b92e8a3077f6fa38a
95 rdf:rest N126613a4e42b413ab43293bab3d9c2d0
96 N126613a4e42b413ab43293bab3d9c2d0 rdf:first Ne6df4bc2c65445fd9304c5099aadf476
97 rdf:rest Nd891269f2d114c12ad5b8fc75b35a274
98 N1308d94f167c4da690589339ff06cfc5 schema:familyName Minker
99 schema:givenName Wolfgang
100 rdf:type schema:Person
101 N2f7127bf6dac4b9588a117f80008db74 schema:isbn 978-3-642-16201-5
102 978-3-642-16202-2
103 schema:name Spoken Dialogue Systems for Ambient Environments
104 rdf:type schema:Book
105 N407328de3be047e1b0fd758c2f4e72e5 schema:name Springer Nature - SN SciGraph project
106 rdf:type schema:Organization
107 N53ccf12e92a34ec8ac733101d94d55d3 schema:name dimensions_id
108 schema:value pub.1019352511
109 rdf:type schema:PropertyValue
110 N5cefb20bd86c491b92e8a3077f6fa38a schema:familyName Lee
111 schema:givenName Gary Geunbae
112 rdf:type schema:Person
113 N666b12479b5341fc980fb032b8bf7e87 rdf:first sg:person.013737440203.78
114 rdf:rest Ncc84135131d7490aba1940f9cb0f22b0
115 N76baa2d108764b06b5fcc5613a28431d schema:name Springer Nature
116 rdf:type schema:Organisation
117 N8e7ad2e1601645df85cbd10a564ee5f1 rdf:first Nb347a4e28d904b9aa4c106dea4f747e0
118 rdf:rest rdf:nil
119 Nab80e68c9926454187d90404e74aa23b rdf:first sg:person.016361461676.33
120 rdf:rest Ncdf7cf0e41684057b40faf0bc4662610
121 Nb347a4e28d904b9aa4c106dea4f747e0 schema:familyName Nakamura
122 schema:givenName Satoshi
123 rdf:type schema:Person
124 Nbb620934b5774dea9f87dd57e119b976 rdf:first sg:person.012652041467.12
125 rdf:rest Nab80e68c9926454187d90404e74aa23b
126 Ncc84135131d7490aba1940f9cb0f22b0 rdf:first sg:person.014433216155.64
127 rdf:rest Nfa04e42eec924b8d8ae6e97bcf57ab27
128 Ncdf7cf0e41684057b40faf0bc4662610 rdf:first sg:person.016672306663.73
129 rdf:rest N666b12479b5341fc980fb032b8bf7e87
130 Nd891269f2d114c12ad5b8fc75b35a274 rdf:first N1308d94f167c4da690589339ff06cfc5
131 rdf:rest N8e7ad2e1601645df85cbd10a564ee5f1
132 Ne6df4bc2c65445fd9304c5099aadf476 schema:familyName Mariani
133 schema:givenName Joseph
134 rdf:type schema:Person
135 Nfa04e42eec924b8d8ae6e97bcf57ab27 rdf:first sg:person.013704564607.67
136 rdf:rest rdf:nil
137 anzsrc-for:17 schema:inDefinedTermSet anzsrc-for:
138 schema:name Psychology and Cognitive Sciences
139 rdf:type schema:DefinedTerm
140 anzsrc-for:1702 schema:inDefinedTermSet anzsrc-for:
141 schema:name Cognitive Sciences
142 rdf:type schema:DefinedTerm
143 sg:person.012652041467.12 schema:affiliation grid-institutes:grid.6582.9
144 schema:familyName Hofmann
145 schema:givenName Hansjörg
146 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012652041467.12
147 rdf:type schema:Person
148 sg:person.013704564607.67 schema:affiliation grid-institutes:grid.6582.9
149 schema:familyName Minker
150 schema:givenName Wolfgang
151 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013704564607.67
152 rdf:type schema:Person
153 sg:person.013737440203.78 schema:affiliation grid-institutes:grid.28312.3a
154 schema:familyName Kawai
155 schema:givenName Hisashi
156 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013737440203.78
157 rdf:type schema:Person
158 sg:person.014433216155.64 schema:affiliation grid-institutes:grid.28312.3a
159 schema:familyName Nakamura
160 schema:givenName Satoshi
161 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014433216155.64
162 rdf:type schema:Person
163 sg:person.016361461676.33 schema:affiliation grid-institutes:grid.28312.3a
164 schema:familyName Sakti
165 schema:givenName Sakriani
166 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016361461676.33
167 rdf:type schema:Person
168 sg:person.016672306663.73 schema:affiliation grid-institutes:grid.28312.3a
169 schema:familyName Isotani
170 schema:givenName Ryosuke
171 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016672306663.73
172 rdf:type schema:Person
173 grid-institutes:grid.28312.3a schema:alternateName National Institute of Information and Communications Technology, Japan
174 schema:name National Institute of Information and Communications Technology, Japan
175 rdf:type schema:Organization
176 grid-institutes:grid.6582.9 schema:alternateName University of Ulm, Germany
177 schema:name National Institute of Information and Communications Technology, Japan
178 University of Ulm, Germany
179 rdf:type schema:Organization
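
In the table above, ordered values such as the editor list are encoded as RDF lists: each blank node carries an `rdf:first` (the item) and an `rdf:rest` (the next node), ending at `rdf:nil`. The following Python sketch walks the editor list using triples copied from the table. Blank node labels are shortened to their first five characters for readability, and the function names are hypothetical.

```python
RDF_NIL = "rdf:nil"

# (subject, predicate, object) triples for the editor list, copied from the
# table above with blank node labels shortened to five characters.
TRIPLES = [
    ("N0d25", "rdf:first", "N5cef"),
    ("N0d25", "rdf:rest", "N1266"),
    ("N1266", "rdf:first", "Ne6df"),
    ("N1266", "rdf:rest", "Nd891"),
    ("Nd891", "rdf:first", "N1308"),
    ("Nd891", "rdf:rest", "N8e7a"),
    ("N8e7a", "rdf:first", "Nb347"),
    ("N8e7a", "rdf:rest", RDF_NIL),
    ("N5cef", "schema:familyName", "Lee"),
    ("Ne6df", "schema:familyName", "Mariani"),
    ("N1308", "schema:familyName", "Minker"),
    ("Nb347", "schema:familyName", "Nakamura"),
]


def walk_list(head, triples):
    """Follow rdf:first / rdf:rest from head and return the items in order."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


def editor_family_names(triples, head="N0d25"):
    """Resolve each list item to its schema:familyName."""
    family = {s: o for s, p, o in triples if p == "schema:familyName"}
    return [family[node] for node in walk_list(head, triples)]


print(editor_family_names(TRIPLES))  # ['Lee', 'Mariani', 'Minker', 'Nakamura']
```

The resulting order matches the `editor` array in the JSON-LD record, which is the reason the list encoding exists: plain triples are unordered, so order has to be carried by the chain itself.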
 



