Development of the compact English LVCSR acoustic model for embedded entertainment robot applications


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2007-09

AUTHORS

Xavier Menéndez-Pidal, Ajay Patrikar, Lex Olorenshaw, Hitoshi Honda

ABSTRACT

In this paper we discuss two techniques to reduce the size of the acoustic model while maintaining or improving the accuracy of the recognition engine. The first technique, demiphone modeling, reduces the redundancy in a context-dependent, state-clustered Hidden Markov Model (HMM). Three-state demiphones optimally designed from the triphone decision tree are introduced to drastically reduce the phone space of the acoustic model and to improve system accuracy. The second redundancy-elimination technique is a more classical approach based on parameter tying. Similar vectors of variances within each HMM cluster are tied together to reduce the number of parameters. The closeness between the variance vectors is measured using a Vector Quantizer (VQ) to preserve the information carried by the variance parameters. The paper also reports speech recognition improvements from assigning a variable number of Gaussians per cluster and from gender-based HMMs. The main motivation behind these techniques is to improve the acoustic model while lowering its memory usage. These techniques may help reduce memory and improve the accuracy of an embedded Large Vocabulary Continuous Speech Recognition (LVCSR) application.
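The variance-tying approach summarized in the abstract can be sketched in a few lines: collect the diagonal-variance vectors of the Gaussians, quantize them with a vector quantizer, and replace each vector by its codeword so that only a small codebook has to be stored. The sketch below is illustrative only; it assumes diagonal covariances and uses plain k-means with Euclidean distance as the quantizer, and all names (tie_variances, codebook_size) are hypothetical rather than taken from the paper.

# Illustrative sketch of variance tying via vector quantization.
# Assumes diagonal-covariance Gaussians; plain k-means serves as the VQ
# codebook trainer. Names are hypothetical, not from the paper.
import numpy as np

def tie_variances(variance_vectors, codebook_size, n_iters=20, seed=0):
    """Cluster per-Gaussian variance vectors and replace each by its codeword."""
    rng = np.random.default_rng(seed)
    n, d = variance_vectors.shape
    # Initialise the codebook with randomly chosen variance vectors.
    codebook = variance_vectors[rng.choice(n, size=codebook_size, replace=False)].copy()
    for _ in range(n_iters):
        # Assign each variance vector to its nearest codeword.
        dists = ((variance_vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        assignments = dists.argmin(axis=1)
        # Re-estimate each codeword as the mean of its assigned vectors.
        for k in range(codebook_size):
            members = variance_vectors[assignments == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    # Tied model: every Gaussian now points at one of codebook_size shared vectors.
    return codebook[assignments], assignments

# Toy usage: 200 Gaussians with 39-dimensional variances tied to a 16-entry codebook,
# so roughly 16/200 of the original variance storage is needed.
rng = np.random.default_rng(1)
variances = rng.gamma(shape=2.0, scale=0.5, size=(200, 39))
tied, idx = tie_variances(variances, codebook_size=16)
print("distinct variance vectors after tying:", len(np.unique(idx)))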

PAGES

63-74

References to SciGraph publications

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s10772-008-9012-6

DOI

http://dx.doi.org/10.1007/s10772-008-9012-6

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1012013781



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "name": [
            "R&D Laboratory, SONY Computer Entertainment of America, 919 East Hillsdale Blvd, 2nd floor, 94404, Foster City, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Men\u00e9ndez-Pidal", 
        "givenName": "Xavier", 
        "id": "sg:person.011667777227.17", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011667777227.17"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "Former Spoken Language Technology Laboratory, SONY Electronics, 94134, San Jos\u00e9, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Patrikar", 
        "givenName": "Ajay", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "Former Spoken Language Technology Laboratory, SONY Electronics, 94134, San Jos\u00e9, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Olorenshaw", 
        "givenName": "Lex", 
        "id": "sg:person.011616552136.07", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011616552136.07"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Sony (Japan)", 
          "id": "https://www.grid.ac/institutes/grid.410792.9", 
          "name": [
            "Information Technologies Laboratories, SONY Corporation, Tokyo, Japan"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Honda", 
        "givenName": "Hitoshi", 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1016/s0167-6393(00)00010-8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010129997"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1044/jshr.3406.1222", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013479079"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/1075812.1075885", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017254450"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-60087-6_3", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033850426", 
          "https://doi.org/10.1007/978-3-642-60087-6_3"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0167-6393(00)00049-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1040374916"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0167-6393(02)00122-x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1044282952"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tcom.1980.1094577", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061552708"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icassp.1993.319393", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1086349695"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icassp.2003.1198750", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094729198"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/asru.2003.1318400", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094902106"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2007-09", 
    "datePublishedReg": "2007-09-01", 
    "description": "In this paper we discuss two techniques to reduce the size of the acoustic model while maintaining or improving the accuracy of the recognition engine. The first technique, demiphone modeling, tries to reduce the redundancy existing in a context dependent state-clustered Hidden Markov Model (HMM). Three-state demiphones optimally designed from the triphone decision tree are introduced to drastically reduce the phone space of the acoustic model and to improve system accuracy. The second redundancy elimination technique is a more classical approach based on parameter tying. Similar vectors of variances in each HMM cluster are tied together to reduce the number of parameters. The closeness between the vectors of variances is measured using a Vector Quantizer (VQ) to maintain the information provided by the variances parameters. The paper also reports speech recognition improvements using assignment of variable number Gaussians per cluster and gender-based HMMs. The main motivation behind these techniques is to improve the acoustic model and at the same time lower its memory usage. These techniques may help in reducing memory and improving accuracy of an embedded Large Vocabulary Continuous Speech Recognition (LVCSR) application.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1007/s10772-008-9012-6", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1132409", 
        "issn": [
          "1381-2416", 
          "1572-8110"
        ], 
        "name": "International Journal of Speech Technology", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "2-3", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "10"
      }
    ], 
    "name": "Development of the compact English LVCSR acoustic model for embedded entertainment robot applications", 
    "pagination": "63-74", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s10772-008-9012-6"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "393d6c29f18111d8023f72578d63c49b9d8ab844e7a07ea5883cd9988b43c2ff"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1012013781"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s10772-008-9012-6", 
      "https://app.dimensions.ai/details/publication/pub.1012013781"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-15T09:22", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000376_0000000376/records_56185_00000000.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1007/s10772-008-9012-6"
  }
]
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s10772-008-9012-6'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s10772-008-9012-6'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s10772-008-9012-6'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s10772-008-9012-6'
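The same content negotiation can be done from a script. Below is a minimal sketch using only Python's standard library; it mirrors the curl commands above and is an illustration rather than an official client.

# Minimal sketch: fetch this record in different RDF serialisations via
# HTTP content negotiation, mirroring the curl commands above.
# Standard library only; error handling is omitted.
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s10772-008-9012-6"

ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}

def fetch(fmt):
    # Request the record with the Accept header for the chosen serialisation.
    req = urllib.request.Request(RECORD_URL, headers={"Accept": ACCEPT_TYPES[fmt]})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

print(fetch("turtle")[:500])  # first few hundred characters of the Turtle serialisation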


 

This table displays all metadata directly associated with this object as RDF triples.

117 TRIPLES      21 PREDICATES      37 URIs      19 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/s10772-008-9012-6 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Nd4f3b01bf8564e3eb290943627c58404
4 schema:citation sg:pub.10.1007/978-3-642-60087-6_3
5 https://doi.org/10.1016/s0167-6393(00)00010-8
6 https://doi.org/10.1016/s0167-6393(00)00049-2
7 https://doi.org/10.1016/s0167-6393(02)00122-x
8 https://doi.org/10.1044/jshr.3406.1222
9 https://doi.org/10.1109/asru.2003.1318400
10 https://doi.org/10.1109/icassp.1993.319393
11 https://doi.org/10.1109/icassp.2003.1198750
12 https://doi.org/10.1109/tcom.1980.1094577
13 https://doi.org/10.3115/1075812.1075885
14 schema:datePublished 2007-09
15 schema:datePublishedReg 2007-09-01
16 schema:description In this paper we discuss two techniques to reduce the size of the acoustic model while maintaining or improving the accuracy of the recognition engine. The first technique, demiphone modeling, tries to reduce the redundancy existing in a context dependent state-clustered Hidden Markov Model (HMM). Three-state demiphones optimally designed from the triphone decision tree are introduced to drastically reduce the phone space of the acoustic model and to improve system accuracy. The second redundancy elimination technique is a more classical approach based on parameter tying. Similar vectors of variances in each HMM cluster are tied together to reduce the number of parameters. The closeness between the vectors of variances is measured using a Vector Quantizer (VQ) to maintain the information provided by the variances parameters. The paper also reports speech recognition improvements using assignment of variable number Gaussians per cluster and gender-based HMMs. The main motivation behind these techniques is to improve the acoustic model and at the same time lower its memory usage. These techniques may help in reducing memory and improving accuracy of an embedded Large Vocabulary Continuous Speech Recognition (LVCSR) application.
17 schema:genre research_article
18 schema:inLanguage en
19 schema:isAccessibleForFree false
20 schema:isPartOf N135f5fcde4164b849efc7f396403a756
21 N5dea4ba4d1044a718a0c2732cf5b6eb8
22 sg:journal.1132409
23 schema:name Development of the compact English LVCSR acoustic model for embedded entertainment robot applications
24 schema:pagination 63-74
25 schema:productId N23b9c3cf2fa848a2a6f12e9a1673d0ec
26 Na6f5dbf9903f44c59266054bce26ff63
27 Nc9c19035c3af471fa0c56de5ded1e76a
28 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012013781
29 https://doi.org/10.1007/s10772-008-9012-6
30 schema:sdDatePublished 2019-04-15T09:22
31 schema:sdLicense https://scigraph.springernature.com/explorer/license/
32 schema:sdPublisher N4e6901d448b94a86af07ab1c2089d080
33 schema:url http://link.springer.com/10.1007/s10772-008-9012-6
34 sgo:license sg:explorer/license/
35 sgo:sdDataset articles
36 rdf:type schema:ScholarlyArticle
37 N135f5fcde4164b849efc7f396403a756 schema:issueNumber 2-3
38 rdf:type schema:PublicationIssue
39 N23b9c3cf2fa848a2a6f12e9a1673d0ec schema:name readcube_id
40 schema:value 393d6c29f18111d8023f72578d63c49b9d8ab844e7a07ea5883cd9988b43c2ff
41 rdf:type schema:PropertyValue
42 N432558a589314336b5c1e4de0529fb87 schema:name Former Spoken Language Technology Laboratory, SONY Electronics, 94134, San José, CA, USA
43 rdf:type schema:Organization
44 N4acc033dd6b843bbb0de53ed4658106f rdf:first sg:person.011616552136.07
45 rdf:rest Nbab008d199f04b458a1d3d0243c952e8
46 N4e6901d448b94a86af07ab1c2089d080 schema:name Springer Nature - SN SciGraph project
47 rdf:type schema:Organization
48 N5c44854cbb8b4d01abc22d41de8bb6cd schema:name R&D Laboratory, SONY Computer Entertainment of America, 919 East Hillsdale Blvd, 2nd floor, 94404, Foster City, CA, USA
49 rdf:type schema:Organization
50 N5dea4ba4d1044a718a0c2732cf5b6eb8 schema:volumeNumber 10
51 rdf:type schema:PublicationVolume
52 Na6f5dbf9903f44c59266054bce26ff63 schema:name dimensions_id
53 schema:value pub.1012013781
54 rdf:type schema:PropertyValue
55 Nbab008d199f04b458a1d3d0243c952e8 rdf:first Nf781171f83cc4ee8b1f2ba0c75187e77
56 rdf:rest rdf:nil
57 Nbcbd8fd31e9d4f5ebb3cf8f1ee6d8594 rdf:first Nf519582aa00041e0b8fddeb5321dd263
58 rdf:rest N4acc033dd6b843bbb0de53ed4658106f
59 Nc9c19035c3af471fa0c56de5ded1e76a schema:name doi
60 schema:value 10.1007/s10772-008-9012-6
61 rdf:type schema:PropertyValue
62 Nd1c0f4fd75154a239aaf60d7c3abb519 schema:name Former Spoken Language Technology Laboratory, SONY Electronics, 94134, San José, CA, USA
63 rdf:type schema:Organization
64 Nd4f3b01bf8564e3eb290943627c58404 rdf:first sg:person.011667777227.17
65 rdf:rest Nbcbd8fd31e9d4f5ebb3cf8f1ee6d8594
66 Nf519582aa00041e0b8fddeb5321dd263 schema:affiliation Nd1c0f4fd75154a239aaf60d7c3abb519
67 schema:familyName Patrikar
68 schema:givenName Ajay
69 rdf:type schema:Person
70 Nf781171f83cc4ee8b1f2ba0c75187e77 schema:affiliation https://www.grid.ac/institutes/grid.410792.9
71 schema:familyName Honda
72 schema:givenName Hitoshi
73 rdf:type schema:Person
74 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
75 schema:name Information and Computing Sciences
76 rdf:type schema:DefinedTerm
77 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
78 schema:name Artificial Intelligence and Image Processing
79 rdf:type schema:DefinedTerm
80 sg:journal.1132409 schema:issn 1381-2416
81 1572-8110
82 schema:name International Journal of Speech Technology
83 rdf:type schema:Periodical
84 sg:person.011616552136.07 schema:affiliation N432558a589314336b5c1e4de0529fb87
85 schema:familyName Olorenshaw
86 schema:givenName Lex
87 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011616552136.07
88 rdf:type schema:Person
89 sg:person.011667777227.17 schema:affiliation N5c44854cbb8b4d01abc22d41de8bb6cd
90 schema:familyName Menéndez-Pidal
91 schema:givenName Xavier
92 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011667777227.17
93 rdf:type schema:Person
94 sg:pub.10.1007/978-3-642-60087-6_3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033850426
95 https://doi.org/10.1007/978-3-642-60087-6_3
96 rdf:type schema:CreativeWork
97 https://doi.org/10.1016/s0167-6393(00)00010-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010129997
98 rdf:type schema:CreativeWork
99 https://doi.org/10.1016/s0167-6393(00)00049-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040374916
100 rdf:type schema:CreativeWork
101 https://doi.org/10.1016/s0167-6393(02)00122-x schema:sameAs https://app.dimensions.ai/details/publication/pub.1044282952
102 rdf:type schema:CreativeWork
103 https://doi.org/10.1044/jshr.3406.1222 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013479079
104 rdf:type schema:CreativeWork
105 https://doi.org/10.1109/asru.2003.1318400 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094902106
106 rdf:type schema:CreativeWork
107 https://doi.org/10.1109/icassp.1993.319393 schema:sameAs https://app.dimensions.ai/details/publication/pub.1086349695
108 rdf:type schema:CreativeWork
109 https://doi.org/10.1109/icassp.2003.1198750 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094729198
110 rdf:type schema:CreativeWork
111 https://doi.org/10.1109/tcom.1980.1094577 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061552708
112 rdf:type schema:CreativeWork
113 https://doi.org/10.3115/1075812.1075885 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017254450
114 rdf:type schema:CreativeWork
115 https://www.grid.ac/institutes/grid.410792.9 schema:alternateName Sony (Japan)
116 schema:name Information Technologies Laboratories, SONY Corporation, Tokyo, Japan
117 rdf:type schema:Organization
 



