Ontology type: schema:ScholarlyArticle Open Access: True
2001-12
AUTHORSBo Thiesson, Christopher Meek, David Heckerman
ABSTRACTThe EM algorithm is a popular method for parameter estimation in a variety of problems involving missing data. However, the EM algorithm often requires significant computational resources and has been dismissed as impractical for large databases. We present two approaches that significantly reduce the computational cost of applying the EM algorithm to databases with a large number of cases, including databases with large dimensionality. Both approaches are based on partial E-steps for which we can use the results of Neal and Hinton (In Jordan, M. (Ed.), Learning in Graphical Models, pp. 355–371. The Netherlands: Kluwer Academic Publishers) to obtain the standard convergence guarantees of EM. The first approach is a version of the incremental EM algorithm, described in Neal and Hinton (1998), which cycles through data cases in blocks. The number of cases in each block dramatically effects the efficiency of the algorithm. We provide amethod for selecting a near optimal block size. The second approach, which we call lazy EM, will, at scheduled iterations, evaluate the significance of each data case and then proceed for several iterations actively using only the significant cases. We demonstrate that both methods can significantly reduce computational costs through their application to high-dimensional real-world and synthetic mixture modeling problems for large databases. More... »
PAGES279-299
http://scigraph.springernature.com/pub.10.1023/a:1017986506241
DOIhttp://dx.doi.org/10.1023/a:1017986506241
DIMENSIONShttps://app.dimensions.ai/details/publication/pub.1017911237
JSON-LD is the canonical representation for SciGraph data.
TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT
[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
"about": [
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Artificial Intelligence and Image Processing",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information and Computing Sciences",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "Microsoft (United States)",
"id": "https://www.grid.ac/institutes/grid.419815.0",
"name": [
"Microsoft Research, One Microsoft Way, 98052, Redmond, WA, USA"
],
"type": "Organization"
},
"familyName": "Thiesson",
"givenName": "Bo",
"id": "sg:person.013761575755.26",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013761575755.26"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Microsoft (United States)",
"id": "https://www.grid.ac/institutes/grid.419815.0",
"name": [
"Microsoft Research, One Microsoft Way, 98052, Redmond, WA, USA"
],
"type": "Organization"
},
"familyName": "Meek",
"givenName": "Christopher",
"id": "sg:person.01352023432.48",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01352023432.48"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Microsoft (United States)",
"id": "https://www.grid.ac/institutes/grid.419815.0",
"name": [
"Microsoft Research, One Microsoft Way, 98052, Redmond, WA, USA"
],
"type": "Organization"
},
"familyName": "Heckerman",
"givenName": "David",
"id": "sg:person.01134362461.98",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01134362461.98"
],
"type": "Person"
}
],
"citation": [
{
"id": "sg:pub.10.1007/978-94-011-5014-9_12",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1009494930",
"https://doi.org/10.1007/978-94-011-5014-9_12"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1162/089976600300015853",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1015553486"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/347090.347123",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1024820511"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1023/a:1007469629108",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1044802083",
"https://doi.org/10.1023/a:1007469629108"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/233269.233324",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1053685676"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1093/biomet/80.2.267",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1059420382"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1109/icassp.1995.479281",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1093482120"
],
"type": "CreativeWork"
}
],
"datePublished": "2001-12",
"datePublishedReg": "2001-12-01",
"description": "The EM algorithm is a popular method for parameter estimation in a variety of problems involving missing data. However, the EM algorithm often requires significant computational resources and has been dismissed as impractical for large databases. We present two approaches that significantly reduce the computational cost of applying the EM algorithm to databases with a large number of cases, including databases with large dimensionality. Both approaches are based on partial E-steps for which we can use the results of Neal and Hinton (In Jordan, M. (Ed.), Learning in Graphical Models, pp. 355\u2013371. The Netherlands: Kluwer Academic Publishers) to obtain the standard convergence guarantees of EM. The first approach is a version of the incremental EM algorithm, described in Neal and Hinton (1998), which cycles through data cases in blocks. The number of cases in each block dramatically effects the efficiency of the algorithm. We provide amethod for selecting a near optimal block size. The second approach, which we call lazy EM, will, at scheduled iterations, evaluate the significance of each data case and then proceed for several iterations actively using only the significant cases. We demonstrate that both methods can significantly reduce computational costs through their application to high-dimensional real-world and synthetic mixture modeling problems for large databases.",
"genre": "research_article",
"id": "sg:pub.10.1023/a:1017986506241",
"inLanguage": [
"en"
],
"isAccessibleForFree": true,
"isPartOf": [
{
"id": "sg:journal.1125588",
"issn": [
"0885-6125",
"1573-0565"
],
"name": "Machine Learning",
"type": "Periodical"
},
{
"issueNumber": "3",
"type": "PublicationIssue"
},
{
"type": "PublicationVolume",
"volumeNumber": "45"
}
],
"name": "Accelerating EM for Large Databases",
"pagination": "279-299",
"productId": [
{
"name": "readcube_id",
"type": "PropertyValue",
"value": [
"5556fc13dfb48f3086620b936fd0d88d39848a040f2fa61f00b4c875787d04d5"
]
},
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1023/a:1017986506241"
]
},
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1017911237"
]
}
],
"sameAs": [
"https://doi.org/10.1023/a:1017986506241",
"https://app.dimensions.ai/details/publication/pub.1017911237"
],
"sdDataset": "articles",
"sdDatePublished": "2019-04-11T01:58",
"sdLicense": "https://scigraph.springernature.com/explorer/license/",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8700_00000504.jsonl",
"type": "ScholarlyArticle",
"url": "http://link.springer.com/10.1023/A:1017986506241"
}
]
Download the RDF metadata as: json-ld nt turtle xml License info
JSON-LD is a popular format for linked data which is fully compatible with JSON.
curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1023/a:1017986506241'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1023/a:1017986506241'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1023/a:1017986506241'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1023/a:1017986506241'
This table displays all metadata directly associated to this object as RDF triples.
98 TRIPLES
21 PREDICATES
34 URIs
19 LITERALS
7 BLANK NODES