How Many Imputations are Really Needed? Some Practical Clarifications of Multiple Imputation Theory


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2007-06-05

AUTHORS

John W. Graham, Allison E. Olchowski, Tamika D. Gilreath

ABSTRACT

Multiple imputation (MI) and full information maximum likelihood (FIML) are the two most common approaches to missing data analysis. In theory, MI and FIML are equivalent when identical models are tested using the same variables, and when m, the number of imputations performed with MI, approaches infinity. However, it is important to know how many imputations are necessary before MI and FIML are sufficiently equivalent in ways that are important to prevention scientists. MI theory suggests that small values of m, even on the order of three to five imputations, yield excellent results. Previous guidelines for sufficient m are based on relative efficiency, which involves the fraction of missing information (γ) for the parameter being estimated, and m. In the present study, we used a Monte Carlo simulation to test MI models across several scenarios in which γ and m were varied. Standard errors and p-values for the regression coefficient of interest varied as a function of m, but not at the same rate as relative efficiency. Most importantly, statistical power for small effect sizes diminished as m became smaller, and the rate of this power falloff was much greater than predicted by changes in relative efficiency. Based on our findings, we recommend that researchers using MI should perform many more imputations than previously considered sufficient. These recommendations are based on γ, and take into consideration one’s tolerance for a preventable power falloff (compared to FIML) due to using too few imputations.
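
The abstract refers to relative efficiency as a function of γ and m without giving the formula. A minimal sketch, assuming the quantity meant is Rubin's standard relative efficiency, RE = 1 / (1 + γ/m), follows. The function name and the γ and m values shown are illustrative choices, not figures taken from the paper's simulation.

```python
def relative_efficiency(gamma: float, m: int) -> float:
    """Relative efficiency of m imputations versus infinitely many.

    Assumes Rubin's formula RE = 1 / (1 + gamma / m), where gamma is the
    fraction of missing information for the parameter being estimated.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if m < 1:
        raise ValueError("m must be a positive integer")
    return 1.0 / (1.0 + gamma / m)


if __name__ == "__main__":
    # Illustrative grid only; these are not results reported in the article.
    for gamma in (0.1, 0.5, 0.9):
        cells = "  ".join(
            f"m={m}: {relative_efficiency(gamma, m):.4f}" for m in (3, 5, 20, 100)
        )
        print(f"gamma={gamma}  {cells}")
```

Under this formula, γ = 0.5 with m = 5 already gives a relative efficiency of about 0.909, which is why small m looks adequate by this criterion. The abstract's point is that statistical power falls off faster than this number suggests.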

PAGES

206-213

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s11121-007-0070-9

DOI

http://dx.doi.org/10.1007/s11121-007-0070-9

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1018783636

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/17549635



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/11", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Medical and Health Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1117", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Public Health and Health Services", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Data Interpretation, Statistical", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Humans", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Likelihood Functions", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Models, Statistical", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Monte Carlo Method", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Preventive Medicine", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sample Size", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Department of Biobehavioral Health, Penn State University, E-315 Health & Human Development Bldg., 16802, University Park, PA, USA", 
          "id": "http://www.grid.ac/institutes/grid.29857.31", 
          "name": [
            "Department of Biobehavioral Health, Penn State University, E-315 Health & Human Development Bldg., 16802, University Park, PA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Graham", 
        "givenName": "John W.", 
        "id": "sg:person.01112534025.06", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01112534025.06"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Biobehavioral Health, Penn State University, E-315 Health & Human Development Bldg., 16802, University Park, PA, USA", 
          "id": "http://www.grid.ac/institutes/grid.29857.31", 
          "name": [
            "Department of Biobehavioral Health, Penn State University, E-315 Health & Human Development Bldg., 16802, University Park, PA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Olchowski", 
        "givenName": "Allison E.", 
        "id": "sg:person.0703715253.79", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0703715253.79"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Biobehavioral Health, Penn State University, E-315 Health & Human Development Bldg., 16802, University Park, PA, USA", 
          "id": "http://www.grid.ac/institutes/grid.29857.31", 
          "name": [
            "Department of Biobehavioral Health, Penn State University, E-315 Health & Human Development Bldg., 16802, University Park, PA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Gilreath", 
        "givenName": "Tamika D.", 
        "id": "sg:person.0725245510.27", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0725245510.27"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2007-06-05", 
    "datePublishedReg": "2007-06-05", 
    "description": "Multiple imputation (MI) and full information maximum likelihood (FIML) are the two most common approaches to missing data analysis. In theory, MI and FIML are equivalent when identical models are tested using the same variables, and when m, the number of imputations performed with MI, approaches infinity. However, it is important to know how many imputations are necessary before MI and FIML are sufficiently equivalent in ways that are important to prevention scientists. MI theory suggests that small values of m, even on the order of three to five imputations, yield excellent results. Previous guidelines for sufficient m are based on relative efficiency, which involves the fraction of missing information (\u03b3) for the parameter being estimated, and m. In the present study, we used a Monte Carlo simulation to test MI models across several scenarios in which \u03b3 and m were varied. Standard errors and p-values for the regression coefficient of interest varied as a function of m, but not at the same rate as relative efficiency. Most importantly, statistical power for small effect sizes diminished as m became smaller, and the rate of this power falloff was much greater than predicted by changes in relative efficiency. Based our findings, we recommend that researchers using MI should perform many more imputations than previously considered sufficient. These recommendations are based on \u03b3, and take into consideration one\u2019s tolerance for a preventable power falloff (compared to FIML) due to using too few imputations.", 
    "genre": "article", 
    "id": "sg:pub.10.1007/s11121-007-0070-9", 
    "inLanguage": "en", 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1021703", 
        "issn": [
          "1389-4986", 
          "1573-6695"
        ], 
        "name": "Prevention Science", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "3", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "8"
      }
    ], 
    "keywords": [
      "full information maximum likelihood", 
      "power falloff", 
      "multiple imputation", 
      "Monte Carlo simulations", 
      "number of imputations", 
      "relative efficiency", 
      "Carlo simulations", 
      "maximum likelihood", 
      "small values", 
      "regression coefficients", 
      "theory", 
      "statistical power", 
      "more imputations", 
      "imputation", 
      "identical models", 
      "standard error", 
      "common approach", 
      "infinity", 
      "consideration one", 
      "data analysis", 
      "model", 
      "falloff", 
      "simulations", 
      "error", 
      "parameters", 
      "efficiency", 
      "coefficient", 
      "same variables", 
      "variables", 
      "approach", 
      "function", 
      "one", 
      "values", 
      "power", 
      "order", 
      "prevention scientists", 
      "scenarios", 
      "number", 
      "excellent results", 
      "results", 
      "interest", 
      "way", 
      "size", 
      "same rate", 
      "analysis", 
      "likelihood", 
      "information", 
      "small effect sizes", 
      "scientists", 
      "MI theory", 
      "researchers", 
      "effect size", 
      "rate", 
      "fraction", 
      "study", 
      "MI model", 
      "clarification", 
      "changes", 
      "tolerance", 
      "recommendations", 
      "guidelines", 
      "present study", 
      "findings", 
      "previous guidelines", 
      "information maximum likelihood", 
      "preventable power falloff", 
      "Practical Clarifications", 
      "Multiple Imputation Theory", 
      "Imputation Theory"
    ], 
    "name": "How Many Imputations are Really Needed? Some Practical Clarifications of Multiple Imputation Theory", 
    "pagination": "206-213", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1018783636"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s11121-007-0070-9"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "17549635"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s11121-007-0070-9", 
      "https://app.dimensions.ai/details/publication/pub.1018783636"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-01-01T18:16", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/article/article_435.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1007/s11121-007-0070-9"
  }
]
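
As a rough illustration of how a record shaped like the one above might be consumed, the sketch below loads a trimmed copy of the JSON-LD and pulls out the title, authors, and identifiers. The field names come from the record itself; the helper name summarize and the trimmed literal are assumptions made for the example.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Graham", "givenName": "John W."},
      {"familyName": "Olchowski", "givenName": "Allison E."},
      {"familyName": "Gilreath", "givenName": "Tamika D."}
    ],
    "datePublished": "2007-06-05",
    "name": "How Many Imputations are Really Needed? Some Practical Clarifications of Multiple Imputation Theory",
    "pagination": "206-213",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1018783636"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/s11121-007-0070-9"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["17549635"]}
    ]
  }
]
"""


def summarize(record: dict) -> dict:
    """Flatten the nested author and productId entries into simple fields."""
    # Each productId entry carries a name and a one-element value list.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    return {
        "title": record.get("name"),
        "authors": authors,
        "date": record.get("datePublished"),
        "pages": record.get("pagination"),
        "doi": ids.get("doi"),
        "pubmed_id": ids.get("pubmed_id"),
    }


# The top-level JSON-LD value is a list holding a single record.
summary = summarize(json.loads(RECORD_JSON)[0])

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value}")
```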
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s11121-007-0070-9'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s11121-007-0070-9'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s11121-007-0070-9'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s11121-007-0070-9'
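
The four curl commands differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The names MEDIA_TYPES, build_request, and fetch are hypothetical helpers introduced here; whether the endpoint still responds is not verified, so fetch is defined but not called.

```python
import urllib.request

# URL copied from the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s11121-007-0070-9"

# Short labels (an assumption of this sketch) mapped to the media types above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build a GET request whose Accept header selects the RDF serialization."""
    try:
        media_type = MEDIA_TYPES[fmt]
    except KeyError:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(MEDIA_TYPES)}") from None
    return urllib.request.Request(url, headers={"Accept": media_type})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    request = build_request("turtle")
    print(request.full_url, request.get_header("Accept"))
```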


 

This table displays all metadata directly associated with this object as RDF triples.

173 TRIPLES      21 PREDICATES      102 URIs      94 LITERALS      14 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/s11121-007-0070-9 schema:about N0249ba418b90444b8ddadc0ee824e646
2 N19a5bc0d6b5c4e72ae0269f0b9c8116d
3 N2801d7628b4d478b84e2f0b142a1d5a0
4 N8a964380963b4512b0d649ecb489a9d4
5 Nb378c5f15c414bb2af20b5332b465f5b
6 Nd3abf30b6fab4557bbd84d8dd50c7d14
7 Nf9395be8a2ca42e6bba6c3202ed546ce
8 anzsrc-for:11
9 anzsrc-for:1117
10 schema:author Ndc5e2e17be554b5198e05bfcfa5ae692
11 schema:datePublished 2007-06-05
12 schema:datePublishedReg 2007-06-05
13 schema:description Multiple imputation (MI) and full information maximum likelihood (FIML) are the two most common approaches to missing data analysis. In theory, MI and FIML are equivalent when identical models are tested using the same variables, and when m, the number of imputations performed with MI, approaches infinity. However, it is important to know how many imputations are necessary before MI and FIML are sufficiently equivalent in ways that are important to prevention scientists. MI theory suggests that small values of m, even on the order of three to five imputations, yield excellent results. Previous guidelines for sufficient m are based on relative efficiency, which involves the fraction of missing information (γ) for the parameter being estimated, and m. In the present study, we used a Monte Carlo simulation to test MI models across several scenarios in which γ and m were varied. Standard errors and p-values for the regression coefficient of interest varied as a function of m, but not at the same rate as relative efficiency. Most importantly, statistical power for small effect sizes diminished as m became smaller, and the rate of this power falloff was much greater than predicted by changes in relative efficiency. Based our findings, we recommend that researchers using MI should perform many more imputations than previously considered sufficient. These recommendations are based on γ, and take into consideration one’s tolerance for a preventable power falloff (compared to FIML) due to using too few imputations.
14 schema:genre article
15 schema:inLanguage en
16 schema:isAccessibleForFree false
17 schema:isPartOf N4bc20917cba94b1baf7f2c3d9ffef289
18 Ne10c4963e594497d9bf5bb12f5c249ac
19 sg:journal.1021703
20 schema:keywords Carlo simulations
21 Imputation Theory
22 MI model
23 MI theory
24 Monte Carlo simulations
25 Multiple Imputation Theory
26 Practical Clarifications
27 analysis
28 approach
29 changes
30 clarification
31 coefficient
32 common approach
33 consideration one
34 data analysis
35 effect size
36 efficiency
37 error
38 excellent results
39 falloff
40 findings
41 fraction
42 full information maximum likelihood
43 function
44 guidelines
45 identical models
46 imputation
47 infinity
48 information
49 information maximum likelihood
50 interest
51 likelihood
52 maximum likelihood
53 model
54 more imputations
55 multiple imputation
56 number
57 number of imputations
58 one
59 order
60 parameters
61 power
62 power falloff
63 present study
64 preventable power falloff
65 prevention scientists
66 previous guidelines
67 rate
68 recommendations
69 regression coefficients
70 relative efficiency
71 researchers
72 results
73 same rate
74 same variables
75 scenarios
76 scientists
77 simulations
78 size
79 small effect sizes
80 small values
81 standard error
82 statistical power
83 study
84 theory
85 tolerance
86 values
87 variables
88 way
89 schema:name How Many Imputations are Really Needed? Some Practical Clarifications of Multiple Imputation Theory
90 schema:pagination 206-213
91 schema:productId N44423ba57b5047ccb2e3ba6243153a31
92 N8d167982cb1044798888600f3a4983d8
93 Nc371e62b079f41baa191b8dbb4549594
94 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018783636
95 https://doi.org/10.1007/s11121-007-0070-9
96 schema:sdDatePublished 2022-01-01T18:16
97 schema:sdLicense https://scigraph.springernature.com/explorer/license/
98 schema:sdPublisher Nf32452f0e13d418595d40f0eb23f11bc
99 schema:url https://doi.org/10.1007/s11121-007-0070-9
100 sgo:license sg:explorer/license/
101 sgo:sdDataset articles
102 rdf:type schema:ScholarlyArticle
103 N0249ba418b90444b8ddadc0ee824e646 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
104 schema:name Likelihood Functions
105 rdf:type schema:DefinedTerm
106 N19a5bc0d6b5c4e72ae0269f0b9c8116d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
107 schema:name Preventive Medicine
108 rdf:type schema:DefinedTerm
109 N2801d7628b4d478b84e2f0b142a1d5a0 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
110 schema:name Sample Size
111 rdf:type schema:DefinedTerm
112 N44423ba57b5047ccb2e3ba6243153a31 schema:name pubmed_id
113 schema:value 17549635
114 rdf:type schema:PropertyValue
115 N4bc20917cba94b1baf7f2c3d9ffef289 schema:issueNumber 3
116 rdf:type schema:PublicationIssue
117 N64db82c553f642f693ac30ed0cc42769 rdf:first sg:person.0703715253.79
118 rdf:rest N92faf13a4b4b4389a8df8686acd70977
119 N8a964380963b4512b0d649ecb489a9d4 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
120 schema:name Monte Carlo Method
121 rdf:type schema:DefinedTerm
122 N8d167982cb1044798888600f3a4983d8 schema:name doi
123 schema:value 10.1007/s11121-007-0070-9
124 rdf:type schema:PropertyValue
125 N92faf13a4b4b4389a8df8686acd70977 rdf:first sg:person.0725245510.27
126 rdf:rest rdf:nil
127 Nb378c5f15c414bb2af20b5332b465f5b schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
128 schema:name Models, Statistical
129 rdf:type schema:DefinedTerm
130 Nc371e62b079f41baa191b8dbb4549594 schema:name dimensions_id
131 schema:value pub.1018783636
132 rdf:type schema:PropertyValue
133 Nd3abf30b6fab4557bbd84d8dd50c7d14 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
134 schema:name Humans
135 rdf:type schema:DefinedTerm
136 Ndc5e2e17be554b5198e05bfcfa5ae692 rdf:first sg:person.01112534025.06
137 rdf:rest N64db82c553f642f693ac30ed0cc42769
138 Ne10c4963e594497d9bf5bb12f5c249ac schema:volumeNumber 8
139 rdf:type schema:PublicationVolume
140 Nf32452f0e13d418595d40f0eb23f11bc schema:name Springer Nature - SN SciGraph project
141 rdf:type schema:Organization
142 Nf9395be8a2ca42e6bba6c3202ed546ce schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
143 schema:name Data Interpretation, Statistical
144 rdf:type schema:DefinedTerm
145 anzsrc-for:11 schema:inDefinedTermSet anzsrc-for:
146 schema:name Medical and Health Sciences
147 rdf:type schema:DefinedTerm
148 anzsrc-for:1117 schema:inDefinedTermSet anzsrc-for:
149 schema:name Public Health and Health Services
150 rdf:type schema:DefinedTerm
151 sg:journal.1021703 schema:issn 1389-4986
152 1573-6695
153 schema:name Prevention Science
154 schema:publisher Springer Nature
155 rdf:type schema:Periodical
156 sg:person.01112534025.06 schema:affiliation grid-institutes:grid.29857.31
157 schema:familyName Graham
158 schema:givenName John W.
159 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01112534025.06
160 rdf:type schema:Person
161 sg:person.0703715253.79 schema:affiliation grid-institutes:grid.29857.31
162 schema:familyName Olchowski
163 schema:givenName Allison E.
164 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0703715253.79
165 rdf:type schema:Person
166 sg:person.0725245510.27 schema:affiliation grid-institutes:grid.29857.31
167 schema:familyName Gilreath
168 schema:givenName Tamika D.
169 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0725245510.27
170 rdf:type schema:Person
171 grid-institutes:grid.29857.31 schema:alternateName Department of Biobehavioral Health, Penn State University, E-315 Health & Human Development Bldg., 16802, University Park, PA, USA
172 schema:name Department of Biobehavioral Health, Penn State University, E-315 Health & Human Development Bldg., 16802, University Park, PA, USA
173 rdf:type schema:Organization
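
In the table above, schema:author points at a blank node, and the author order is encoded as an RDF list chained through rdf:first and rdf:rest until rdf:nil. A small sketch of walking that chain follows, with the node identifiers and names copied from the table; the dictionary layout and the function name walk_rdf_list are assumptions made for illustration.

```python
RDF_NIL = "rdf:nil"

# node -> (rdf:first, rdf:rest), transcribed from the triples table.
LIST_NODES = {
    "Ndc5e2e17be554b5198e05bfcfa5ae692": (
        "sg:person.01112534025.06",
        "N64db82c553f642f693ac30ed0cc42769",
    ),
    "N64db82c553f642f693ac30ed0cc42769": (
        "sg:person.0703715253.79",
        "N92faf13a4b4b4389a8df8686acd70977",
    ),
    "N92faf13a4b4b4389a8df8686acd70977": ("sg:person.0725245510.27", RDF_NIL),
}

# schema:familyName for each person node, also from the table.
FAMILY_NAMES = {
    "sg:person.01112534025.06": "Graham",
    "sg:person.0703715253.79": "Olchowski",
    "sg:person.0725245510.27": "Gilreath",
}


def walk_rdf_list(head: str, nodes: dict) -> list:
    """Follow rdf:rest links from head, collecting each rdf:first value in order."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


# The head node is the object of the schema:author triple.
author_ids = walk_rdf_list("Ndc5e2e17be554b5198e05bfcfa5ae692", LIST_NODES)
author_names = [FAMILY_NAMES[person] for person in author_ids]

if __name__ == "__main__":
    print(author_names)
```

The recovered order matches the author list given at the top of the record.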
 



