Machine learning versus regression modelling in predicting individual healthcare costs from a representative sample of the nationwide claims database in ...


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2021-08-09

AUTHORS

Alexandre Vimont, Henri Leleu, Isabelle Durand-Zaleski

ABSTRACT

Background: Innovative provider payment methods that avoid adverse selection and reward performance require accurate prediction of healthcare costs based on individual risk adjustment. Our objective was to compare the performance of a simple neural network (NN) and a random forest (RF) with that of a generalized linear model (GLM) for the prediction of medical costs at the individual level.

Methods: A 1/97 representative sample of the French National Health Data Information System was used. The predictors selected were demographic information; pre-existing conditions and the Charlson comorbidity index; and healthcare service use and costs. The predictive performance of each model was compared using individual-level metrics (adjusted R-squared (adj-R2), mean absolute error (MAE) and hit ratio (HiR)) and distribution-level metrics, on different sets of covariates, in the general population and by pre-existing morbid condition, using a quasi-Monte Carlo design.

Results: We included 510,182 subjects alive on 31 December 2015. Mean annual costs were 1894€ (standard deviation 9326€; median 393€; interquartile range 95€ to 1480€), including zero-claim subjects. All models performed similarly after adjustment for demographics. The RF model performed better on the other sets of covariates (pre-existing conditions, resource counts and past-year costs). On the full model, RF reached an adj-R2 of 47.5%, a MAE of 1338€ and a HiR of 67%, while GLM and NN had an adj-R2 of 34.7% and 31.6%, a MAE of 1635€ and 1660€, and a HiR of 58% and 55%, respectively. The RF model outperformed GLM and NN for most conditions and for high-cost subjects.

Conclusions: RF should be preferred when the objective is to best predict medical costs. When the objective is to understand the contribution of predictors, a GLM with demographics, conditions and base-year cost was well suited.
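Two of the individual-level metrics named in the abstract, MAE and adjusted R-squared, have standard textbook definitions, and the following is a minimal sketch of how they could be computed. It is not the authors' code: the function names and the toy cost values are invented for illustration, and the hit ratio is left out because this record does not define it.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute gap between observed and predicted costs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def adjusted_r_squared(
    actual: Sequence[float], predicted: Sequence[float], n_predictors: int
) -> float:
    """Textbook adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    n = len(actual)
    if len(predicted) != n or n - n_predictors - 1 <= 0:
        raise ValueError("need equal lengths and more observations than predictors + 1")
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    r_squared = 1 - ss_res / ss_tot
    return 1 - (1 - r_squared) * (n - 1) / (n - n_predictors - 1)


# Toy annual costs in euros, invented for illustration only.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

mae = mean_absolute_error(observed, predicted)
adj_r2 = adjusted_r_squared(observed, predicted, n_predictors=1)
print(f"MAE = {mae:.1f} EUR, adj-R2 = {adj_r2:.3f}")
```

Because healthcare costs are heavily right-skewed (here a mean of 1894€ against a median of 393€), the two metrics can rank models differently, which is presumably why the study reports both.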

PAGES

211-223

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s10198-021-01363-4

DOI

http://dx.doi.org/10.1007/s10198-021-01363-4

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1140292586

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/34373958



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/11", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Medical and Health Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/14", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Economics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1117", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Public Health and Health Services", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1402", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Applied Economics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Databases, Factual", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Health Care Costs", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Health Services", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Humans", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Linear Models", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Machine Learning", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Assistance Publique H\u00f4pitaux de Paris, URC-ECO, CRESS-UMR1153, Paris, France", 
          "id": "http://www.grid.ac/institutes/grid.50550.35", 
          "name": [
            "Public Health Expertise (PHE), Paris, France", 
            "Assistance Publique H\u00f4pitaux de Paris, URC-ECO, CRESS-UMR1153, Paris, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Vimont", 
        "givenName": "Alexandre", 
        "id": "sg:person.010341560013.34", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010341560013.34"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Public Health Expertise (PHE), Paris, France", 
          "id": "http://www.grid.ac/institutes/grid.457361.2", 
          "name": [
            "Public Health Expertise (PHE), Paris, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Leleu", 
        "givenName": "Henri", 
        "id": "sg:person.0770635067.93", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0770635067.93"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Assistance Publique H\u00f4pitaux de Paris, URC-ECO, CRESS-UMR1153, Paris, France", 
          "id": "http://www.grid.ac/institutes/grid.50550.35", 
          "name": [
            "Assistance Publique H\u00f4pitaux de Paris, URC-ECO, CRESS-UMR1153, Paris, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Durand-Zaleski", 
        "givenName": "Isabelle", 
        "id": "sg:person.01327533357.46", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01327533357.46"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1186/s12938-018-0568-3", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1110062742", 
          "https://doi.org/10.1186/s12938-018-0568-3"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s10198-017-0873-y", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1083765760", 
          "https://doi.org/10.1007/s10198-017-0873-y"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2021-08-09", 
    "datePublishedReg": "2021-08-09", 
    "description": "BackgroundInnovative provider payment methods that avoid adverse selection and reward performance require accurate prediction of healthcare costs based on individual risk adjustment. Our objective was to compare the performances of a simple neural network (NN) and random forest (RF) to a generalized linear model (GLM) for the prediction of medical cost at the individual level.MethodsA 1/97 representative sample of the French National Health Data Information System was used. Predictors selected were: demographic information; pre-existing conditions, Charlson comorbidity index; healthcare service use and costs. Predictive performances of each model were compared through individual-level (adjusted R-squared (adj-R2), mean absolute error (MAE) and hit ratio (HiR)), and distribution-level metrics on different sets of covariates in the general population and by pre-existing morbid condition, using a quasi-Monte Carlo design.ResultsWe included 510,182 subjects alive on 31st December, 2015. Mean annual costs were 1894\u20ac (standard deviation 9326\u20ac) (median 393\u20ac, IQ range 95\u20ac; 1480\u20ac), including zero-claim subjects. All models performed similarly after adjustment on demographics. RF model had better performances on other sets of covariates (pre-existing conditions, resource counts and past year costs). On full model, RF reached an adj-R2 of 47.5%, a MAE of 1338\u20ac and a HiR of 67%, while GLM and NN had an adj-R2 of 34.7% and 31.6%, a MAE of 1635\u20ac and 1660\u20ac, and a HiR of 58% and 55\u00a0M, respectively. RF model outperformed GLM and NN for most conditions and for high-cost subjects.ConclusionsRF should be preferred when the objective is to best predict medical costs. When the objective is to understand the contribution of predictors, GLM was well suited with demographics, conditions and base year cost.", 
    "genre": "article", 
    "id": "sg:pub.10.1007/s10198-021-01363-4", 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1297512", 
        "issn": [
          "1439-3972", 
          "1439-6637"
        ], 
        "name": "The European Journal of Health Economics", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "2", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "23"
      }
    ], 
    "keywords": [
      "generalized linear model", 
      "Monte Carlo design", 
      "set of covariates", 
      "neural network", 
      "contribution of predictors", 
      "simple neural network", 
      "individual healthcare costs", 
      "random forest", 
      "linear model", 
      "full model", 
      "base-year costs", 
      "RF model", 
      "accurate prediction", 
      "predictive performance", 
      "model", 
      "better performance", 
      "set", 
      "different sets", 
      "prediction", 
      "modelling", 
      "performance", 
      "MAE", 
      "regression modelling", 
      "covariates", 
      "conditions", 
      "metrics", 
      "cost", 
      "network", 
      "ConclusionsRF", 
      "system", 
      "data information system", 
      "machine", 
      "objective", 
      "design", 
      "most conditions", 
      "selection", 
      "contribution", 
      "annual cost", 
      "information", 
      "information systems", 
      "adverse selection", 
      "samples", 
      "adjustment", 
      "use", 
      "index", 
      "provider payment methods", 
      "HIR", 
      "risk adjustment", 
      "subjects", 
      "payment methods", 
      "individual level", 
      "predictors", 
      "levels", 
      "representative sample", 
      "database", 
      "forest", 
      "population", 
      "year costs", 
      "nationwide claims database", 
      "France", 
      "method", 
      "healthcare service use", 
      "healthcare costs", 
      "demographic information", 
      "medical costs", 
      "claims database", 
      "service use", 
      "morbid conditions", 
      "ResultsWe", 
      "pre-existing conditions", 
      "general population", 
      "demographics", 
      "mean annual cost", 
      "Charlson Comorbidity Index", 
      "comorbidity index"
    ], 
    "name": "Machine learning versus regression modelling in predicting individual healthcare costs from a representative sample of the nationwide claims database in France", 
    "pagination": "211-223", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1140292586"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s10198-021-01363-4"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "34373958"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s10198-021-01363-4", 
      "https://app.dimensions.ai/details/publication/pub.1140292586"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-10-01T06:48", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221001/entities/gbq_results/article/article_901.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1007/s10198-021-01363-4"
  }
]
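As a rough illustration of how a record of this shape might be consumed, the sketch below parses an abbreviated copy of the JSON-LD above with Python's standard `json` module and pulls out a citation-style summary. The `summarize` helper and the trimmed `RECORD_JSON` string are hypothetical conveniences, not part of SciGraph; only the field names and values are taken from the record.

```python
import json

# Abbreviated copy of the record above: only the fields this sketch reads.
RECORD_JSON = """
[
  {
    "id": "sg:pub.10.1007/s10198-021-01363-4",
    "name": "Machine learning versus regression modelling in predicting individual healthcare costs from a representative sample of the nationwide claims database in France",
    "datePublished": "2021-08-09",
    "pagination": "211-223",
    "author": [
      {"familyName": "Vimont", "givenName": "Alexandre"},
      {"familyName": "Leleu", "givenName": "Henri"},
      {"familyName": "Durand-Zaleski", "givenName": "Isabelle"}
    ],
    "isPartOf": [
      {"name": "The European Journal of Health Economics", "type": "Periodical"},
      {"issueNumber": "2", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "23"}
    ],
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/s10198-021-01363-4"]}
    ]
  }
]
"""


def summarize(record: dict) -> dict:
    """Collect citation-style fields from one SciGraph article record."""
    # isPartOf mixes journal, issue and volume entries; index them by "type".
    parts = {part["type"]: part for part in record.get("isPartOf", [])}
    # productId holds name/value pairs such as doi, pubmed_id, dimensions_id.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    return {
        "title": record["name"],
        "authors": [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]],
        "journal": parts.get("Periodical", {}).get("name"),
        "volume": parts.get("PublicationVolume", {}).get("volumeNumber"),
        "issue": parts.get("PublicationIssue", {}).get("issueNumber"),
        "pages": record.get("pagination"),
        "doi": ids.get("doi"),
    }


# The top-level JSON value is a list containing a single record.
summary = summarize(json.loads(RECORD_JSON)[0])
print(summary)
```

Indexing `isPartOf` by its `type` value avoids relying on the order of the journal, issue and volume entries, which the record does not guarantee.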
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s10198-021-01363-4'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s10198-021-01363-4'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s10198-021-01363-4'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s10198-021-01363-4'
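The four curl commands differ only in their `Accept` header, which is ordinary HTTP content negotiation. A minimal Python sketch of the same idea, using only the standard library, might look like the following. The short format keys, `build_request` and `fetch` are names chosen for this sketch rather than anything SciGraph defines, and whether the endpoint still responds is not something this page confirms.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s10198-021-01363-4"

# MIME types copied from the curl commands; the keys are this sketch's own labels.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Prepare a GET request asking for one RDF serialization via the Accept header."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(ACCEPT_TYPES)}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt: str, timeout: float = 10.0) -> str:
    """Send the request and return the body as text (performs a network call)."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network, so it is safe to inspect.
request = build_request("turtle")
print(request.full_url, request.get_header("Accept"))
```

Calling `fetch("json-ld")` would be the counterpart of the first curl command above; it is kept separate from `build_request` so the header logic can be checked without a live connection.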


 

This table displays all metadata directly associated with this object as RDF triples.

194 TRIPLES      21 PREDICATES      110 URIs      98 LITERALS      13 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/s10198-021-01363-4 schema:about N1fe25cf754b2423cae126ebc8a638df3
2 N24495ddbac844695a1def49a9377e753
3 N3c915900d49346aea02ea1b010f06d00
4 Nc9ed3bc28fae4021aa4e6a34ee1234be
5 Ncfe4b0b974234fb78cbeeea48fb96340
6 Nd0640a7c287f4ad499e2f951b55f6c11
7 anzsrc-for:11
8 anzsrc-for:1117
9 anzsrc-for:14
10 anzsrc-for:1402
11 schema:author N35725cb20735465b8a3e1d04dbdccd8f
12 schema:citation sg:pub.10.1007/s10198-017-0873-y
13 sg:pub.10.1186/s12938-018-0568-3
14 schema:datePublished 2021-08-09
15 schema:datePublishedReg 2021-08-09
16 schema:description BackgroundInnovative provider payment methods that avoid adverse selection and reward performance require accurate prediction of healthcare costs based on individual risk adjustment. Our objective was to compare the performances of a simple neural network (NN) and random forest (RF) to a generalized linear model (GLM) for the prediction of medical cost at the individual level.MethodsA 1/97 representative sample of the French National Health Data Information System was used. Predictors selected were: demographic information; pre-existing conditions, Charlson comorbidity index; healthcare service use and costs. Predictive performances of each model were compared through individual-level (adjusted R-squared (adj-R2), mean absolute error (MAE) and hit ratio (HiR)), and distribution-level metrics on different sets of covariates in the general population and by pre-existing morbid condition, using a quasi-Monte Carlo design.ResultsWe included 510,182 subjects alive on 31st December, 2015. Mean annual costs were 1894€ (standard deviation 9326€) (median 393€, IQ range 95€; 1480€), including zero-claim subjects. All models performed similarly after adjustment on demographics. RF model had better performances on other sets of covariates (pre-existing conditions, resource counts and past year costs). On full model, RF reached an adj-R2 of 47.5%, a MAE of 1338€ and a HiR of 67%, while GLM and NN had an adj-R2 of 34.7% and 31.6%, a MAE of 1635€ and 1660€, and a HiR of 58% and 55 M, respectively. RF model outperformed GLM and NN for most conditions and for high-cost subjects.ConclusionsRF should be preferred when the objective is to best predict medical costs. When the objective is to understand the contribution of predictors, GLM was well suited with demographics, conditions and base year cost.
17 schema:genre article
18 schema:isAccessibleForFree false
19 schema:isPartOf N567f5e93b63b4c3ba994dec929b0a3f5
20 Ne9eef703b8814e2c8cb9043dd76ffa85
21 sg:journal.1297512
22 schema:keywords Charlson Comorbidity Index
23 ConclusionsRF
24 France
25 HIR
26 MAE
27 Monte Carlo design
28 RF model
29 ResultsWe
30 accurate prediction
31 adjustment
32 adverse selection
33 annual cost
34 base-year costs
35 better performance
36 claims database
37 comorbidity index
38 conditions
39 contribution
40 contribution of predictors
41 cost
42 covariates
43 data information system
44 database
45 demographic information
46 demographics
47 design
48 different sets
49 forest
50 full model
51 general population
52 generalized linear model
53 healthcare costs
54 healthcare service use
55 index
56 individual healthcare costs
57 individual level
58 information
59 information systems
60 levels
61 linear model
62 machine
63 mean annual cost
64 medical costs
65 method
66 metrics
67 model
68 modelling
69 morbid conditions
70 most conditions
71 nationwide claims database
72 network
73 neural network
74 objective
75 payment methods
76 performance
77 population
78 pre-existing conditions
79 prediction
80 predictive performance
81 predictors
82 provider payment methods
83 random forest
84 regression modelling
85 representative sample
86 risk adjustment
87 samples
88 selection
89 service use
90 set
91 set of covariates
92 simple neural network
93 subjects
94 system
95 use
96 year costs
97 schema:name Machine learning versus regression modelling in predicting individual healthcare costs from a representative sample of the nationwide claims database in France
98 schema:pagination 211-223
99 schema:productId N2df1bac669924c59beb014da980200d0
100 N63cb1d6278a5436dadf1dc169e3a7e1e
101 Ncfe31ba4198e42e5927eeeb45a380280
102 schema:sameAs https://app.dimensions.ai/details/publication/pub.1140292586
103 https://doi.org/10.1007/s10198-021-01363-4
104 schema:sdDatePublished 2022-10-01T06:48
105 schema:sdLicense https://scigraph.springernature.com/explorer/license/
106 schema:sdPublisher N45324c73dc5c4c2da0c89a0bb935b2f9
107 schema:url https://doi.org/10.1007/s10198-021-01363-4
108 sgo:license sg:explorer/license/
109 sgo:sdDataset articles
110 rdf:type schema:ScholarlyArticle
111 N1fe25cf754b2423cae126ebc8a638df3 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
112 schema:name Health Care Costs
113 rdf:type schema:DefinedTerm
114 N24495ddbac844695a1def49a9377e753 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
115 schema:name Health Services
116 rdf:type schema:DefinedTerm
117 N2df1bac669924c59beb014da980200d0 schema:name pubmed_id
118 schema:value 34373958
119 rdf:type schema:PropertyValue
120 N35725cb20735465b8a3e1d04dbdccd8f rdf:first sg:person.010341560013.34
121 rdf:rest N864ca26b7f8443bfaa1e0b9ee6b82761
122 N3c915900d49346aea02ea1b010f06d00 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
123 schema:name Machine Learning
124 rdf:type schema:DefinedTerm
125 N45324c73dc5c4c2da0c89a0bb935b2f9 schema:name Springer Nature - SN SciGraph project
126 rdf:type schema:Organization
127 N567f5e93b63b4c3ba994dec929b0a3f5 schema:volumeNumber 23
128 rdf:type schema:PublicationVolume
129 N6321259b778a41c1be6d4e2780e897ee rdf:first sg:person.01327533357.46
130 rdf:rest rdf:nil
131 N63cb1d6278a5436dadf1dc169e3a7e1e schema:name dimensions_id
132 schema:value pub.1140292586
133 rdf:type schema:PropertyValue
134 N864ca26b7f8443bfaa1e0b9ee6b82761 rdf:first sg:person.0770635067.93
135 rdf:rest N6321259b778a41c1be6d4e2780e897ee
136 Nc9ed3bc28fae4021aa4e6a34ee1234be schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
137 schema:name Linear Models
138 rdf:type schema:DefinedTerm
139 Ncfe31ba4198e42e5927eeeb45a380280 schema:name doi
140 schema:value 10.1007/s10198-021-01363-4
141 rdf:type schema:PropertyValue
142 Ncfe4b0b974234fb78cbeeea48fb96340 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
143 schema:name Databases, Factual
144 rdf:type schema:DefinedTerm
145 Nd0640a7c287f4ad499e2f951b55f6c11 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
146 schema:name Humans
147 rdf:type schema:DefinedTerm
148 Ne9eef703b8814e2c8cb9043dd76ffa85 schema:issueNumber 2
149 rdf:type schema:PublicationIssue
150 anzsrc-for:11 schema:inDefinedTermSet anzsrc-for:
151 schema:name Medical and Health Sciences
152 rdf:type schema:DefinedTerm
153 anzsrc-for:1117 schema:inDefinedTermSet anzsrc-for:
154 schema:name Public Health and Health Services
155 rdf:type schema:DefinedTerm
156 anzsrc-for:14 schema:inDefinedTermSet anzsrc-for:
157 schema:name Economics
158 rdf:type schema:DefinedTerm
159 anzsrc-for:1402 schema:inDefinedTermSet anzsrc-for:
160 schema:name Applied Economics
161 rdf:type schema:DefinedTerm
162 sg:journal.1297512 schema:issn 1439-3972
163 1439-6637
164 schema:name The European Journal of Health Economics
165 schema:publisher Springer Nature
166 rdf:type schema:Periodical
167 sg:person.010341560013.34 schema:affiliation grid-institutes:grid.50550.35
168 schema:familyName Vimont
169 schema:givenName Alexandre
170 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010341560013.34
171 rdf:type schema:Person
172 sg:person.01327533357.46 schema:affiliation grid-institutes:grid.50550.35
173 schema:familyName Durand-Zaleski
174 schema:givenName Isabelle
175 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01327533357.46
176 rdf:type schema:Person
177 sg:person.0770635067.93 schema:affiliation grid-institutes:grid.457361.2
178 schema:familyName Leleu
179 schema:givenName Henri
180 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0770635067.93
181 rdf:type schema:Person
182 sg:pub.10.1007/s10198-017-0873-y schema:sameAs https://app.dimensions.ai/details/publication/pub.1083765760
183 https://doi.org/10.1007/s10198-017-0873-y
184 rdf:type schema:CreativeWork
185 sg:pub.10.1186/s12938-018-0568-3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1110062742
186 https://doi.org/10.1186/s12938-018-0568-3
187 rdf:type schema:CreativeWork
188 grid-institutes:grid.457361.2 schema:alternateName Public Health Expertise (PHE), Paris, France
189 schema:name Public Health Expertise (PHE), Paris, France
190 rdf:type schema:Organization
191 grid-institutes:grid.50550.35 schema:alternateName Assistance Publique Hôpitaux de Paris, URC-ECO, CRESS-UMR1153, Paris, France
192 schema:name Assistance Publique Hôpitaux de Paris, URC-ECO, CRESS-UMR1153, Paris, France
193 Public Health Expertise (PHE), Paris, France
194 rdf:type schema:Organization
 



