Title: Moving to next generation healthcare: Using real world claims data to target prevention of diabetes complications
Funding Body: Federal Ministry of Education and Research (BMBF); German Center for Diabetes Research (DZD) 3.0
Funding Period: 2021-2022
Partners: DAK-Gesundheit
Objectives:
- The project aims to develop and validate algorithms that predict the onset of diabetes complications in patients with type 2 diabetes from health insurance claims. Both traditional regression and state-of-the-art machine learning methods are used, and their predictive performance is compared.
Background:
- Diabetes complications such as myocardial infarction (MI) and stroke are often preventable. It has been suggested that tailored prevention measures targeting high-risk individuals may be more effective and efficient than less targeted approaches. Statutory health insurance (SHI) claims constitute readily available data that could potentially be used to identify these high-risk patients. In such high-dimensional data, machine learning methods might provide an advantage over traditional regression methods in identifying predictive data patterns.
Methods:
- We use retrospective, high-dimensional claims data (ICD-10 codes, procedure codes, medications, disease management program participation, etc.) of more than 300,000 patients with type 2 diabetes for the period 2014-2019 and identify potentially relevant predictors for MI and stroke based on a review of the literature. Subsequently, we apply logistic regression, regularization methods and state-of-the-art machine learning techniques (e.g. Random Forest, Gradient Boosting) to develop algorithms that identify patients at high risk of MI or stroke. Model performance is evaluated based on the area under the precision-recall curve (AUPRC). Additionally, we report metrics such as positive and negative predictive value, sensitivity, specificity, number needed to evaluate and alert rate.
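The evaluation metrics named above can be illustrated with a minimal Python sketch. It is not the project's implementation. The function names `threshold_metrics` and `average_precision` are hypothetical, and the definitions used are assumptions, since the description does not spell them out: number needed to evaluate is taken as 1/PPV, alert rate as the share of patients flagged, and AUPRC is approximated by the step-wise average precision.

```python
from typing import Dict, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Confusion-matrix metrics when an alert is raised for score >= threshold.

    Assumed definitions (not taken from the project description):
    number needed to evaluate = 1 / PPV, alert rate = flagged / all patients.
    """
    tp = fp = tn = fn = 0
    for outcome, score in zip(y_true, scores):
        alert = score >= threshold
        if alert and outcome == 1:
            tp += 1
        elif alert:
            fp += 1
        elif outcome == 1:
            fn += 1
        else:
            tn += 1

    ppv = _ratio(tp, tp + fp)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": ppv,
        "npv": _ratio(tn, tn + fn),
        "alert_rate": _ratio(tp + fp, tp + fp + tn + fn),
        # NaN if no alert is raised; infinite if no alert is a true positive.
        "number_needed_to_evaluate": (
            float("nan") if ppv != ppv else (1.0 / ppv if ppv > 0 else float("inf"))
        ),
    }


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    Averages the precision at the rank of each true event. Tied scores are
    not treated specially in this simplified sketch.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_events = sum(1 for outcome in y_true if outcome == 1)
    if n_events == 0:
        return float("nan")
    hits = 0
    total = 0.0
    for rank, (_, outcome) in enumerate(ranked, start=1):
        if outcome == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_events
```

For ten invented patients with outcomes `[1, 0, 1, 0, 0, 0, 1, 0, 0, 0]`, strictly decreasing risk scores and a threshold that flags the top five, the sketch gives a PPV of 0.4, an alert rate of 0.5 and a number needed to evaluate of 2.5. Because events such as MI and stroke are rare, the AUPRC baseline of a non-informative model equals the event prevalence rather than 0.5, which is one reason to prefer it over threshold-free metrics that are insensitive to class imbalance.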
Expected impact:
- The main result of our research project will be prediction models that can help to identify high-risk diabetes patients in a cost-effective manner. The performance of the more advanced machine learning approaches will be benchmarked against that of the traditional regression methods, offering insights into the applications and pitfalls of using machine learning methods on secondary data.
Further information: https://www.dzd-ev.de/dzd-next/index.html
Contact: Dr. Anna-Janina Stephan
Project publications:
Stephan AJ, Hanselmann M, Laxy M. Moving to next generation healthcare: Using real world claims data to target prevention of macrovascular complications in diabetes patients (MNGHC) - Multivariable prediction model development and validation plan for prediction of stroke and myocardial infarction. 2022. https://osf.io/v7rfu