Articles
Machine learning for real-time prediction of complications in critical care: a retrospective study

https://doi.org/10.1016/S2213-2600(18)30300-X

Summary

Background

The large number of clinical signals in intensive care units can easily overwhelm health-care personnel and can lead to treatment delays, suboptimal care, or clinical errors. The aim of this study was to apply deep machine learning methods to predict severe complications during critical care in real time after cardiothoracic surgery.

Methods

We used deep learning methods (recurrent neural networks) to predict several severe complications (mortality, renal failure with a need for renal replacement therapy, and postoperative bleeding leading to operative revision) in post cardiosurgical care in real time. Adult patients who underwent major open heart surgery from Jan 1, 2000, to Dec 31, 2016, in a German tertiary care centre for cardiovascular diseases formed the main derivation dataset. We measured the accuracy and timeliness of the deep learning model's forecasts and compared predictive quality to that of established standard-of-care clinical reference tools (clinical rule for postoperative bleeding, Simplified Acute Physiology Score II for mortality, and the Kidney Disease: Improving Global Outcomes staging criteria for acute renal failure) using positive predictive value (PPV), negative predictive value, sensitivity, specificity, area under the curve (AUC), and the F1 measure (which computes a harmonic mean of sensitivity and PPV). Results were externally retrospectively validated with 5898 cases from the published MIMIC-III dataset.
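As an illustration of how these measures relate to one another, the following minimal Python sketch derives PPV, negative predictive value, sensitivity, specificity, and the F1 measure from the four cells of a confusion matrix. The class `ConfusionCounts`, the function `classification_metrics`, and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Cells of a 2x2 confusion matrix for one binary outcome."""
    tp: int  # complication predicted and observed
    fp: int  # complication predicted but not observed
    fn: int  # complication observed but not predicted
    tn: int  # no complication predicted and none observed


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the standard measures; assumes every denominator is non-zero."""
    ppv = c.tp / (c.tp + c.fp)          # positive predictive value
    npv = c.tn / (c.tn + c.fn)          # negative predictive value
    sensitivity = c.tp / (c.tp + c.fn)  # true positive rate
    specificity = c.tn / (c.tn + c.fp)  # true negative rate
    # F1 is the harmonic mean of sensitivity and PPV.
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "ppv": ppv,
        "npv": npv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
counts = ConfusionCounts(tp=85, fp=10, fn=15, tn=90)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is a harmonic mean, it is pulled towards the lower of sensitivity and PPV, so a model cannot score well on it by maximising only one of the two.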

Findings

Of 47 559 intensive care admissions (corresponding to 42 007 patients), we included 11 492 (corresponding to 9269 patients). The deep learning models yielded accurate predictions with the following PPV and sensitivity scores: PPV 0·90 and sensitivity 0·85 for mortality, 0·87 and 0·94 for renal failure, and 0·84 and 0·74 for bleeding. The predictions significantly outperformed the standard clinical reference tools, improving the absolute complication prediction AUC by 0·29 (95% CI 0·23–0·35) for bleeding, by 0·24 (0·19–0·29) for mortality, and by 0·24 (0·13–0·35) for renal failure (p<0·0001 for all three analyses). The deep learning methods showed accurate predictions immediately after patient admission to the intensive care unit. We also observed an increase in performance in our validation cohort when the machine learning approach was tested against clinical reference tools, with absolute improvements in AUC of 0·09 (95% CI 0·03–0·15; p=0·0026) for bleeding, of 0·18 (0·07–0·29; p=0·0013) for mortality, and of 0·25 (0·18–0·32; p<0·0001) for renal failure.
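For readers unfamiliar with AUC comparisons, the sketch below computes the AUC as the probability that a randomly chosen case with the complication receives a higher score than a randomly chosen case without it, and then takes the absolute difference between two scoring methods. The function name `auc` and all scores and labels are invented for illustration; the confidence intervals and p values reported above are not reproduced by this sketch.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = complication occurred, 0 = it did not.
labels = [1, 1, 1, 0, 0, 0]
model_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]      # stand-in for a learned model
reference_scores = [0.6, 0.3, 0.5, 0.7, 0.4, 0.2]  # stand-in for a clinical score

model_auc = auc(model_scores, labels)
reference_auc = auc(reference_scores, labels)
absolute_improvement = model_auc - reference_auc
print(f"model AUC {model_auc:.3f}, reference AUC {reference_auc:.3f}, "
      f"absolute improvement {absolute_improvement:.3f}")
```

The pairwise form is quadratic in the number of cases and is meant only to make the definition concrete; for large cohorts a rank-based implementation would be the usual choice.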

Interpretation

The observed improvements in prediction for all three investigated clinical outcomes have the potential to improve critical care. These findings are noteworthy in that they use routinely collected clinical data exclusively, without the need for any manual processing. The deep machine learning method showed AUC scores that significantly surpass those of clinical reference tools, especially soon after admission. Taken together, these properties are encouraging for prospective deployment in critical care settings to direct the staff's attention towards patients who are most at risk.

Funding

No specific funding.

Introduction

Machine learning is the study and development of systems that can learn from and make predictions on data without the need to be explicitly programmed, and is particularly useful in settings where signals and data are produced at a faster rate than the human brain can interpret. Intensive care treatment is highly challenging for care teams and generates massive amounts of data, and is therefore an optimal target for applying machine learning techniques with the goal of supporting clinical decision making.

Despite frequent reviews and editorials concerning the potential revolutionary impact of machine learning in medicine,1–11 translation into practical solutions that benefit critical care patients is non-existent. Translating machine learning approaches to clinical practice is challenging for several reasons. First, some machine learning methods, such as reinforcement learning,12 require prospective interaction with patients. In the early learning stages, this could mean a dramatically increased risk of adverse events. Second, data recording in electronic health record (EHR) systems is designed and optimised for reporting, liability, and billing purposes rather than informing clinical intelligence systems.3 Third, data are often organised and stored across a variety of systems, requiring integration and harmonisation before being used in automated reasoning. Finally, patient data recorded in clinical information systems (such as vitals monitoring, laboratory values, and medications) are prone to missing values, heterogeneity, errors, and artifacts, potentially introducing significant levels of noise to the decision process.

Research in context

Evidence before this study

Artificial intelligence-augmented care is an emerging field. Consequently, the existing literature is relatively sparse. We searched MEDLINE and arXiv for the search term ((“real-time prediction”) OR (“deep learning”) OR (“real-time scoring”) OR (“machine learning”) OR (“artificial intelligence”)) AND ((intensive OR critical) care) with no language restrictions or date limitations. We retrieved 510 MEDLINE results and 252 arXiv results, 72 of which were relevant original studies. The relevant prior evidence included 18 articles investigating real-time prediction approaches. None of these articles used a deep learning methodology. Most of the articles described the prediction of sepsis and mortality, often using curated or open datasets such as the MIMIC-III dataset. All studies described a specific approach predicting a single outcome. At the time of writing, prediction of sepsis in real time is the topic with most available evidence.

Added value of this study

We developed deep learning models to predict severe complications following cardiothoracic surgery. These models used uncurated clinical datasets to predict three endpoints. By contrast with standard clinical risk scores, our approach was not based on the average patient but used cohort data to inform predictions. This approach yields higher accuracy for each individual patient. The selected clinical variables reflect the range of routinely collected information at intensive care units for all postoperative patients, removing the need for any additional manual data collection or annotation. The deep learning methods we implemented achieved superior predictive power and timeliness compared with three standard-of-care baselines.

Implications of all the available evidence

A real-time complication prediction system based on deep learning outperforms the selected standard-of-care baselines in timeliness and accuracy, even when acting on a real, uncurated data stream. We are currently deploying our system in our intensive care unit and will conduct a trial to confirm the results prospectively to enable its use in the clinical routine.

Existing clinical applications remain largely academic in nature and model patient outcomes such as mortality on the basis of synthetic, manually curated, or heavily distorted datasets13–15 that often do not reflect the full range of signals and complexity faced in modern critical care environments.

In this work, we investigate the use of deep learning techniques in postoperative cardiac surgery care in a real-world setting. In a retrospective study including uncurated intensive care cases, we assess the merit of a predictive machine learning approach to increase quality of care and patient safety. We support our findings with a retrospective validation study on intensive care cases from another intensive care unit.

Section snippets

Datasets

We analysed electronic health record data from a German tertiary care centre for cardiovascular diseases (German Heart Center Berlin) of adult patients (≥18 years of age at the time of surgery) who underwent major open heart surgery from Jan 1, 2000, until Dec 31, 2016. We included coronary artery bypass grafting, valve surgery, aortic surgery, assist device surgery, pericardial surgery, and heart and lung transplantations. All patients who had catheter-based interventions were excluded except

Results

The complete dataset comprised 47 559 intensive care cases, corresponding to 42 007 patients, with information available on 52 patient features (table 1, table 2). In total, we included 11 492 admissions, which corresponded to 9269 patients (table 3).

Overall, when considering the performance scores in the balanced test dataset (table 4), postoperative bleeding seems to be more difficult to predict by either method than mortality or renal failure, for which all compared methods—including

Discussion

A real-time diagnostic and prognostic prediction model based on a machine learning algorithm and routinely collected clinical data during critical care was established and validated. The deep learning models incorporated static and dynamic variables and scrutinised their changes over time. We noted in each modelled outcome a high predictive performance (AUC ≥0·87 for all models) that is not commonly observed in current clinical prognostic models. The proposed approach has several advantages over

References (36)

  • C Krittanawong et al. Artificial intelligence in precision cardiovascular medicine. J Am Coll Cardiol (2017)
  • E Topol. Digital medicine: empowering both patients and clinicians. Lancet (2016)
  • SM Pastores et al. Costs of critical care medicine. Crit Care Clin (2012)
  • JH Chen et al. Machine learning and prediction in medicine—beyond the peak of inflated expectations. N Engl J Med (2017)
  • AEW Johnson et al. Machine learning and decision support in critical care. Proc IEEE (2016)
  • Z Obermeyer et al. Predicting the future—big data, machine learning, and clinical medicine. N Engl J Med (2016)
  • Z Obermeyer et al. Lost in thought—the limits of the human mind and the future of medicine. N Engl J Med (2017)
  • J McKenna. Big data: big promise. Eur Heart J (2017)
  • LA Celi et al. ‘Big data’ in the intensive care unit. Closing the data loop. Am J Respir Crit Care Med (2013)
  • DM Maslove et al. A path to precision in the ICU. Crit Care (2017)
  • Artificial intelligence in health care: within touching distance. Lancet (2017)
  • A Verghese et al. What this computer needs is a physician: humanism and artificial intelligence. JAMA (2017)
  • RS Sutton et al. Reinforcement learning: an introduction. Trends Cogn Sci (1998)
  • P Grnarova et al. Neural document embeddings for intensive care patient mortality prediction. arXiv (2016)
  • Z Che et al. Recurrent neural networks for multivariate time series with missing values. arXiv (2016)
  • ZC Lipton et al. Modeling missing data in clinical time series with RNNs. arXiv (2016)
  • J Chung et al. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv (2014)
  • RM Bojar. Manual of perioperative care in adult cardiac surgery (2011)