Knowledge discovery and knowledge validation in intensive care

https://doi.org/10.1016/S0933-3657(00)00047-6

Abstract

Operational protocols are a valuable means for quality control. However, developing operational protocols is a highly complex and costly task. We present an integrated approach involving both intelligent data analysis and knowledge acquisition from experts that supports the development of operational protocols. The aim is to ensure high quality standards for the protocol through empirical validation during development, as well as lower development cost through the use of machine learning and statistical techniques. We demonstrate our approach of integrating expert knowledge with data-driven techniques based on our effort to develop an operational protocol for the hemodynamic system.

Introduction

An abundance of information is generated during the process of critical care. Much of this information can now be captured and stored using clinical information systems (CIS) that have become commercially available for use in intensive care in recent years. These systems provide complete medical documentation at the bedside, and their clinical usefulness and efficiency have been shown repeatedly [8], [9], [13]. While databases with more than 2000 separate patient-related variables are now available for further analysis [10], the multitude of variables presented at the bedside even without a CIS precludes medical judgement by humans. A physician may be confronted with more than 200 variables in the critically ill during a typical morning round [25]. We know, however, that even an experienced physician is often not able to develop a systematic response to any problem involving more than seven variables [22]. Moreover, humans are limited in their ability to estimate the degree of relatedness between only two variables [15]. This problem is most pronounced in the evaluation of the measurable effect of a therapeutic intervention. Personal bias, experience, and a certain expectation toward the respective intervention may distort an objective judgement [6]. These arguments motivate the use of decision support systems.

Clinical decision support aims at providing health care professionals with therapy guidelines directly at the bed-side. This should enhance the quality of clinical care, since guidelines sort out high value practices from those that have little or no value. The goal of decision support is to supply the best recommendation under all circumstances [26]. The computerized protocol of care can take into account more aspects of the patient than a physician can accommodate. It is not disturbed by circumstances or hospital constraints. It bridges the gap between low-level numerical measurements (the level of the equipment) and high-level qualitative principles (the level of medical reasoning). While knowledge-based systems have mostly been applied for diagnosis and therapy planning (e.g. [19], [28]), some systems also aim at on-line patient monitoring [7], [21], [26]. Methods that have proved their value in handling low-frequency patient data are not applicable for on-line monitoring [21]. Quantitative measurements and qualitative reasoning have to be integrated in a system that recommends interventions in real-time. The numerical measurements of the patients' vital signs have to be abstracted into qualitative terms of high abstraction. The aspect of time has to be handled both at the level of measurements and the level of expert knowledge [4], [17], [21], [28]. In the expert's reasoning, time becomes the relation between time intervals, abstracting from the exact duration of, e.g. an increasing heart rate, and focusing on tendencies of other parameters (e.g. cardiac output) within overlapping time intervals.

One of the big obstacles to the more frequent implementation of decision support systems is the tedious and time-consuming task of developing the knowledge base. The decision support system for respiratory care at the LDS Hospital, Salt Lake City, USA [26], for instance, took about 25 person years to develop. The method of guideline development itself is not supported by a computer system. Mechanisms of temporal abstraction and reasoning presuppose manually designed models or ontologies [4], [21], [28]. Why not use techniques of knowledge discovery and statistical time series analysis in order to ease the process of guideline generation? Machine learning and statistical analysis have been applied successfully in building diagnostic systems (e.g. [18], [20], [30], [36]). We now want to exploit the huge amount of data for the development of guidelines for on-line monitoring. Our task is to build a decision support system for online hemodynamic monitoring in the critically ill. We do not aim at modeling the actual physician’s behavior: imitating the actual interventions made by physicians is not the goal. Actual behavior is influenced by the overall hospital situation, e.g. how long the physician has been on duty and how many patients require attention at the same time. Machine learning from patients’ data could lead to a knowledge base that mirrors such disturbing effects. Therefore, the learned decision rules have to be checked by additional rules about effects of drug and fluid administration. Our approach is to combine statistics, knowledge acquisition, and machine learning. Our aim is to develop a method for guideline generation that is faster and more reliable than current methods.

Data for statistical evaluation and learning can be provided by the CIS. However, the nature of the data is different from that gathered in controlled experiments. While a CIS in modern intensive care can take numerous measurements every minute, the values of some vital signs are sometimes recorded only once every hour. Other vital signs are recorded only for a subset of the patients. Hence, the overall high dimensional data space is sparsely populated. Moreover, the average time difference between intervention as charted and estimated hemodynamic effect can show a wide variation [12]. Even the automatic measurements can be noisy due to manipulation of measurement equipment, flushing of pressure transducers, or technical artifacts. In some cases, relevant demographic and diagnostic parameters may not even be recorded at all. In summary, we have a large amount of high dimensional, numerical time series data that contains missing values and noise. Using this data already at the stage of development of the decision support system staves off surprises at the stage of clinical experience, such as those reported in [2], p. 5721: ‘the huge number of measurements classified as invalid is quite astonishing although it reflects the real clinical environments’.

In addition to problems of knowledge acquisition, we see a particular need for knowledge validation. It should be noted that many medical guidelines published today are neither evidence-based nor sufficiently validated against real patient data. The current procedure is to first develop the guideline, then represent it in a knowledge-based system, and finally to test it in clinical studies. In this ‘waterfall’ process, unrealistic assumptions, mistakes, and flaws are recognized at a late stage. In contrast, our approach includes validation from the very beginning. Using a knowledge-based system early on supports the validation of the knowledge base at earlier stages. Inconsistencies within the knowledge base as well as a mismatch of rules and patient data are detected while developing the knowledge base. This facilitates and focuses the knowledge-acquisition process.

In order to test our approach to using real clinical data for building and validating a knowledge base for online monitoring, we have constructed a system. Its overall architecture is shown in Fig. 1. The patients’ measurements are used to recommend an intervention and are abstracted with respect to their course over time. The recommendation of interventions constitutes a model of physician behavior. This asks for further validation. Therefore, a recommended intervention is checked by calculating its expected effects on the basis of medical knowledge. In this way, a qualitative assessment of a statistical prediction enhances the model of physician behavior in order to obtain a model of best practice. The medical knowledge constitutes a model of the patients' hemodynamic system. This model is validated with respect to past patients' data. The processes we have implemented are outlined in the following sections.

Given a series of measurements of one vital sign of the patient, data abstraction detects and possibly eliminates outliers and finds level changes following good statistical practice. This abstracts the measurements to qualitative propositions with respect to a time interval, e.g. between time point 12 and time point 63 the heart rate remained about equal, and from time point 63 to time point 69 it was increasing. Our approach is based on statistical time series analysis. Classical autoregressive moving average (ARMA) modelling [3] is applied with corresponding outlier and level shift detection procedures using the new tool of a phase space embedding. The statistical method for data abstraction is described in Section 3.
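To illustrate the kind of output that data abstraction produces, the following minimal sketch turns a numeric series into qualitative interval propositions. It is not the ARMA and phase space procedure of Section 3: it substitutes a plain threshold on the change over fixed windows, and the function name `abstract_series`, the window length, and the tolerance are all assumptions made for this example.

```python
from typing import List, Tuple


def abstract_series(values: List[float], window: int = 3,
                    tol: float = 1.0) -> List[Tuple[int, int, str]]:
    """Abstract a series into (start, end, label) intervals.

    Indices are 0-based and inclusive. The window length and tolerance
    are illustrative assumptions, not values from the paper.
    """
    intervals: List[Tuple[int, int, str]] = []
    start = 0
    while start < len(values) - 1:
        end = min(start + window, len(values) - 1)
        change = values[end] - values[start]
        if change > tol:
            label = "increasing"
        elif change < -tol:
            label = "decreasing"
        else:
            label = "about equal"
        # Merge with the previous interval when the label is unchanged.
        if intervals and intervals[-1][2] == label:
            intervals[-1] = (intervals[-1][0], end, label)
        else:
            intervals.append((start, end, label))
        start = end
    return intervals


# Hypothetical heart rate readings, one per time point.
heart_rate = [80, 80, 81, 80, 80, 80, 81, 84, 88, 92, 96, 100, 104]
propositions = abstract_series(heart_rate)
```

Each resulting tuple corresponds to one qualitative proposition over a time interval, which is the form in which the knowledge base reasons about the course of a vital sign.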

Given the numerical data describing vital signs of the patient and his or her current medication, the task of acquiring state-action rules is to find the appropriate intervention. An intervention is formalized as increasing, decreasing or not changing the dose of a drug. The decision is made every minute. These rules were learned by the support vector machine [37]. Section 4 shows how we applied the support vector machine to learn state-action rules.
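As a hedged illustration of this formalization, the sketch below derives labelled state-action examples from minute-by-minute records by comparing the dose of one drug at consecutive minutes. How the training examples were actually constructed is not specified in this excerpt; the record layout, the field names, and the function names are assumptions. Because SVMs learn binary classifiers, the sketch also shows one possible way to pose each action as its own yes/no learning task.

```python
from typing import Dict, List, Tuple

Record = Dict[str, float]


def label_interventions(records: List[Record],
                        drug: str) -> List[Tuple[Record, str]]:
    """Pair each minute's state with the dose change observed next.

    `records` holds one dictionary per minute; `drug` is the key of the
    dose field (a hypothetical field name).
    """
    examples = []
    for now, nxt in zip(records, records[1:]):
        delta = nxt[drug] - now[drug]
        if delta > 0:
            action = "increase"
        elif delta < 0:
            action = "decrease"
        else:
            action = "no change"
        examples.append((now, action))
    return examples


def binary_targets(examples: List[Tuple[Record, str]],
                   action: str) -> List[int]:
    """Encode one action as +1 and every other action as -1."""
    return [1 if a == action else -1 for _, a in examples]


# Hypothetical records: heart rate and the dose of one drug per minute.
records = [
    {"heart_rate": 90.0, "dose": 2.0},
    {"heart_rate": 95.0, "dose": 2.5},
    {"heart_rate": 92.0, "dose": 2.5},
    {"heart_rate": 88.0, "dose": 2.0},
]
examples = label_interventions(records, "dose")
increase_targets = binary_targets(examples, "increase")
```

The state dictionaries would serve as feature vectors and the encoded targets as class labels for a binary learner.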

Given textbook knowledge and explanations by an expert, the task of acquiring medical knowledge is to represent the effects of substances in different dosages, relations between vital signs, and interrelations between different substances, and to validate the knowledge on the basis of past patients’ data. Knowledge acquisition and validation were supported by the MOBAL system [24]. Section 5 gives a short introduction to the system and its representation of medical knowledge. We emphasize how MOBAL can check the knowledge base against patient data.

Interventions, especially those recommended by a learned rule, are to be validated by the action-effect rules of the knowledge base. This validation task is:

Given

  • the state of a patient described in qualitative terms;

  • medical knowledge;

  • a sequence of interventions, and

  • a recommended intervention,

find the effects of the current intervention on the patient. The derivation of effects is made for each intervention as forward inference within MOBAL. The effect should result in a stable state of the patient. The validation is detailed in Section 6.
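The validation task above can be sketched as a small forward inference over action-effect rules. This is an illustration only and does not reproduce MOBAL or its rule representation: the substances, vital signs, qualitative levels, and rules are hypothetical placeholders, and the criterion that a stable state means every sign is "normal" is an assumption.

```python
from typing import Dict, List, Tuple

LEVELS = ["low", "normal", "high"]
Intervention = Tuple[str, str]  # (substance, action)

# Hypothetical action-effect rules: each intervention shifts some
# vital signs up (+1) or down (-1) on the qualitative scale.
RULES: Dict[Intervention, Dict[str, int]] = {
    ("substance_a", "increase"): {"sign_x": +1},
    ("substance_a", "decrease"): {"sign_x": -1},
    ("substance_b", "increase"): {"sign_x": -1, "sign_y": +1},
}


def derive_effects(state: Dict[str, str],
                   interventions: List[Intervention]) -> Dict[str, str]:
    """Apply the rules for each intervention in order (forward inference)."""
    new_state = dict(state)
    for intervention in interventions:
        for sign, shift in RULES.get(intervention, {}).items():
            idx = LEVELS.index(new_state[sign]) + shift
            # Clamp to the ends of the qualitative scale.
            new_state[sign] = LEVELS[max(0, min(len(LEVELS) - 1, idx))]
    return new_state


def is_stable(state: Dict[str, str]) -> bool:
    """Assumed stability criterion: every vital sign is 'normal'."""
    return all(level == "normal" for level in state.values())


def validate(state: Dict[str, str], history: List[Intervention],
             recommended: Intervention) -> bool:
    """Accept a recommendation if its derived effects yield a stable state."""
    return is_stable(derive_effects(state, history + [recommended]))


patient = {"sign_x": "low", "sign_y": "normal"}
accepted = validate(patient, [], ("substance_a", "increase"))
rejected = validate(patient, [], ("substance_b", "increase"))
```

In this toy setting the first recommendation is accepted because it moves the only abnormal sign to "normal", while the second is rejected because it pushes another sign out of range.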

The outline of this paper is as follows. Throughout the paper we report on the continuous development of a decision support system for intensive care as performed at the City Hospital and the University of Dortmund. We start with a description of the data acquisition process at the hospital and the resulting data set [13] and then report on each of the processes we have developed. A statistical method for data abstraction is described in Section 3. Section 4 shows how we applied the support vector machine (SVM) to learn state-action rules. A short introduction to the MOBAL system [24] and its representation of medical knowledge in Section 5 leads to the issue of validation, which is presented in Section 6.

Section snippets

Data acquisition

Most variables are entered by hand at the bedside. For entities such as clinical observations, nursing procedures, therapeutic measures, medications, or orders it appears very unlikely that entry of these variables can be automated in the foreseeable future. Only 5–10% of all variables in a CIS are acquired automatically. This includes the majority of bedside devices, e.g. physiologic monitors, ventilators, infusion devices. Additional data is interfaced from the hospital information system

Statistical analysis of time series

Time series analysis was employed for data abstraction of the time oriented variables with the goal of detecting outliers and level changes. The classical and widely used statistical approach to modelling time series is so called ARMA modelling [3] which assumes that a time series $(x_t)_{t=1,2,\ldots}$ can be written as

$$x_t = \varphi_1 x_{t-1} + \cdots + \varphi_p x_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$

where $\varepsilon_t$ is an unobservable shock at time $t$. This assumption means that each observation is a linear combination of past observations and past shocks
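To make the ARMA assumption concrete, the following minimal sketch generates a series in which each value is a linear combination of past observations and past shocks plus the current shock. The names `simulate_arma`, `phi`, and `theta` are assumptions for this example, as is the convention of treating values before the start of the series as zero.

```python
from typing import List, Sequence


def simulate_arma(phi: Sequence[float], theta: Sequence[float],
                  shocks: Sequence[float]) -> List[float]:
    """Generate an ARMA(p, q) series from a given sequence of shocks.

    phi[i] weights the observation i+1 steps back, theta[j] weights the
    shock j+1 steps back. Terms before the start are treated as zero.
    """
    x: List[float] = []
    for t, eps_t in enumerate(shocks):
        # Autoregressive part: weighted past observations.
        ar = sum(phi[i] * x[t - 1 - i]
                 for i in range(len(phi)) if t - 1 - i >= 0)
        # Moving average part: weighted past shocks.
        ma = sum(theta[j] * shocks[t - 1 - j]
                 for j in range(len(theta)) if t - 1 - j >= 0)
        x.append(ar + ma + eps_t)
    return x


# A single unit shock at time 0 shows how its influence decays.
series = simulate_arma(phi=[0.5], theta=[0.5], shocks=[1.0, 0.0, 0.0])
```

Feeding in one isolated shock, as in the usage line, is a simple way to see how an outlier would propagate through later observations under such a model.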

Support vector machine (SVM)

Using the support vector machine, we have analyzed patient data in order to acquire a model of actual physicians’ behavior. This section introduces support vector machines and describes learning from patient data. Section 6 describes how learning results are supplemented by qualitative and prescriptive medical knowledge. SVMs [37] represent a method to learn binary classifiers from examples. For a set of training examples $(o_1, y_1), \ldots, (o_n, y_n)$ they find the classification rule h for which they

Medical knowledge base

Decision rules learned by the SVM reflect the average behavior of a physician, not the ‘gold standard’. As argued above, they have to be checked against medical knowledge about the effects of drugs. This section presents an approach to building a knowledge base that helps accomplish this task automatically and that makes decision support transparent.

Knowledge acquisition from experts is performed according to the current state of the art: first, knowledge is elicited from the expert, second, a

Using the knowledge base of effects to validate interventions

Medical knowledge is used for validation in two different ways. On the one hand, learned decision rules are validated on patient data by comparing the effects of their recommended interventions with the effects of actual physicians’ interventions. This validation incorporates an evaluation step as early as the knowledge acquisition phase. On the other hand, we believe that even an evaluated decision support system should check its decisions by considering their effects.

Comparison with related work

Using data from the most comprehensive singular clinical data repository at the LDS Hospital, Salt Lake City, Utah, USA, the group of Morris [26] developed a rule-based decision support system (DSS) for respiratory care in acute respiratory distress syndrome. Time is handled by introducing time points into the rules where a certain parameter value needs to be obtained. The development of this highly specialized system required more than 25 person years. It is a propositional rule base without a

Conclusions

We presented an approach towards integrating statistical and knowledge-based methods for the development of decision support algorithms in critical care. This application involves high dimensional time series data, demanding high quality decision support under real time constraints. These properties make this case study representative of a large number of applications in medicine and engineering.

This paper gives the necessary steps for solving this task as a whole. We identified how the

Acknowledgements

This work has been funded in part by the collaborative research center ‘Complexity Reduction in Multivariate Data Structures’ (DFG, SFB475). We thank the reviewers for valuable comments.

References (37)

  • G.E.P Box et al. Time Series Analysis: Forecasting and Control (1994)
  • M Dojat, C Sayettat. A realistic model for temporal reasoning in real-time patient monitoring. Appl. Artif. Intell....
  • A.J Fox. Outliers in time series. J. R. Stat. Soc. B (1972)
  • M Imhoff. A clinical information system on the intensive care unit: dream or nightmare?
  • M Imhoff. Three years clinical use of the Siemens Emtek System 2000: efforts and benefits. Clin. Intensive Care (Suppl.) (1996)
  • M Imhoff. Clinical data acquisition: what and how? J. Anästhesie Intensivmed. (1998)
  • M Imhoff et al. Statistical pattern detection in univariate time series of intensive care on-line monitoring data. Intensive Care Med. (1998)
  • M Imhoff et al. Time-effect relations of medical interventions in a clinical information system
