Original article
Available online 14 June 2024
Sepsis mortality prediction with Machine Learning Techniques
Predicción de la mortalidad por sepsis con técnicas de aprendizaje automático
Javier Carrillo Pérez-Tome (a), Tesifón Parrón-Carreño (a), Ana Belen Castaño-Fernández (b), Bruno José Nievas-Soriano (a), Gracia Castro-Luna (a, *)

* Corresponding author.
a Department of Nursing, Physiotherapy and Medicine, University of Almeria, 04120 Almeria, Spain
b Department of Applied Mathematics, University of Almería, 04120 Almeria, Spain

Objective

To develop a machine learning classification model of death from sepsis for patients admitted to the Intensive Care Unit (ICU).


Design

Cross-sectional descriptive study.


Setting

The Intensive Care Units (ICUs) of three hospitals in Murcia (Spain) and patients from the MIMIC III open-access database.


Participants

180 patients diagnosed with sepsis in the ICUs of three hospitals and a total of 4559 patients from the MIMIC III database.

Main variables of interest

Age, weight, heart rate, respiratory rate, temperature, lactate levels, oxygen saturation, systolic and diastolic blood pressure, pH, urine output, and potassium levels.


Results

Random forest classification models were built on the local and the MIMIC III databases. For the model on our database, considering all the variables classified as important by the random forest, sensitivity was 95.45%, specificity 100%, accuracy 96.77%, and AUC 95%. For the model based on the MIMIC III database, sensitivity was 97.55%, specificity 100%, and accuracy 98.28%, with an AUC of 97.3%.


Conclusions

According to the random forest classification in both databases, lactate levels, urine output, and variables related to acid-base equilibrium were the most important variables for mortality due to sepsis in the ICU. Potassium levels were more important in the MIMIC III database than in the local database.

Keywords:
Machine learning
Sepsis mortality

Objective

To develop a classification model, based on machine learning techniques, of death from sepsis in patients admitted to the Intensive Care Unit (ICU).


Design

Cross-sectional descriptive study.


Setting

Intensive Care Units (ICUs) of three hospitals in Murcia (Spain) and Sepsis-3 patients from the MIMIC III open-access database.


Participants

180 patients diagnosed with sepsis in the ICUs of three hospitals and a total of 4559 patients from the MIMIC III database.

Main variables of interest

Age, weight, heart rate, respiratory rate, temperature, lactate levels, oxygen saturation, systolic and diastolic blood pressure, pH, urine output, and potassium levels were assessed.


Results

A random forest classification model was computed with the local database and with the MIMIC III database. The sensitivity of the model on our database, considering all the variables classified as important by the random forest, was 95.45%, the specificity 100%, the accuracy 96.77%, and the AUC 95%. For the model on the MIMIC III database, the sensitivity was 97.55%, the specificity 100%, and the accuracy 98.28%, with an AUC of 97.3%.


Conclusions

According to the random forest classification in both databases, lactate levels, diuresis, and variables related to acid-base equilibrium were the most important variables for determining deaths due to sepsis in the ICU. Mean potassium levels were more critical in the MIMIC III database than in the local ones.

Keywords:
Machine learning
Sepsis mortality

Introduction

The concept of sepsis began to be defined in 1992, when the first consensus on sepsis, Sepsis-1, was published. This consensus introduced the Systemic Inflammatory Response Syndrome (SIRS), defining sepsis as a “systemic inflammatory response associated with a disease,” and added levels of severity: severe sepsis and septic shock. In 2001, a second expert meeting, called Sepsis-2, adjusted some of the values used for diagnosis but did not significantly change the definition of sepsis. A working group met again in 2016 and published an update, Sepsis-3, which established the current definition. Published by The Sepsis Definitions Working Group, it defined sepsis as “a life-threatening organ dysfunction caused by a dysregulated host response to infection” and, in addition to the definition, included tools to help diagnose sepsis.1,2

In addition to the loss of human life, sepsis entails a high economic cost for healthcare systems. In the United States, one-third of patients diagnosed with sepsis die, at a cost of about $20.3 million a year. In Spain, the incidence is 100 cases per 100,000 people/year, and mortality is between 20% and 43%. The estimated average cost is about $20,000 for each episode of severe sepsis.3–5 A considerable drawback in managing sepsis is the great difficulty in reaching a diagnosis. There is no specific diagnostic test, and symptoms are very heterogeneous, which makes it harder to determine the onset of the disease and, consequently, to start treatment as soon as possible. There is evidence that early diagnosis of sepsis, and therefore early initiation of treatment, significantly reduces morbidity and mortality.

In the early stages of sepsis, it is difficult to find symptoms or parameters that help diagnose it. By the time easily recognizable signs appear, the disease is usually advanced, which entails more complex treatment and a worse prognosis. Many researchers are therefore focused on developing tools for the early detection and optimal management of sepsis. Most hospitals have tools that attempt to detect and predict the onset of sepsis and its complications. The most widely used at present are the Modified Early Warning Score (MEWS) and its different versions, the Systemic Inflammatory Response Syndrome (SIRS) criteria, the Sequential Organ Failure Assessment (SOFA), and its faster variant, qSOFA.5–7

The introduction of electronic medical records in most hospitals makes patient data easier to access and use, since the data are collected in a structured way and can be retrieved quickly. This favors the development and implementation of prediction and decision-support systems, which help process the large amount of data clinicians face when treating patients. These tools aim to improve patient outcomes by facilitating early diagnosis and treatment decision-making.3,7

Traditional methods (MEWS, SOFA, SIRS, among others) are being widely questioned, and numerous studies support artificial intelligence (AI) programs that implement sepsis-prediction algorithms, reporting greater sensitivity and specificity than traditional methods.

Machine learning and Big Data (BD) techniques are expected to displace traditional methods and to support research that cannot be performed on patients because of ethical limitations or other reasons.8,9

The lack of transparency of some algorithms and staff's limited knowledge of how they work also generate resistance to their application, creating distrust and rejection of these systems.10,11

Our objective was to develop machine learning models of ICU mortality due to sepsis in a local population and in an international population (the MIMIC III open database), and to evaluate the performance of each.

Methods

Design: Cross-sectional descriptive study.

Setting: The study was carried out in the Intensive Care Units (ICUs) of the Virgen de la Arrixaca University Hospital, Santa Lucía Hospital, and Los Arcos Hospital, and on the MIMIC III database.

Study population: 180 patients diagnosed with sepsis, 56 of whom died in the ICU. All patients who met the hospital's inclusion and exclusion criteria during 2022–23 were selected. Informed consent for the anonymous use of the data was obtained from the patients and/or family members. This study was carried out in accordance with the principles of the Declaration of Helsinki and was registered with the hospital's research committee (IRB declaration code 646, Health Area 1 Arrixaca, Murcian Health Service, Murcia, Spain; authorization date 07/02/2022).

1. Database obtained from the local hospitals

Inclusion criteria:

  • Age over 18 years.
  • ICD-10 (International Classification of Diseases, 10th revision) diagnosis of sepsis, severe sepsis, or septic shock.
  • Computerized clinical history.

Exclusion criteria:

  • Age under 18 years.
  • Admission for reasons other than the ICD-10 diagnoses above.
  • Readmission to the ICU within 24 h.

Variables from the hospital database obtained during the first 24 h:


  • Exitus: death in the Intensive Care Unit of the patient diagnosed with sepsis. Dichotomous qualitative variable: Yes (1)/No (0).
  • Sex: as recorded in the medical history. Qualitative, nominal, discrete, independent variable: male (0) or female (1).
  • Hypertension: recorded in the medical history in the medical evaluation upon admission. Independent dichotomous qualitative variable: Yes (0), No (1).
  • Diabetes: recorded in the medical history in the medical evaluation upon admission. Independent dichotomous qualitative variable: Yes (0), No (1).
  • Use of vasoactive drugs: existence of a record of vasoactive drug use in the electronic medical record. Dependent dichotomous qualitative variable: Yes (0), No (1).
  • Heart rate (Heartrate): number of beats per minute, automatically uploaded to the electronic medical record every hour. Quantitative, discrete, dependent.
  • Respiratory rate (Resprate): number of breaths per minute, automatically uploaded to the electronic medical record every hour. Quantitative, discrete, dependent.
  • Systolic blood pressure (SBP): expressed in mmHg, captured by the blood pressure monitor, automatically or manually, with a cuff or an arterial line, and automatically uploaded to the electronic medical record every hour. Quantitative, discrete, dependent.
  • Diastolic blood pressure (DBP): expressed in mmHg, captured by the blood pressure monitor, automatically or manually, with a cuff or an arterial line, and automatically uploaded to the electronic medical record every hour. Quantitative, discrete, dependent.
  • Oxygen saturation (SpO2): percentage saturation captured by the monitor with hourly finger pulse oximetry. Quantitative, discrete, dependent.
  • Temperature: taken manually by a clinical assistant with a tympanic thermometer at least every hour, expressed with two integer digits and one decimal place. Quantitative, continuous, dependent.
  • Arterial or venous lactate level: lactate value from arterial or venous blood gases drawn at variable frequency and analyzed with the equipment available in the ICU, expressed in millimoles per liter (mmol/l). Quantitative, continuous, dependent.
  • Potassium level (K): potassium value from arterial or venous blood gases drawn at variable frequency and analyzed with the equipment available in the ICU, expressed in mEq/l. Quantitative, continuous, dependent.
  • Arterial or venous pH: pH value from arterial or venous blood gases drawn at variable frequency and analyzed with the equipment available in the ICU. Quantitative, continuous, dependent.
  • Body weight: obtained from the electronic health record (EHR), expressed in kilograms (kg). Quantitative, continuous, independent.
  • Age: obtained from the EHR, expressed in completed years. Quantitative, discrete, independent.

The research staff of each center extracted the hospital database automatically, exporting (dumping) it from the IntelliSpace Critical Care & Anesthesia (ICCA) healthcare software into a spreadsheet (cell) format.

2. MIMIC III database

The MIMIC III database was extracted from Physionet:

The full database initially consisted of 80 variables and 4559 patients. A correlation study was performed, and strongly correlated variables were eliminated; the reduced database contained 31 variables and 4559 patients. The objective was to find a classification model for exitus (hospital_expire_flag). The 31 variables collected from the MIMIC III database were the categorical variables sex, ethnicity, metastatic_cancer, and diabetes, and the quantitative variables age, hospital_elixhauser, vent, sofa, sirs, qsofa, aniongap_mean, bicarbonate_mean, creatinine_mean, glucose_mean, hemoglobin_mean, lactate_mean, platelet_mean, potassium_mean, inr_mean, sodium_mean, wbc_mean, heartrate_mean, sysbp_mean, diasbp_mean, resprate_mean, tempc_mean, spo2_mean, urineoutput, sepsis, and hospital_expire_flag.

Statistical analysis in both databases was performed using SPSS software for Windows (version 25.0, SPSS, Chicago, Illinois, USA) and R (version 3.5.1). A bivariate analysis was performed, and the normality of the variables was checked with the Kolmogorov-Smirnov test. The non-parametric Wilcoxon rank-sum test (Mann-Whitney test) was used to compare two samples. Random forest classification models were implemented. The area under the ROC curve (AUC) was calculated, and the confusion matrix (actual vs predicted group) was used to estimate the accuracy, precision, sensitivity, and specificity of the classification models.
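The metrics listed above follow directly from the confusion matrix, and the AUC can be computed from predicted scores without drawing the ROC curve. The sketch below is illustrative and is not the authors' SPSS/R code; it assumes death (exitus = 1) is the positive class and uses the rank formulation of the AUC (the probability that a randomly chosen death receives a higher score than a randomly chosen survivor), which equals the area under the empirical ROC curve. The function names are hypothetical.

```python
def confusion_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Metrics from the 2x2 confusion matrix, with 1 (death) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(a: int, b: int) -> float:
        return a / b if b else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),    # positive predictive value
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC as P(score of a random death > score of a random survivor); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


print(confusion_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 0]))
print(roc_auc([1, 1, 1, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.6]))
```

Note that precision and accuracy are different quantities: precision is the share of predicted deaths that were true deaths, while accuracy is the share of all patients classified correctly.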

Results

In the hospital database, the study population comprised 180 patients; 42.85% of the 110 male patients and 46.80% of the 70 female patients died from sepsis. There were no statistically significant differences between patients who died and those who survived in ICU admission days or in total length of ICU stay (until death or discharge to the ward). A summary of the descriptive statistics of our database is shown in Table 1.

Table 1.

Descriptive statistics of quantitative variables according to exitus.

Variable     Exitus  Mean    SD     P25     P50     P75     p-value
Age          Live    64.40   15.94  56.00   67.00   76.00   0.65
             Dead    65.35   11.28  60.00   67.00   72.50
Breathrate   Live    21.64   5.13   18.26   19.84   25.03   0.62
             Dead    21.24   4.16   17.91   20.85   24.48
DBP          Live    58.49   13.54  51.67   58.83   65.38   0.81
             Dead    57.91   8.58   51.34   56.14   63.88
Heartrate    Live    92.86   17.97  81.16   93.82   104.79  0.01*
             Dead    100.96  18.18  89.19   97.12   115.46
Lactate      Live    2.33    2.02   1.23    1.90    2.83    < 0.001*
             Dead    4.71    3.88   1.78    3.33    6.59
pH           Live    7.35    0.06   7.31    7.36    7.38    < 0.001*
             Dead    7.24    0.11   7.17    7.25    7.31
Potassium    Live    4.20    0.67   3.68    4.14    4.60    0.48
             Dead    4.27    0.72   3.81    4.16    4.71
SatO2        Live    96.42   2.31   95.21   96.89   97.71   0.01*
             Dead    93.89   7.22   93.32   94.94   96.64
SBP          Live    111.28  21.42  101.79  113.21  122.12  0.05*
             Dead    103.17  19.07  93.33   100.61  108.68
Temp         Live    36.44   0.63   36.03   36.41   36.86   < 0.001*
             Dead    36.11   0.73   35.64   36.10   36.67
Urine        Live    80.50   41.65  48.33   76.82   108.64  < 0.001*
             Dead    49.13   50.17  7.93    33.61   72.73

*p < 0.05. Age: years; Heartrate: beats/min; Breathrate: breaths/min; SatO2 (SpO2): %; SBP, DBP: mmHg; Temp: °C; Lactate: mmol/l; Potassium: mEq/l. All variables correspond to averages during the ICU stay.
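The p-values in Table 1 come from the Wilcoxon rank-sum (Mann-Whitney) test described in the Methods. A minimal sketch of the two-sided test is shown below, assuming the large-sample normal approximation with mid-ranks for ties, a tie-corrected variance, and a 0.5 continuity correction. This is an assumption about the variant used: packages such as R's `wilcox.test` switch to an exact distribution for small samples without ties, so p-values can differ slightly. The function name `mann_whitney` is hypothetical.

```python
import math


def mann_whitney(x: list[float], y: list[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test, normal approximation.

    Returns (U statistic of sample x, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted((value, idx) for idx, value in enumerate(x + y))
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = mid_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all observations identical: no evidence of a difference
        return u1, 1.0
    diff = u1 - mean_u
    correction = 0.5 if diff > 0 else -0.5 if diff < 0 else 0.0  # continuity correction
    z = (diff - correction) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))
```

For example, `mann_whitney(lactate_survivors, lactate_deaths)` would compare the two outcome groups of one row of Table 1, where both arguments are hypothetical lists of per-patient values.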

Regarding the qualitative variables from the local database, no statistically significant relationship was found between sex and exitus, between hypertension and exitus (p = 0.098), or between diabetes and exitus (p = 0.138).
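The article does not name the test used for the qualitative variables. Assuming a Pearson chi-squared test of independence on each 2×2 table (for example, hypertension × exitus), the statistic and p-value can be computed as sketched below, using the fact that for one degree of freedom P(X² > s) = erfc(√(s/2)). The optional Yates continuity correction is included because some software applies it by default to 2×2 tables. The function name `chi2_2x2` and any counts passed to it are hypothetical.

```python
import math


def chi2_2x2(table: list[list[int]], yates: bool = False) -> tuple[float, float]:
    """Pearson chi-squared test of independence for a 2x2 contingency table.

    table[i][j]: count of patients in row category i (e.g. hypertension yes/no)
    and column category j (e.g. survived/died). Returns (statistic, p-value), 1 df.
    """
    row_totals = [sum(r) for r in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                diff -= min(0.5, diff)  # Yates correction, never below zero
            stat += diff**2 / expected
    # For 1 degree of freedom, P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

With small expected counts (below about 5 in any cell), Fisher's exact test is usually preferred over either form of the chi-squared test.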

Among the qualitative variables collected in the MIMIC database, the following were significantly associated with exitus: mechanical ventilation (p < 2.2e-16), renal replacement therapy (p = 0.00075), metastatic cancer (p = 3.02e-10), and positive blood culture (p = 7.528e-06). The association between diabetes and exitus was not statistically significant (p = 0.7735).

The model was validated by dividing the local database into a training set containing 80% of the data, on which a random forest was fitted, and a test set containing the remaining 20%, to which this model was applied. Fig. 1 shows the variables that the random forest identified as important in determining death in patients with sepsis.
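The text gives the 80/20 proportions but not whether the split was stratified by outcome or which random seed was used. A minimal sketch, assuming a stratified split so that the proportion of deaths is similar in the training and test sets; the function name `stratified_split` and the seed are hypothetical.

```python
import random


def stratified_split(labels: list[int], test_frac: float = 0.2,
                     seed: int = 2024) -> tuple[list[int], list[int]]:
    """Return (train_indices, test_indices), drawing ~test_frac of each class into the test set."""
    rng = random.Random(seed)  # hypothetical seed, fixed for reproducibility
    train_idx: list[int] = []
    test_idx: list[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)


labels = [1] * 56 + [0] * 124  # 56 deaths among the 180 local patients, as reported
train_idx, test_idx = stratified_split(labels)
```

Under these assumptions the sketch assigns 36 of the 180 local patients, 11 of them deaths, to the test set. With so few test deaths, each misclassified patient moves sensitivity by several percentage points.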

Figure 1.

Random Forests from local database.


The most important variables in our database were the mean lactate level, diuresis (urine output), pH, and systolic blood pressure.
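The article does not say which importance measure Figs. 1 and 3 display. One widely used measure, reported for example by R's randomForest package, is the mean decrease in accuracy when a variable's values are randomly permuted. The sketch below illustrates that idea on a held-out set (randomForest itself uses out-of-bag samples); the classifier is passed in as a hypothetical `predict` function, so any fitted model can be plugged in.

```python
import random
from typing import Callable, Sequence

Row = Sequence[float]


def permutation_importance(
    predict: Callable[[list[Row]], list[int]],
    X: list[Row],
    y: list[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> list[float]:
    """Mean drop in accuracy on (X, y) when each feature column is shuffled."""
    rng = random.Random(seed)

    def accuracy(rows: list[Row]) -> float:
        return sum(p == t for p, t in zip(predict(rows), y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            permuted = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(permuted))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A variable the model ignores gets an importance of exactly zero, while a variable the model relies on (such as mean lactate, in the article's models) produces a drop in accuracy when shuffled.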

The model had an accuracy of 98% (0.89–0.99), a sensitivity of 97%, and a specificity of 100%. The ROC curve and the AUC (0.97; 0.91–1) are shown in Fig. 2.
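The article does not state how the interval for accuracy was obtained. Intervals of this kind are often exact binomial (Clopper-Pearson) intervals for the proportion of correctly classified patients, which is, for example, what R's `binom.test` reports. The sketch below assumes that method and a 95% level and finds the bounds by bisection on the binomial distribution function, using only the Python standard library; the function names are hypothetical.

```python
from math import comb
from typing import Callable


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f: Callable[[float], float], target: float) -> float:
    """Bisection for p in [0, 1] with f(p) == target, assuming f is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f still too large: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Exact (Clopper-Pearson) confidence interval for x successes out of n."""
    alpha = 1 - level
    # Lower bound solves P(X >= x | p) = alpha/2, i.e. P(X <= x - 1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else _solve_decreasing(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else _solve_decreasing(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper
```

Because the test set of the local database is small, these exact intervals are wide, which is consistent with the broad range reported alongside the 98% accuracy.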

Figure 2.

ROC curve of the random forest corresponding to the hospital database.


The study's second objective was to compute a random forest on the public MIMIC III database. The study population was 4559 patients, of whom 741 died from sepsis. There were no significant differences between the sexes. Table 2 shows the descriptive statistics from the MIMIC III database. The following variables (mean) showed statistically significant differences between survivors and non-survivors: age, anion gap, bicarbonate, creatinine, glucose, lactate, potassium, INR, sodium, BUN, WBC, heart rate, sysbp, diasbp, bp, respiratory rate, tempc, SpO2, and urine output. A random forest for exitus (hospital_expire_flag) was fitted on the SEPSIS_GROUP database, and the importance of its variables is shown in Fig. 3. The classification model had a sensitivity of 97.55%, a specificity of 100%, and an accuracy of 98.28%, with an AUC of 97.3% (0.968–0.981) (Fig. 4).
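For readers unfamiliar with the technique, the following pure-Python sketch shows the ingredients of a random forest classifier: each tree is grown on a bootstrap sample of patients, each split considers a random subset of about √p of the p variables (the default of R's randomForest for classification) and chooses the threshold that minimizes Gini impurity, and the forest predicts by majority vote. It is an illustration, not the authors' model: the depth limit, number of trees, seed, and the class name `RandomForest` are hypothetical, and a real analysis would rely on an established implementation.

```python
import random
from collections import Counter


def gini(labels: list[int]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return (weighted Gini, feature, threshold) of the best split, or None."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best


def build_tree(X, y, depth, max_depth, n_features, rng):
    """Internal nodes are (feature, threshold, left, right); leaves are class labels."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    features = rng.sample(range(len(X[0])), n_features)  # random variable subset per split
    split = best_split(X, y, features)
    if split is None:
        return majority
    _, f, thr = split
    go_left = [row[f] <= thr for row in X]
    left = build_tree([r for r, g in zip(X, go_left) if g],
                      [t for t, g in zip(y, go_left) if g], depth + 1, max_depth, n_features, rng)
    right = build_tree([r for r, g in zip(X, go_left) if not g],
                       [t for t, g in zip(y, go_left) if not g], depth + 1, max_depth, n_features, rng)
    return (f, thr, left, right)


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


class RandomForest:
    def __init__(self, n_trees: int = 100, max_depth: int = 6, seed: int = 0):
        self.n_trees, self.max_depth, self.seed = n_trees, max_depth, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_features = max(1, int(len(X[0]) ** 0.5))  # about sqrt(p) variables per split
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, n_features, rng))
        return self

    def predict(self, X):
        """Majority vote over the trees."""
        return [Counter(predict_tree(t, row) for t in self.trees).most_common(1)[0][0] for row in X]
```

Bootstrap sampling and per-split variable sampling make the trees differ from one another, and averaging their votes reduces the variance of any single tree.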

Table 2.

Descriptive Statistics MIMIC III database.

Variable         Exitus  Mean     SD       P25      P50      P75      p-value
age              Live    64.13    17.80    52.83    65.50    78.42    < 0.001
                 Dead    70.27    16.05    59.86    73.02    83.18
aniongap_max     Live    16.13    4.64     13.00    15.00    18.00    < 0.001
                 Dead    19.43    6.41     15.00    18.00    22.00
aniongap_min     Live    12.44    3.08     10.00    12.00    14.00    < 0.001
                 Dead    14.80    4.84     12.00    14.00    17.00
bicarbonate_max  Live    24.84    4.39     22.00    25.00    27.00    < 0.001
                 Dead    22.98    5.55     20.00    23.00    26.00
bicarbonate_min  Live    21.71    4.77     19.00    22.00    24.00    < 0.001
                 Dead    19.03    6.10     15.00    19.00    23.00
bun_max          Live    30.32    24.70    15.00    22.00    36.00    < 0.001
                 Dead    42.17    27.92    22.00    34.00    54.00
bun_mean         Live    27.35    22.03    14.00    20.00    33.00    < 0.001
                 Dead    38.94    26.52    19.50    31.50    50.50
bun_min          Live    24.39    20.02    12.00    18.00    30.00    < 0.001
                 Dead    35.81    25.40    18.00    28.00    48.25
chloride_max     Live    107.95   6.47     104.00   108.00   112.00   0.844
                 Dead    108.01   7.74     103.00   108.00   113.00
chloride_min     Live    101.95   6.73     99.00    102.00   106.00   0.041
                 Dead    101.35   7.45     97.00    102.00   106.00
creatinine_max   Live    1.67     1.77     0.80     1.10     1.70     < 0.001
                 Dead    2.04     1.44     1.00     1.60     2.60
creatinine_min   Live    1.30     1.29     0.70     0.90     1.30     < 0.001
                 Dead    1.64     1.24     0.80     1.30     2.10
diasbp_mean      Live    61.45    10.07    54.70    60.69    67.37    < 0.001
                 Dead    58.96    11.02    51.45    57.92    65.24
glucose_mean     Live    181.11   2317.67  112.72   133.00   161.46   0.522
                 Dead    156.99   69.20    114.50   140.90   182.32
heartrate_mean   Live    87.76    16.11    75.90    87.26    98.64    < 0.001
                 Dead    91.92    18.74    76.97    91.68    105.59
hematocrit_max   Live    35.95    6.11     31.70    35.60    40.00    0.794
                 Dead    35.88    6.78     31.20    35.10    40.10
hematocrit_min   Live    29.68    6.19     25.30    29.40    33.90    0.536
                 Dead    29.84    6.74     25.00    29.60    34.30
hemoglobin_max   Live    11.97    2.10     10.40    11.90    13.40    0.087
                 Dead    11.81    2.30     10.20    11.60    13.20
hemoglobin_min   Live    10.02    2.10     8.50     9.90     11.45    0.111
                 Dead    9.88     2.25     8.30     9.70     11.40
inr_max          Live    1.63     1.35     1.20     1.30     1.60     < 0.001
                 Dead    2.14     1.82     1.20     1.60     2.30
inr_min          Live    1.35     0.61     1.10     1.20     1.40     < 0.001
                 Dead    1.60     0.92     1.10     1.30     1.80
lactate_mean     Live    2.13     1.27     1.30     1.80     2.55     < 0.001
                 Dead    3.60     2.92     1.70     2.55     4.50
meanbp_mean      Live    76.71    10.20    69.63    75.51    82.95    < 0.001
                 Dead    73.39    11.56    66.21    71.98    79.55
platelet_max     Live    245.61   128.52   163.00   225.00   300.00   0.975
                 Dead    245.78   149.35   139.75   215.00   325.00
platelet_min     Live    196.78   109.80   126.00   180.00   245.00   0.315
                 Dead    191.59   131.90   95.00    166.00   252.50
potassium_max    Live    4.69     0.95     4.10     4.50     5.10     < 0.001
                 Dead    4.94     1.04     4.20     4.70     5.50
potassium_min    Live    3.71     0.55     3.40     3.70     4.00     < 0.001
                 Dead    3.88     0.72     3.40     3.80     4.30
resprate_mean    Live    19.53    4.08     16.57    18.88    21.85    < 0.001
                 Dead    21.91    4.68     18.45    21.50    24.86
sodium_max       Live    140.46   5.17     138.00   140.00   143.00   0.259
                 Dead    140.75   6.61     137.00   141.00   144.00
sodium_min       Live    136.05   5.39     133.00   136.00   139.00   0.195
                 Dead    135.71   6.68     132.00   136.00   140.00
spo2_mean        Live    97.11    1.96     95.92    97.33    98.62    < 0.001
                 Dead    95.86    4.09     94.56    96.76    98.47
sysbp_mean       Live    117.30   15.60    106.06   114.58   126.38   < 0.001
                 Dead    110.63   16.76    99.94    107.61   119.80
tempc_mean       Live    36.90    0.66     36.47    36.87    37.32    < 0.001
                 Dead    36.57    1.02     36.11    36.62    37.19
urineoutput      Live    1968.71  1540.91  1025.50  1680.50  2587.25  < 0.001
                 Dead    1197.32  1345.83  351.00   855.00   1575.00

*p < 0.05, Age: years, SpO2:%, SBP-DBP: mmHg, Lactate: mmol/l, Potassium: mEq/l. All variables correspond to averages during ICU stay.

Figure 3.

Random Forest from the MIMIC III database (Sepsis Group).

Figure 4.

ROC curve corresponding to the Random Forest of the MIMIC III database.


The most important variables were mean urine output, mean lactate, anion gap, SpO2, age, mean potassium, respiratory rate, and mean systolic blood pressure.

The variables common to both databases that showed the greatest importance were mean lactate and mean urine output, followed by mean oxygen saturation and mean temperature. In our database, pH was also important; in MIMIC III, the corresponding role could be played by mean bicarbonate and the anion gap. Other key variables in MIMIC III, such as mean respiratory rate and mean potassium, also appear in the hospital database model, but without relevance. Systolic pressure is quite important in both databases.

Finally, we attempted to use the MIMIC III database as an external validation of the model calculated from the local database. For this purpose, we used the 12 variables common to the local and MIMIC III databases: Age, exitus, uci_days, Heartrate_mean, Resprate_mean, Sysbp_mean, Diasbp_mean, Spo2_mean, Tempc_mean, Urine_mean, Lactate_mean, and Potassium_mean. With these variables, the local database yielded an accuracy of 0.78 (0.49–0.95) and an AUC of 0.70 (0.40–1). This low accuracy was attributed to the absence of mean pH, which was essential in the model built with our local database. With the MIMIC database, the accuracy was only 0.56 (0.55–0.58) and the AUC 0.58 (0.56–0.60); the accuracy may have been lower in this case because the number of variables was reduced from 31 to 12.

We concluded that external validation of our model was not possible because of the different number of variables and the absence from the MIMIC III database of a critical variable such as mean pH.

Discussion

Since the 2000s, the number of published articles related to machine learning has increased significantly, making it one of the main trends in health research. Open access to extensive patient databases allows machine learning techniques to be applied to the diagnosis and prognosis of different diseases. In this work, we used the MIMIC III public-access database of ICU patients with sepsis to compare the model calculated with machine learning techniques against the one generated from our database.

Many published articles use the MIMIC III database to evaluate risk variables in patients with sepsis. Vital signs, age, and demographic data from the MIMIC III database were used to develop the Insight algorithm, which performed well compared with the various methods used until then.12

Calvert et al. carried out their study with the MIMIC II database, using both vital signs and laboratory values (procalcitonin and lactic acid); using Insight, they obtained better results than other models.13 Other authors, such as Kam and Kim, extracted vital signs from MIMIC II for a deep-learning study in which they compared deep learning networks with long short-term memory (LSTM) architectures; the LSTM results were better than those of other sepsis prediction methods.14 Moor et al. performed a sepsis prediction study combining machine learning techniques (Gaussian process temporal convolutional networks) with dynamic time warping (DTW). They used MIMIC III to extract vital signs and laboratory values and obtained good results compared with other works.15

Nemati et al. worked with two different databases, Emory Healthcare and MIMIC III, using the former to develop the algorithm and the latter as a control. They used vital signs, laboratory results, demographic data, surgical history, and statistical data as input values.16

Mao et al. built on the Insight algorithm and used vital signs from several hospitals to predict sepsis, severe sepsis, and septic shock with a gradient tree boosting algorithm. The algorithm was trained on MIMIC III and, using transfer learning, obtained better results than with other databases.5

Scherpf et al. developed a recurrent neural network using the MIMIC III database, from which they obtained vital signs and white blood cell counts. Compared with the Insight algorithm, they obtained better results in terms of AUC.2

Van Steenkiste et al. developed an LSTM model to predict blood culture results, which are essential in choosing sepsis treatment. They used laboratory and vital sign data from the hospital emergency setting and concluded that a prediction window covering the previous 72 h is ideal and that deep learning gave better results than other methods.17 Schamoni et al. experimented with linear and non-linear models; their main contribution was a new, verified database. They used several parameters, such as respiratory rate and C-reactive protein (CRP), and also investigated the individual evolution of each prediction.18

Ribas Ripoll et al. addressed mortality prediction in ICU patients with sepsis by developing a new method, the Quotient Basis Kernel (QBK), a simplification of the Fisher kernel, using variables such as days of stay and vasoactive agents from the MIMIC II database.19

In our study, carried out with the databases of several hospitals and the public MIMIC III database, the essential variables in the local databases were mean lactate, mean pH, urine output, and oxygen saturation, in addition to age and minimum SBP, achieving an accuracy of 95%–98% in predicting death due to sepsis. Respiratory rate, urine output, and minimum potassium were also added to the classification model calculated on the MIMIC III database. Potassium is important in the MIMIC III database, whereas in our hospital it is identified by the random forest but with less importance.

Of all the variables in both databases, lactate has particular relevance. According to recent publications, high circulating levels of lactate are associated with the severity and mortality of sepsis. Lactate could promote the release of HMGB1 during sepsis: some studies have determined that lactate participates in the lactylation and acetylation of HMGB1 in macrophages during polymicrobial sepsis. Macrophages can take up extracellular lactate via monocarboxylate transporters (MCTs) to promote HMGB1 lactylation through a p300/CBP-dependent mechanism. Lactate has also been shown to stimulate HMGB1 acetylation through suppression of the Hippo/YAP-mediated SIRT1 deacetylase and β-arrestin2-mediated recruitment of p300/CBP acetylases to the nucleus via G protein-coupled receptor 81 (GPR81). Lactylated/acetylated HMGB1 is released from macrophages through exosome secretion, increasing endothelial permeability. In vivo, reducing lactate production and/or inhibiting GPR81-mediated signaling decreases circulating levels of exosomal HMGB1 and improves survival outcomes in polymicrobial sepsis.20

The main limitation of this work is that the local database, obtained from hospital records, and the MIMIC III database contain very different numbers of variables (31 in MIMIC III and 15 in the local database), and some variables are not common to both. In our study, mean pH was the most critical variable in the random forest, and it is not included in the MIMIC III database. pH could be approximated by the bicarbonate and anion gap values of the MIMIC III database, which are important in the random forest calculated with that database; however, as these variables are not directly comparable to pH, it was not possible to validate our model with this database. In any case, the study's main conclusion is the importance of urine output, SatO2, mean SBP, lactate levels, and variables related to acid-base equilibrium, such as pH and anion gap. Another conclusion is that, despite the difference in the number of variables, the machine learning models calculated with the local database and the MIMIC III database have similar accuracies, sensitivities, and specificities.

Conclusions

Many authors have developed predictive models for death in sepsis patients admitted to intensive care units. Artificial intelligence techniques and access to standardized and public databases are essential to developing these models. In both databases, lactate levels, average urine production, and variables related to acid-base equilibrium are critical variables in the prognosis of sepsis.

Authors’ contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Javier Carrillo Pérez-Tomé, Gracia Castro-Luna, Ana Belén Castaño-Fernandez, Bruno José Nievas-Soriano and Tesifón Parrón-Carreño.

The first draft of the manuscript was written by Gracia Castro-Luna and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Statements and declarations

  • The authors have no relevant financial or non-financial interests to disclose.

  • The authors have no conflicts of interest to declare relevant to this article's content.

  • All authors certify that they have no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.

  • The authors have no financial or proprietary interests in any material discussed in this article.

References

1. E.R. Neira-Sanchez, G. Málaga. Sepsis-3 y las nuevas definiciones, ¿es tiempo de abandonar SIRS?
2. M. Scherpf, F. Gräßer, H. Malberg, S. Zaunseder. Predicting sepsis with a recurrent neural network using the MIMIC III database.
3. C. Barton, U. Chettipally, Y. Zhou, Z. Jiang, A. Lynn-Palevsky, S. Le, et al. Evaluation of a machine learning algorithm for up to 48-hour advance prediction of sepsis using six vital signs. Comput Biol Med., 109 (2019), pp. 79-84
4. N. Ocampo-Quintero, P. Vidal-Cortés, L. Del Río Carbajo, F. Fdez-Riverola, M. Reboiro-Jato, D. Glez-Peña. Enhancing sepsis management through machine learning techniques: a review. Med Intensiva (Engl Ed)., 46 (2022), pp. 140-156
5. Q. Mao, M. Jay, J.L. Hoffman, J. Calvert, C. Barton, D. Shimabukuro, et al. Multicentre validation of a sepsis prediction algorithm using only vital sign data in the emergency department, general ward and ICU.
6. M. Schinkel, K. Paranjape, R.S. Nannan Panday, N. Skyttberg, P.W.B. Nanayakkara. Clinical applications of artificial intelligence in sepsis: a narrative review.
7. A. Núñez Reiz, M.A. Armengol de la Hoz, M. Sánchez García. Big data analysis and machine learning in intensive care units. Med Intensiva (Engl Ed)., 43 (2019), pp. 416-426
8. Z.M. Ibrahim, H. Wu, A. Hamoud, L. Stappen, R.J.B. Dobson, A. Agarossi. On classifying sepsis heterogeneity in the ICU: insight using machine learning. J Am Med Inform Assoc., 27 (2020), pp. 437-443
9. J.C. Ginestra, H.M. Giannini, W.D. Schweickert, L. Meadows, M.J. Lynch, K. Pavan, et al. Clinician perception of a machine learning-based early warning system designed to predict severe sepsis and septic shock. Crit Care Med., 47 (2019), pp. 1477-1484
10. A. Vellido, V. Ribas, C. Morales, A. Ruiz Sanmartín, J.C. Ruiz Rodríguez. Machine learning in critical care: state-of-the-art and a sepsis case study. Biomed Eng Online., 17 (2018), pp. 135
11. M. Komorowski. Clinical management of sepsis can be improved by artificial intelligence: yes. Intensive Care Med., 46 (2020), pp. 375-377
12. T. Desautels, J. Calvert, J. Hoffman, M. Jay, Y. Kerem, L. Shieh, et al. Prediction of sepsis in the intensive care unit with minimal electronic health record data: a machine learning approach. JMIR Med Inform., 4 (2016), pp. e28
13. J.S. Calvert, D.A. Price, U.K. Chettipally, C.W. Barton, M.D. Feldman, J.L. Hoffman, et al. A computational approach to early sepsis detection. Comput Biol Med., 74 (2016), pp. 69-73
14. H.J. Kam, H.Y. Kim. Learning representations for the early detection of sepsis with deep neural networks. Comput Biol Med., 89 (2017), pp. 248-255
15. M. Moor, M. Horn, B. Rieck, D. Roqueiro, K. Borgwardt. Early recognition of sepsis with gaussian process temporal convolutional networks and dynamic time warping. Proceedings of the 4th Machine Learning for Healthcare Conference.
16. S. Nemati, A. Holder, F. Razmi, M.D. Stanley, G.D. Clifford, T.G. Buchman. An interpretable machine learning model for accurate prediction of sepsis in the ICU. Crit Care Med., 46 (2018), pp. 547-553
17. T. Van Steenkiste, J. Ruyssinck, L. De Baets, J. Decruyenaere, F. De Turck, F. Ongenae, et al. Accurate prediction of blood culture outcome in the intensive care unit using long short-term memory neural networks. Artif Intell Med., 97 (2019), pp. 38-43
18. S. Schamoni, H.A. Lindner, V. Schneider-Lindner, M. Thiel, S. Riezler. Leveraging implicit expert knowledge for non-circular machine learning in sepsis prediction. Artif Intell Med., 100 (2019)
19. V.J. Ribas Ripoll, A. Vellido, E. Romero, J.C. Ruiz-Rodríguez. Sepsis mortality prediction with the Quotient Basis Kernel. Artif Intell Med., 61 (2014), pp. 45-52
20. K. Yang, M. Fan, X. Wang, J. Xu, Y. Wang, F. Tu, et al. Lactate promotes macrophage HMGB1 lactylation, acetylation, and exosomal release in polymicrobial sepsis. Cell Death Differ., 29 (2022), pp. 133-146

Please cite this article as: Pérez-Tome JC, Parrón-Carreño T, Castaño-Fernández AB, Nievas-Soriano BJ, Castro-Luna G. Predicción de la mortalidad por sepsis con técnicas de aprendizaje automático. Med Intensiva. 2024.

Copyright © 2024. The Author(s)
Medicina Intensiva (English Edition)