Journal Information
Vol. 48. Issue 1.
Pages 3-13 (January 2024)
Original article
Predictors of mechanical ventilation and mortality in critically ill patients with COVID-19 pneumonia
Predictores de ventilación mecánica y mortalidad en pacientes críticos con neumonía por COVID-19
Sergio Muñoz Lezcano a,1 (corresponding author), Miguel Ángel Armengol de la Hoz b,1, Alberto Corbi c, Fernando López d, Miguel Sánchez García e, Antonio Nuñez Reiz f, Tomás Fariña González g, Viktor Yordanov Zlatkov a
a PhD Student of the Program in Computer Science, Universidad Internacional de La Rioja (UNIR), Avenida de La Paz, 137, 26006 Logroño, La Rioja, Spain
b Big Data Department, PMC-FPS, Consejería de Salud y Consumo, Junta de Andalucía, Spain
c Research Institute for Innovation & Technology in Education (iTED), Universidad Internacional de La Rioja (UNIR), Avenida de La Paz, 137, 26006 Logroño, La Rioja, Spain
d Mathematical Analysis and Applied Mathematics Department, Faculty of Mathematics. Universidad Complutense de Madrid, Spain
e Critical Care Department, Hospital Clínico San Carlos, Martín Lagos s/n, 28040 Madrid, Spain
f Critical Care Department, Hospital Universitario Clínico San Carlos, Martín Lagos s/n, 28040 Madrid, Spain
g Critical Care Department, Hospital Universitario Infanta Sofía, Spain

Abstract

Objective

To determine whether potential predictors for invasive mechanical ventilation (IMV) are also determinants of mortality in COVID-19-associated acute respiratory distress syndrome (C-ARDS).

Design

Single-center, highly detailed longitudinal observational study.

Setting

Tertiary hospital ICU during the first two COVID-19 pandemic waves, Madrid, Spain.

Patients or participants

280 patients with C-ARDS not requiring IMV on admission.

Main variables of interest

Target: endotracheal intubation and IMV, mortality.

Predictors: demographics, hourly evolution of oxygenation, clinical data, and laboratory results.

Results

The time between symptom onset and ICU admission, the APACHE II score, the ROX index, and procalcitonin levels in blood were potential predictors related to both IMV and mortality. The ROX index was the most significant predictor associated with IMV, while APACHE II, LDH, and the days from symptom onset to ICU admission (DaysSympICU) were the predictors most strongly associated with mortality.

Conclusions

According to the results of the analysis, there are significant predictors linked with IMV and mortality in C-ARDS patients, including the time between symptom onset and ICU admission, the severity of the COVID-19 waves, and several clinical and laboratory measures. These findings may help clinicians to better identify patients at risk of IMV and mortality and to improve their management.

Keywords:
Acute respiratory distress syndrome
Invasive mechanical ventilation
Machine learning
Artificial intelligence

Resumen

Objetivo

Determinar si las variables clínicas independientes que condicionan el inicio de ventilación mecánica invasiva (VMI) son las mismas que condicionan la mortalidad en el síndrome de distrés respiratorio agudo asociado con COVID-19 (C-SDRA).

Diseño

Estudio observacional longitudinal en un solo centro.

Ámbito

UCI, hospital terciario: primeras dos olas de COVID-19 en Madrid, España.

Pacientes o participantes

280 pacientes con C-SDRA que no requieren VMI al ingreso en UCI.

Principales variables de interés

Objetivo: VMI y mortalidad.

Predictores: demográficos, variables clínicas, resultados de laboratorio y evolución de la oxigenación.

Resultados

El tiempo entre el inicio de los síntomas y el ingreso en la UCI, la puntuación APACHE II, el índice ROX y los niveles de procalcitonina en sangre eran posibles predictores relacionados tanto con la VMI como con la mortalidad. El índice ROX fue el predictor más significativo asociado con la VMI, mientras que APACHE II, LDH y DaysSympICU fueron los más influyentes en la mortalidad.

Conclusiones

Según los resultados obtenidos se identifican predictores significativos vinculados con la VMI y la mortalidad en pacientes con C-SDRA, incluido el tiempo entre el inicio de los síntomas y el ingreso en la UCI, la gravedad de las olas de COVID-19 y varias medidas clínicas y de laboratorio. Estos hallazgos pueden ayudar a los médicos a identificar mejor a los pacientes en riesgo de VMI y mortalidad y mejorar su manejo.

Palabras clave:
Síndrome de distrés respiratorio agudo
Ventilación mecánica invasiva
Aprendizaje automático
Inteligencia artificial
Introduction

Invasive mechanical ventilation (IMV) is a cornerstone of organ support in severe COVID-19 patients with acute respiratory distress syndrome (ARDS). As widely experienced in ICUs during the SARS-CoV-2 pandemic, IMV frequently causes complications.1,2 Hospital services were overwhelmed not only by the surge of patients, but also by scarce human resources and equipment, lack of sufficient mechanical ventilators being probably the most relevant. In surge scenarios, appropriate triage strategies are therefore needed to allocate IMV or alternatives such as high flow nasal prongs. These strategies should be based on the knowledge and understanding of specific potential predictors3 that could help clinicians to personalize decisions regarding IMV.

There is still considerable controversy regarding whom to intubate and when. Several recent studies have addressed the subject,4 although bias cannot be excluded in observational, non-randomized studies. A retrospective study suggested that early intubation and IMV are associated with favorable outcomes, but it included only intubated patients rather than the whole population at risk.

Previous studies have identified predictors of COVID-19 progression, including age, comorbidities, renal function, and immunodeficiency,5 using traditional statistical approaches in which collinearity among variables cannot be ruled out. Artificial intelligence (AI) is currently being used for COVID-19 risk stratification,6 studying multiple clinical features to increase effectiveness and efficiency in diagnosis, treatment, and prognosis. Self-explainable machine learning (ML) techniques can help with risk factor selection through ranking methodologies.7 In this context, AI can support the development of a conceptual model for comparing the significance of variables: regularization models8 are first used to improve predictor selection, and a Generalized Linear Mixed-effects Model (GLMM)9–11 is then used to construct the conceptual model. Such an approach is particularly relevant when assessing and comparing outcomes across different AI models, since it enables a comprehensive evaluation of variable significance. This is a novel methodology that leverages modern machine learning techniques to provide rigorous and applicable insight into relevant clinical questions when randomized clinical trials are not feasible. In this paper, we aim to determine whether potential predictors for IMV are also determinants of mortality in COVID-19-associated acute respiratory distress syndrome (C-ARDS), and to compare the significance of the variables in both cases.

Patients and methods

Selection and description of patients

In our retrospective observational study, we collected and curated data from our electronic medical records (EMR) from March 3rd, 2020 through February 28th, 2021. We selected patients who were admitted to our ICU at San Carlos Hospital (HCSC) in Madrid (Fig. 1) and who were not initially mechanically ventilated. Only patients with COVID-19 pneumonia were considered; incidental COVID-19 was excluded. Inclusion was restricted to patients aged 18 years or older.

Figure 1.

COVID-19 patients admitted during the first and second pandemic waves. The cohort comprises 280 severe COVID-19 patients admitted to the ICU Department at HCSC in Madrid, Spain, between March 3, 2020, and February 28, 2021. During this period, the SARS-CoV-2 wild-type and subsequently the alpha variant were prevalent in Spain. Over the study period, 4229 COVID-19 patients were admitted to HCSC, 405 of whom required ICU admission (first wave: 153, second wave: 252 patients).


The database comprises hourly data points for each patient during the first five days. Afterwards, we utilized multi-stage machine learning algorithms to assess the most significant variables in predicting invasive mechanical ventilation (IMV) and ICU mortality (Fig. 2). It is worth noting that 28-day mortality, while frequently used in large studies like RECOVERY, may not be a suitable outcome measure in COVID-19 patients due to the possibility of delayed mortality.

Figure 2.

Methodology for fitting the machine learning algorithms. Figure 5 in the Supplementary material shows the complete workflow, from cohort selection according to clinical needs to the implementation of the algorithms included in this explanation. The first step involves the selection of the cohort and of the initial group of variables considered in this study. The second step consists of a statistical study of each variable; it also involves correlation analysis (Figure 6 in the Supplementary material), imputation and transformation procedures, so that the most accurate data are available for the following steps. The third step identifies the most significant predictors using five machine learning (ML) techniques linked with regression analysis, based on 10-fold cross-validation. The fourth and last step characterizes the behavior of each predictor for two different purposes. The first is related to mechanical ventilation needs, using different settings of the Generalized Linear Mixed Model (GLMM) Tree (depth of layers) and looking for the best balance between performance (Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), area under the ROC curve (AUC) and the other parameters in Table 3) and explainability of the model. The second is related to the most representative mortality predictors, following the same balance objective.


All data were registered in our electronic medical record (ICCA Philips). A total of 12,163 longitudinal sets of hourly clinical and lab data were gathered. Longitudinal sets are grouped in clustered events associated with patients. Each entry contains demographics data, first or second wave admission, time elapsed from start of symptoms to O2 therapy and ICU admission, APACHE II score, monitoring, blood gases and therapy-related data. We discarded variables with more than 33% of missing values for consistency. We used mode imputation or mean imputation to complete missing values of the remaining variables. Tables 1 and 2 show the predictors that were finally used for the purposes of the study.
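The filtering and imputation rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not say which variables received mode imputation and which received mean imputation, so the sketch assumes mode for categorical variables and mean for numeric ones. The column names and values are hypothetical and are not taken from the study database.

```python
from statistics import mean, mode

MISSING_THRESHOLD = 0.33  # from the text: variables with >33% missing values are discarded


def missing_fraction(values):
    """Fraction of entries that are None (used here as the missing-value marker)."""
    return sum(v is None for v in values) / len(values)


def impute_column(values, categorical):
    """Fill None with the mode (categorical) or the mean (numeric) of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mode(observed) if categorical else mean(observed)
    return [fill if v is None else v for v in values]


def clean_table(columns, categorical_names):
    """Drop columns above the missingness threshold, then impute the remaining ones."""
    cleaned = {}
    for name, values in columns.items():
        if missing_fraction(values) > MISSING_THRESHOLD:
            continue  # too sparse to keep
        cleaned[name] = impute_column(values, name in categorical_names)
    return cleaned


# Toy hourly records (hypothetical values, not study data).
toy = {
    "heart_rate": [70, None, 80, 90],              # 25% missing, mean-imputed (assumption)
    "wave": ["second", "second", None, "first"],   # 25% missing, mode-imputed (assumption)
    "ferritin": [None, None, 300, None],           # 75% missing, dropped
}
result = clean_table(toy, categorical_names={"wave"})
```

In this toy table the sparse column is removed and the single gaps in the two remaining columns are filled from the observed values of the same column.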

Table 1.

Group of predictors for Invasive Mechanical Ventilation regression purposes.

Dataset clinical and biochemical characteristics, by Invasive Mechanical Ventilation (IMV)
Variable  N  Overall, N=12,163 a  IMV: No, N=9093 a  IMV: Yes, N=3070 a  p-value b
Age, years, Median (Q1-Q3)  12,163  59 (51–68)  58 (50–67)  63 (54–70)  <0.001 
Gender, n (%)  12,163        <0.001 
Male    8032 (66)  5649 (62)  2383 (78)   
Female    4131 (34)  3444 (38)  687 (22)   
Ethnicity, n (%)  12,163        <0.001 
Amerindian    4671 (38)  3625 (40)  1046 (34)   
Arab    545 (4.5)  468 (5.1)  77 (2.5)   
Spanish    6427 (53)  4591 (50)  1836 (60)   
Others    520 (4.3)  409 (4.5)  111 (3.6)   
Wave, n (%)  12,163        <0.001 
First    1490 (12)  766 (8.4)  724 (24)   
Second    10,673 (88)  8327 (92)  2346 (76)   
Body mass index, Median (Q1–Q3)  12,163  27.8 (26.0–31.1)  27.8 (26.0–31.2)  27.7 (26.0–29.4)  0.70 
Heart rate in bpm, Median (Q1–Q3)  12,163  73 (65–84)  73 (64–83)  76 (67–87)  <0.001 
Temperature in ºC, Median (Q1–Q3)  12,163  36.80 (36.50–37.10)  36.73 (36.44–37.02)  36.97 (36.64–37.37)  <0.001 
Arterial pressure in mmHg, Median (Q1–Q3)  12,163  87 (79–95)  87 (80–95)  87 (78–95)  <0.001 
Lactate in mEq/l, Median (Q1–Q3)  12,163  1.42 (1.20–1.70)  1.42 (1.14–1.65)  1.50 (1.33–1.80)  <0.001 
Procalcitonin, ng/mL Median (Q1–Q3)  12,163  0.13 (0.08 – 0.23)  0.13 (0.07 – 0.20)  0.14 (0.13 – 0.35)  <0.001 
Eosinophile count per cubic mm, Median (Q1–Q3)  12,163  4 (0–20)  4 (0–22)  4 (0–13)  0.011 
C reactive protein, mg/L Median (Q1–Q3)  12,163  8 (6–11)  8 (4–10)  8 (8–15)  <0.001 
Alkaline phosphatase U/L, Median (Q1–Q3)  12,163  82 (68–101)  82 (65–99)  82 (76–104)  0.006 
Total bilirubin mg/dL, Median (Q1–Q3)  12,163  0.53 (0.44 – 0.62)  0.53 (0.42 – 0.59)  0.53 (0.51 – 0.71)  <0.001 
Oxygenation index (ROX Index), Median (Q1–Q3)  12,163  5.93 (4.52–7.92)  6.18 (5.22–8.67)  4.46 (3.62–5.93)  <0.001 
Creatinine, mg/dL Median (Q1–Q3)  12,163  0.67 (0.59–0.78)  0.67 (0.58–0.77)  0.67 (0.65–0.82)  <0.001 
Leukocyte count per mm3, Median (Q1–Q3)  12,163  8925 (7160–10,804)  8925 (6857–10,548)  8925 (8400–11,478)  <0.001 
Hemoglobin g/dL, Median (Q1–Q3)  12,163  13.16 (12.28–13.96)  13.16 (12.20–13.93)  13.16 (12.63–14.03)  <0.001 
Amylase U/L, Median (Q1–Q3)  12,163  63 (50–79)  63 (51–84)  63 (48–64)  <0.001 
Lactate dehydrogenase, Median (Q1–Q3)  12,163  882 (749 – 1038)  882 (682–964)  939 (882–1172)  <0.001 
Lymphocyte count per mm3, Median (Q1–Q3)  12,163  829 (638–1049)  829 (657–1148)  829 (570–871)  <0.001 
AST (Aspartate Aminotransferase) U/L, Median (Q1–Q3)  12,163  45 (34–60)  45 (34–63)  45 (33–54)  <0.001 
Hours from ICU admission to this register, Median (Q1–Q3)  12,163  31 (14–50)  33 (16–51)  23 (9–45)  <0.001 
APACHE (Acute Physiology and Chronic Health Evaluation) II, Median (Q1–Q3)  12,163  13.0 (10.0–17.0)  12.0 (10.0–16.0)  15.0 (13.0–17.0)  <0.001 
Days from first symptoms to O2 therapy, Median (Q1–Q3)  12,163  7.00 (6.00–8.00)  7.00 (6.00–8.00)  7.00 (6.00–8.00)  0.008 
Days from first symptoms to ICU admission, Median (Q1–Q3)  12,163  9.0 (8.0–11.0)  9.0 (8.0–11.0)  9.0 (7.0–13.0)  <0.001 
Arterial pH, Median (Q1–Q3)  12,163  7.43 (7.41–7.45)  7.43 (7.41–7.46)  7.43 (7.39–7.44)  <0.001 
Arterial pCO2, Median (Q1–Q3)  12,163  38.1 (35.7–41.0)  38.1 (35.6–40.7)  38.4 (36.0–42.4)  <0.001 
Type of blood sample, n (%)  12,163        <0.001 
Arterial    696 (5.7)  496 (5.5)  200 (6.5)   
BLDO (Capillary blood gas analysis)    5 (<0.1)  5 (<0.1)  0 (0)   
Arterial    91 (0.7)  27 (0.3)  64 (2.1)   
Mixed    28 (0.2)  28 (0.3)  0 (0)   
Venous    1405 (12)  931 (10)  474 (15)   
Venous    9845 (81)  7529 (83)  2316 (75)   
Mixed venous    93 (0.8)  77 (0.8)  16 (0.5)   
Blood gas sat. O2, Median (Q1–Q3)  12,163  85 (75–91)  85 (77–91)  84 (72–89)  <0.001 
Corticosteroid dose, first 5 days of admission (mg of equivalent methylprednisolone dose), Median (Q1–Q3)  12,163  36 (30–60)  36 (30–60)  36 (30–78)  0.32 
Melatonin dose in mg/day, n (%)  12,163        0.001 
  4545 (37)  3445 (38)  1100 (36)   
50    3886 (32)  2922 (32)  964 (31)   
100    2167 (18)  1617 (18)  550 (18)   
200    1565 (13)  1109 (12)  456 (15)   
D dimer, ng/mL Median (Q1–Q3)  12,163  1031 (862–1263)  1031 (804–1232)  1031 (1031–1416)  <0.001 

This group of predictors is used in the selection procedure with the five regression algorithms: Ridge, LASSO, Elastic-net, Boruta and R-Part. Based on the results obtained, the group of predictors is reduced according to its behavior in relation to IMV needs. Figures 7–11 (Supplementary material) show the results of each regression procedure; R-Part was finally selected because of its good balance between model performance and explainability of results.

Data updated June 22, 2023.


a Median (Q1–Q3) or Frequency (%).

b Welch Two Sample t-test; Pearson's Chi-squared test.

Table 2.

IMV Results. Group of predictors used for mortality prediction with GLMM tree algorithm.

Dataset variables statistical characteristics, by ICU mortality
Variable  N  Overall, N=12,163 a  Alive, N=9777 a  Died, N=2386 a  p-value b
Days elapsed from first symptoms to ICU admission, Median (Q1–Q3)  12,163  9.0 (8.0–11.0)  9.0 (8.0–11.0)  9.0 (7.0–13.0)  <0.001 
APACHE (Acute Physiology and Chronic Health Evaluation) II, Median (Q1–Q3)  12,163  13.0 (10.0–17.0)  12.0 (10.0–16.0)  15.0 (13.0–17.0)  <0.001 
Corticosteroids administered during the first 5d of admission as mg of equivalent methylprednisolone dose, Median (Q1–Q3)  12,163  36 (30–60)  36 (30–60)  36 (30–80)  <0.001 
Oxygenation index, Median (Q1–Q3)  12,163  5.93 (4.52–7.92)  5.93 (4.95–8.42)  4.58 (3.63–5.93)  <0.001 
Serum Lactate dehydrogenase, U/L Median (Q1–Q3)  12,163  882 (749 – 1038)  882 (695–964)  1026 (882–1279)  <0.001 
Body mass index, Median (Q1–Q3)  12,163  27.8 (26.0–31.1)  27.8 (26.0–31.8)  27.8 (26.0–29.4)  <0.001 
Temperature in ºC, Median (Q1–Q3)  12,163  36.80 (36.50–37.10)  36.78 (36.50–37.10)  36.86 (36.50–37.20)  <0.001 
Days elapsed from first symptoms to O2 therapy, Median (Q1–Q3)  12,163  7.00 (6.00–8.00)  7.00 (6.00–8.00)  7.00 (6.00–7.00)  0.073 
Total bilirubin mg/dL, Median (Q1–Q3)  12,163  0.53 (0.44–0.62)  0.53 (0.42–0.60)  0.53 (0.51–0.68)  <0.001 
Wave, n (%)  12,163        <0.001 
First    1490 (12)  1031 (11)  459 (19)   
Second    10,673 (88)  8746 (89)  1927 (81)   
Lymphocyte count per mm3, Median (Q1–Q3)  12,163  829 (638–1049)  829 (667–1120)  800 (499–886)  <0.001 
Arterial pH, Median (Q1–Q3)  12,163  7.43 (7.41–7.45)  7.43 (7.41–7.46)  7.43 (7.38–7.45)  <0.001 
C reactive protein levels mg/L, Median (Q1–Q3)  12,163  8 (6–11)  8 (5–10)  8 (8–14)  <0.001 
Hours from ICU admission to this register, Median (Q1–Q3)  12,163  31 (14–50)  31 (14–51)  28 (11–47)  <0.001 

Data updated June 22, 2023.


a Median (Q1–Q3) or Frequency (%).

b Welch Two Sample t-test; Pearson's Chi-squared test.

Data were anonymized, excluding demographic or temporal information. The study protocol was approved by the local ethics committee (approval code 22/007-E), who waived the need for informed consent due to the retrospective non-interventional nature of the study.

Methods and techniques

Data collected as described above were used to fit the model12 following four steps for the whole process, as shown in Fig. 2. Considering that our data involve a concatenation of longitudinal data for each patient in different events, it was necessary to identify correlations within the cluster when trying to build an accurate prediction model.10

The following regression approaches were tested to select potential predictors for IMV and ICU mortality risk: LASSO,13 Ridge,14 Elastic-net,15 Boruta16 and R-Part.17 LASSO, Ridge and Elastic-net perform automatic predictor selection supported by L1 and L2 regularization terms,18 which minimize the risk of overfitting by reducing variance and attenuating the effect of correlation between features. Boruta19 is a feature selection method based on a Random Forest algorithm that selects all the risk predictors relevant for classification (a so-called all-relevant problem). R-Part17 builds a classification model based on binary trees. The R-Part varImp function20 quantifies the effect of each model predictor based on the mean squared error loss function. In all cases, potential predictors were analyzed and confirmed or rejected based on clinical criteria.
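For reference, the three penalized regressions can be written as one family. The formulation below is a common textbook parametrization, given as an assumption: the article does not state which parametrization or software settings were used. Here $\ell(\beta)$ is the log-likelihood of the regression model, $\lambda \ge 0$ is the penalty strength, and $\alpha \in [0, 1]$ mixes the L1 and L2 terms.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \Big\{ -\ell(\beta)
  \;+\; \lambda \Big[ \alpha \,\lVert \beta \rVert_1
  \;+\; \tfrac{1-\alpha}{2}\, \lVert \beta \rVert_2^2 \Big] \Big\},
\qquad
\begin{cases}
\alpha = 1 & \text{LASSO (L1 only)} \\
\alpha = 0 & \text{Ridge (L2 only)} \\
0 < \alpha < 1 & \text{Elastic-net}
\end{cases}
```

The L1 term can shrink coefficients exactly to zero, whereas the L2 term shrinks correlated coefficients towards each other without removing them.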

After identifying the optimal set of potential predictors (Figures 10–14 in the Supplementary material), clustering effects by patient and temporal distribution, as well as cutoff points of the significant variables and their interactions, were assessed with GLMM Trees.9–11 To build these trees, we took the entire dataset into account, grouping data by patient and data charting time as random variables to fit the model.12 This fitting methodology avoids both overfitting and underfitting effects that could impact the model's performance.21 Models were implemented with a 10-fold cross-validation strategy and four tree-depth settings (full, 5, 10 and 20), so the fitting procedure was executed ten times per algorithm implementation. Note that the positive class for the IMV variable refers to cases where IMV is required, while the positive class for the ICU mortality variable refers to cases where patients die. The focus of the study is on identifying independent variables and their thresholds associated with IMV and ICU mortality, without defining specific categories to predict.
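With repeated hourly records per patient, one way to organize a 10-fold split is to keep all records of a patient in the same fold. The sketch below illustrates that idea only. Whether the authors assigned folds by patient or by record is not stated in the text, so the grouping is an assumption, and `grouped_k_fold` is a hypothetical helper rather than a function from their scripts.

```python
from collections import defaultdict


def grouped_k_fold(patient_ids, k=10):
    """Yield (train_idx, test_idx) pairs; each patient appears in exactly one test fold."""
    rows_by_patient = defaultdict(list)
    for row, pid in enumerate(patient_ids):
        rows_by_patient[pid].append(row)

    # Assign patients to folds round-robin, patients with most records first,
    # which keeps the fold sizes roughly balanced.
    folds = [[] for _ in range(k)]
    ordered = sorted(rows_by_patient, key=lambda p: (-len(rows_by_patient[p]), str(p)))
    for i, pid in enumerate(ordered):
        folds[i % k].extend(rows_by_patient[pid])

    all_rows = set(range(len(patient_ids)))
    for test_rows in folds:
        test = sorted(test_rows)
        train = sorted(all_rows - set(test_rows))
        yield train, test


# Hypothetical example: 20 patients with 3 hourly records each.
ids = [f"p{n:02d}" for n in range(20) for _ in range(3)]
splits = list(grouped_k_fold(ids, k=10))
```

Each of the ten iterations fits on the training rows and evaluates on the held-out patients, so no patient contributes records to both sides of a split.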

We used a GLMM Tree to build conceptual models that explain the association between the potential predictors and the two outcome variables. This algorithm accounts for data clusters and temporal characteristics of the dataset, utilizing a mixed-effect strategy to combine the potential predictors that influence the outcome variables. Additionally, the algorithm provides a cut-off value for variables, allowing for comparison with clinical experience.

GLMM Tree performance metrics were the area under the sensitivity-specificity (ROC) curve (AUC), the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC),22 as well as the deviance, the log-likelihood statistic,23 and the sensitivity and specificity parameters. All the regression and GLMM Tree models were fitted with the same subset of variables shown in Table 1.
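The standard definitions of these information criteria are given below as a reminder, on the assumption that the software used by the authors follows them. Here $\hat{L}$ is the maximized likelihood, $k$ the number of estimated parameters and $n$ the number of observations; the deviance expression assumes a binary outcome, for which the saturated model has log-likelihood zero.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}, \qquad
D = -2\ln\hat{L}
```

Both criteria add a complexity penalty to the deviance; BIC penalizes each additional parameter more heavily than AIC whenever $\ln n > 2$.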

We used both regressions and GLMM family trees to gain a wider understanding of potential predictors for IMV and ICU mortality. This combined approach offers more intuitive decision-making than black-box modeling strategies. We assessed each predictor's effectiveness and used the same set of variables (Table 2) to build an ICU mortality model for the entire cohort. The study's anonymized database and scripts can be found in the associated GitHub repository.24 The database will also be published as a PhysioNet25 project to disseminate and share the anonymized clinical records and to facilitate cooperative replication of the project.

Results

Patient characteristics

The complete cohort consisted of 280 patients. A total of 154 patients (55%) required IMV after ICU admission (Fig. 1): 65 of 80 patients (81.2%) during the first wave and 89 of 200 patients (44.5%) during the second. ICU mortality of the whole cohort was 25.7% (72 of 280 patients): 33.7% (27 of 80 patients) during the first wave and 22.5% (45 of 200 patients) during the second. Table 2 shows IMV and ICU mortality predictors for the whole patient cohort. The mean number of registers per patient was 43.4, for a total of 12,163 hourly registers in the whole database (Figure 12 in the Supplementary material).

Significance of predictors

R-Part classification achieved the best and most clinically plausible results in selecting the twelve most representative predictors for IMV and ICU mortality from the whole group of available potential predictors (Table 2). The final selection is listed in decreasing order of importance, according to the results of the loss function (mean squared error), scaled from 0 to 100 points: days from first symptoms to ICU admission (100), the APACHE II score (92.25), the oxygenation index, ROX index (72.46), blood procalcitonin (69.59), serum lactate dehydrogenase (54.45), total serum bilirubin (36.54), the COVID-19 wave (31.18), the dose of corticosteroids administered during the first five days of admission (30.96), lymphocyte count (15.57), pH (13.29), BMI (12.76), C-reactive protein (12.74), time to oxygen therapy (12.42) and body temperature (10.82).

Modeling performance

Table 3 presents the performance of the IMV model. The pairing of R-Part predictor selection with the GLMM Tree achieved the highest performance, with an AUROC of 0.87, as shown in Figure 8 in the Supplementary material. The ICU mortality model also performed well, with an AUROC of 0.88, as shown in Figure 9 in the Supplementary material. The IMV likelihood ratios (LR+ 3.16, LR− 0.177) suggest that the test result is moderately useful for identifying or ruling out patients likely to require IMV. The 95% CI (0.918 to 0.928) suggests a high level of precision considering the sensitivity, specificity, and accuracy of the model. For ICU mortality, the likelihood ratios (LR+ 5.105, LR− 0.424) and the 95% CI (0.817 to 0.833) indicate that the results are also moderately useful. Fig. 3 illustrates the ICU mortality decision tree, while Figure 7 in the Supplementary material presents the IMV decision tree. The optimal cut-off point for the prediction model was determined from the IMV and ICU mortality ROC curves using Youden's Index,26 which identifies the point at which the sum of sensitivity and specificity is maximal.
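As an illustration of how Youden's Index picks a cut-off, the following sketch scans every candidate threshold and keeps the one that maximizes J = sensitivity + specificity − 1. The scores and outcomes are hypothetical, and `youden_cutoff` is an illustrative helper, not code from the study's repository.

```python
def youden_cutoff(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A record is predicted positive when its score is >= threshold.
    On ties, the lowest threshold reaching the maximum J is kept.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Hypothetical predicted probabilities and outcomes (1 = event), not study data.
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, j = youden_cutoff(scores, labels)
```

For these toy values the selected threshold is 0.40, where sensitivity is 1.0 and specificity is 0.75, giving J = 0.75.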

Table 3.

IMV Results.

GLMM (Generalized Linear Mixed Model) trees results
Mechanical ventilation
Regressions  No. of predictors  AUC  95% CI  AIC  BIC  Deviance  Log Lik  Sensitivity  Specificity  LR+  LR− 
Ridge criteria  25  0.852  0.859–0.872  9263.82  9493.41  9201.82  −4600.91  0.856  0.689  2.75  0.206 
LASSO criteria  22  0.852  0.847–0.862  9263.82  9493.41  9201.82  −4600.91  0.856  0.852  5.78  0.166 
Elastic criteria  20  0.858  0.847–0.862  9111.20  9325.98  9053.20  −4526.60  0.750  0.816  4.07  0.308 
Boruta criteria  32  0.897  0.862–0.875  7775.23  8004.82    −3856.61  0.858  0.800  4.29  0.177 
R-Part criteria  13  0.867  0.918–0.928  7830.28  8059.87  7758.28  −3884.14  0.871  0.725  3.16  0.177 

The Akaike Information Criterion (AIC) reports the information score of the whole model: the smaller the AIC value, the better the model fit. AIC is calculated from the number of independent variables used to build the model and the maximum likelihood estimate of the model (how well the model reproduces the data). The best-fit model according to AIC is the one that explains the greatest amount of variation using the fewest possible independent variables. The Bayesian Information Criterion (BIC) is another criterion for model selection that measures the trade-off between model fit and model complexity. A lower AIC or BIC value indicates a better fit. The log-likelihood (Log Lik) of a regression model measures its goodness of fit: the higher the log-likelihood, the better the model fits the dataset. Deviance is a goodness-of-fit metric for statistical models, used particularly for GLMs. It is defined as the difference between the saturated and proposed models and can be thought of as how much of the variation in the data the proposed model leaves unexplained; therefore, the lower the deviance, the better the model.

Sensitivity evaluates a model's ability to predict the true positives of each available category: a higher sensitivity means more true positives and fewer false negatives. In the healthcare domain, models with high sensitivity are desirable. Specificity evaluates a model's ability to predict the true negatives of each available category; it is defined as the proportion of actual negatives that are predicted as negative. In other words, it is the proportion of people not suffering from the disease who are correctly predicted as not suffering from it. These metrics apply to any categorical model.

The likelihood ratio is often used in statistical hypothesis testing and model selection to compare the fit of different models to the observed data. It is also commonly used in medical diagnostic testing to evaluate the diagnostic accuracy of a test or combination of tests. LR+ (positive likelihood ratio) is the ratio of the probability of a positive test result given the presence of the disease to the probability of a positive test result given the absence of the disease. In our case, a high LR+ indicates that the test is more accurate at correctly identifying patients who could need IMV, while a low LR+ suggests that the test does not provide strong evidence for IMV. Conversely, LR− compares the likelihood of a negative test result in patients with the disease with the likelihood of a negative test result in patients without the disease. A low LR− indicates that the test is more accurate at correctly identifying patients who do not need IMV, while a high LR− suggests that the test does not provide strong evidence for the absence of IMV. LR+ and LR− are often used together with other measures of diagnostic accuracy, such as sensitivity and specificity, to assess the overall performance of a medical test and to help clinicians and researchers determine its optimal use.

A confidence interval (CI) is a range of values that is likely to contain the true value of a population parameter (such as a mean or a proportion) with a certain degree of confidence, usually expressed as a percentage such as 95% or 99%. A narrower interval indicates greater precision, while a wider interval indicates greater uncertainty. What counts as a "good" CI depends on the context and the specific research question, but a narrower interval is typically preferred because it provides a more precise estimate. For the area under the receiver operating characteristic curve (AUROC), which is commonly used in binary classification problems, a CI that includes 0.5 (no discrimination between the two groups) is generally considered uninformative. A CI that does not include 0.5 and spans, for example, 0.7–0.8 may be considered good, indicating that the model has reasonably good discriminative ability. However, the AUROC and its CI should always be interpreted in the context of the specific research question and field of study.
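The quantities in Table 3 are linked by simple arithmetic, which the following sketch reproduces for two rows. It assumes the standard definitions LR+ = sensitivity / (1 − specificity), LR− = (1 − sensitivity) / specificity, deviance = −2 log-likelihood, AIC = 2k + deviance and BIC = k ln(n) + deviance. The parameter count k = 31 is not reported in the article; it is inferred here from the Ridge row, where AIC minus deviance equals 62.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def information_criteria(log_lik, k, n):
    """Deviance, AIC and BIC from a log-likelihood, k parameters and n observations."""
    deviance = -2 * log_lik
    aic = 2 * k + deviance
    bic = k * math.log(n) + deviance
    return deviance, aic, bic


# R-Part row of Table 3: sensitivity 0.871, specificity 0.725.
lr_pos, lr_neg = likelihood_ratios(0.871, 0.725)

# Ridge row of Table 3: log-likelihood -4600.91 over n = 12,163 hourly registers.
# k = 31 is an inference from the table (AIC - deviance = 62 = 2k), not a reported value.
deviance, aic, bic = information_criteria(-4600.91, k=31, n=12163)
```

Under these assumptions the computed values agree, up to rounding, with the tabulated LR+ and LR− for R-Part and with the deviance, AIC and BIC listed for Ridge.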

Figure 3.

ICU Mortality Tree Predictors. The predictors appear in different branches according to their significance in the predictive model. Values in bold represent the number of registers per branch. Values in red bold represent the percentage of registers with a positive outcome. The variable “DAYS_SIMPTONS_ADMISION” is the number of days from first symptoms to ICU admission. The variable “linf_total” is the lymphocyte count per mm3. The variable “dosis_equiv_mpred_5d” is the corticosteroid dose during the first five days of admission (mg of equivalent methylprednisolone dose). The variable “bbTot” is the total bilirubin level in blood. The variable “ldh” is the serum lactate dehydrogenase level. The variable “DAYS_UNTIL_O2” is the number of days until the patient requires O2.


The trees in Figures 6 and 7 of the Supplementary material indicate that oxygenation status (ROX index) has the most significant influence on IMV, with a threshold near 5.2. On the other hand, ICU mortality is mainly influenced by comorbidities (APACHE II score) and LDH, as revealed by the same trees.
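To make the reported split concrete, the sketch below computes a ROX index and compares it with the threshold of about 5.2. The article does not restate the definition, so the sketch assumes the usual one, (SpO2 / FiO2) / respiratory rate, with SpO2 in percent and FiO2 as a fraction. The direction of the comparison follows Table 1, where registers with IMV show lower ROX values. The patient values are hypothetical.

```python
ROX_THRESHOLD = 5.2  # approximate split point reported in the text for the IMV tree


def rox_index(spo2_percent, fio2_fraction, respiratory_rate):
    """ROX = (SpO2 / FiO2) / respiratory rate; SpO2 in %, FiO2 as a fraction (0.21 to 1.0)."""
    if not 0.21 <= fio2_fraction <= 1.0:
        raise ValueError("FiO2 must be given as a fraction between 0.21 and 1.0")
    if respiratory_rate <= 0:
        raise ValueError("respiratory rate must be positive")
    return (spo2_percent / fio2_fraction) / respiratory_rate


def below_imv_split(rox):
    """True when the ROX value falls on the lower side of the reported split."""
    return rox < ROX_THRESHOLD


# Hypothetical hourly register: SpO2 92%, FiO2 0.60, 30 breaths per minute.
example = rox_index(92, 0.60, 30)
flagged = below_imv_split(example)
```

With these inputs the index is about 5.1, just below the split, which is the side of the tree associated with IMV.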


The present study yields some highly relevant clinical results. First, the variable sets predicting IMV and ICU mortality are different. Whereas oxygenation variables are independent predictors of IMV, ICU mortality is associated with increased age and LDH and with the presence of comorbidities. The latter variables may be considered markers of two processes: COVID-19-associated inflammation and ICU-acquired superinfection (see Figure 4 in the Supplementary material). Second, according to our results, the characteristics of pharmacological therapy, including the administration of steroids, have little influence on either the need for IMV or ICU mortality. The analysis included 64 patients who did not receive steroids and 216 who did, at the usual daily dose of 6 mg dexamethasone or equivalent. This is a remarkable finding, because the effect of steroids on mortality identified in a previous trial27 has influenced recommendations, as well as clinical practice, since its publication. It may be speculated that, in that trial, the decision whether to include and randomize a patient, left to the discretion of the attending physicians and based on undisclosed criteria, produced different results by selecting a subset of COVID-19 cases with different characteristics. In comparison, no inclusion or exclusion criteria were applied in the selection process for our “pragmatic” cohort. After the results of the RECOVERY trial became available, steroids were given to almost every patient unless a severe contraindication existed.

The present study applied a novel methodology (logistic regression with regularization plus GLMM Tree mixed models) to evaluate the relative importance of several variables as predictors of significant clinical events. Using machine learning and a fine-grained, multifaceted longitudinal database, we have established relevant variable thresholds to support clinical decisions. Although the model would perform quite well as a predictor of IMV and ICU mortality, with good positive predictive values, it is important to emphasize that this is not a predictive model in the classical sense but an attempt to pinpoint the most important clinical events that represent turning points during the studied process (in this case, the clinical management of patients not initially under IMV). In this sense, including the likelihood ratio as an evaluation factor for comparing model performance gave very good results. However, following the premise of model explainability, we believe it is important to take this element into account as a final selection factor for the set of predictors that best fits daily clinical practice. This study demonstrates that predictor-ranking methodologies using self-explainable machine learning may support therapeutic decision-making with observational data when randomized clinical trials are unfeasible or unethical.

Regarding the strengths of our study, we would like to highlight the quantity and quality of the data set. The collected data have a high level of detail, leveraging the power of strategically devised electronic health records (EHR), which store relevant information in a highly structured and retrievable format. Every effort was made to configure our EHR to optimally gather all relevant information about COVID-19 patients. In addition, our anonymized database, which is available in the repository along with the script we used for statistical analysis, is highly detailed and has been extensively curated to reflect temporal evolution and to improve data quality as much as possible. Nevertheless, the collection of variables from EHR may be biased, affecting data quality. Age and gender biases are possible, as well as biases related to the selection and measurement of clinical variables. These biases can lead to incomplete or skewed representations of certain population groups and may impact the validity and generalizability of research findings and clinical decision-making. It is important to be aware of these biases to ensure proper interpretation and use of EHR data.

On the other hand, the limitations of our study relate mainly to its single-center nature; the results require confirmation in a multicenter dataset to gain external validity. Our methodology would be well suited to a multicenter study, with “center” included as a random factor in the second (GLMM Tree) part of the process. We suggest that future research applying this methodology could focus on designing clinical studies that use observational data to answer relevant clinical questions without the logistic requirements of a randomized clinical trial, or for hypothesis-generating purposes. Furthermore, when considering the limitations of using generalized linear mixed-effects models (GLMMs) to model causation in critical care medicine research, it is important to highlight the absence of explicit causality assumptions. GLMMs focus primarily on association or correlation analysis and cannot address the assumptions necessary for establishing causal relationships. Specifically, GLMMs provide no framework for identifying causal effects and do not account for unmeasured confounding variables, both of which are crucial considerations in causal inference. In contrast, causal inference methods, such as the potential outcomes framework, explicitly address these assumptions and offer a more comprehensive approach. Therefore, when the aim is to establish causal relationships between variables, researchers should carefully consider the limitations of GLMMs and opt for causal inference methods, which provide a more robust approach for investigating causality in critical care medicine research.

In conclusion, different variables predict IMV and ICU mortality in severe COVID-19 patients, suggesting that the therapeutic decision of when to use IMV has little impact on ICU mortality. Our methodology is a valid option to assess therapeutic decisions using observational data when randomized clinical trials are not feasible or ethical.

Authors' contributions

SM, MA and AN conceived the presented idea. SM and MA contributed equally as first authors. SM and MA developed the theory and performed the computations. AN conducted an independent literature search to identify potentially relevant studies. MS independently reviewed the search results to identify pertinent articles. MS, AN, TF and VY contributed to the interpretation of the results. SM, MA, AN, MS, FL and AC took the lead in writing the manuscript. All authors provided critical feedback and helped shape the research, analysis, and manuscript.


This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Conflict of interest

The authors declare that they have no conflict of interest.



Appendix A
Supplementary data

The following is Supplementary data to this article:

References

K. Rajdev, A. Spanel, S. McMillan, S. Lahan, B. Boer, J. Birge, et al.
Pulmonary barotrauma in COVID-19 patients with ARDS on invasive and non-invasive positive pressure ventilation.
Intensive Care Med, 36 (2021), pp. 1013-1017
Á Estella, P. Vidal-Cortés, A. Rodríguez, D. Andaluz Ojeda, I. Martín-Loeches, E. Díaz, et al.
Management of infectious complications associated with coronavirus infection in severe patients admitted to ICU.
Med Intensiva Engl Ed, 45 (2021), pp. 485-500
N. Chebotareva, S. Berns, T. Androsova, S. Moiseev.
Risk factors for invasive and non-invasive ventilatory support and mortality in hospitalized patients with COVID-19.
Med Intensiva, 46 (2022), pp. 355-356
E. Papoutsi, V.G. Giannakoulis, E. Xourgia, C. Routsi, A. Kotanidou, I.I. Siempos.
Effect of timing of intubation on clinical outcomes of critically ill patients with COVID-19: a systematic review and meta-analysis of non-randomized cohort studies.
Crit Care, 25 (2021), pp. 121
A.K. Chomistek, C. Liang, M.C. Doherty, C.R. Clifford, R.P. Ogilvie, R.V. Gately, et al.
Predictors of critical care, mechanical ventilation, and mortality among hospitalized patients with COVID-19 in an electronic health record database.
BMC Infect Dis, 22 (2022), pp. 413
M.D. Aldhoayan.
The role of artificial intelligence and machine learning during the Covid-19 pandemic: a review.
Stud Health Technol Inform, 295 (2022), pp. 28-32
G.K. Rajbahadur, S. Wang, G.A. Oliva, Y. Kamei, A.E. Hassan.
The impact of feature importance methods on the interpretation of defect classifiers.
IEEE Trans Softw Eng, 48 (2022), pp. 2245-2261
J.O. Ogutu, T. Schulz-Streeck, H.P. Piepho.
Genomic selection using regularized linear regression models: ridge regression, LASSO, elastic net and their extensions.
M. Poddar, G. Harigovind.
Mixed-effects model for classification and prediction in longitudinal data analysis.
2018 International Conference on Bioinformatics and Systems Biology (BSB), (2018), pp. 36-39
M. Fokkema, N. Smits, A. Zeileis, T. Hothorn, H. Kelderman.
Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees.
Behav Res Methods, 50 (2018), pp. 2016-2034
H. Seibold, T. Hothorn, A. Zeileis.
Generalised linear model trees with global additive effects.
Adv Data Anal Classif, 13 (2019), pp. 703-725
J. Lever, M. Krzywinski, N. Altman.
Points of significance: model selection and overfitting.
Nat Methods, 13 (2016), pp. 703-705
R. Tibshirani.
Regression shrinkage and selection via the LASSO.
J R Stat Soc Ser B Methodol, 58 (1996), pp. 267-288
A.E. Hoerl, R.W. Kennard.
Ridge regression: biased estimation for nonorthogonal problems.
Technometrics, 12 (1970), pp. 55-67
H. Zou, T. Hastie.
Regularization and variable selection via the elastic net.
J R Stat Soc Ser B Stat Methodol, 67 (2005), pp. 301-320
M.B. Kursa, A. Jankowski, W.R. Rudnicki.
Boruta – a system for feature selection.
Fundam Inform, 101 (2010), pp. 271-285
C. Strobl, J. Malley, G. Tutz.
An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.
Psychol Methods, 14 (2009), pp. 323-348
A.Y. Ng.
Feature selection, L1 vs. L2 regularization, and rotational invariance.
Proceedings of the twenty-first international conference on Machine learning [Internet], Association for Computing Machinery, (2004), pp. 78
M.B. Kursa, W.R. Rudnicki.
Feature selection with the boruta package.
J Stat Softw, 36 (2010), pp. 1-13
M. Kuhn.
Building predictive models in R using the caret package.
J Stat Softw, 28 (2008), pp. 1-26
R.O. Deliberato, S. Ko, M. Komorowski, M.A. Armengol de La Hoz, M.P. Frushicheva, J.D. Raffa, et al.
Severity of illness scores may misclassify critically ill obese patients.
Crit Care Med, 46 (2018), pp. 394-400
S.I. Vrieze.
Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).
Psychol Methods, 17 (2012), pp. 228-243
S. Chatterjee, N. Frohner, L. Lechner, R. Schöfbeck, D. Schwarz.
Tree boosting for learning EFT parameters.
Comput Phys Commun, 277 (2022),
Muñoz Lezcano S. GitHub code repository for covid-19 project [Internet]. GitHub; [Accessed 4th January 2023].
A. Goldberger, L. Amaral, L. Glass, J. Hausdorff, P. Ivanov, R. Mark, et al.
PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals.
Circulation, 101 (2000), pp. e215-e220
R. Fluss, D. Faraggi, B. Reiser.
Estimation of the Youden index and its associated cutoff point.
Biom J, 47 (2005), pp. 458-472
RECOVERY Collaborative Group, P. Horby, W.S. Lim, J.R. Emberson, M. Mafham, J.L. Bell, et al.
Dexamethasone in Hospitalized Patients with Covid-19.
N Engl J Med, 384 (2021), pp. 693-704

Co-first authors: Authors contributed equally as first authors.

Copyright © 2023. Elsevier España, S.L.U. and SEMICYUC
Medicina Intensiva (English Edition)