Intensive care units (ICUs) rely in many instances on observational research and often encounter difficulties in establishing cause-and-effect relationships. After conducting a thorough search focused on ICU observational studies, this review analysed the causal language and evaluated the quality of reporting of the methodologies employed. Causal language was assessed by analysing the words linking exposure to outcomes in the title and main objective. The quality of reporting of the key methodological aspects related to causal inference was assessed using the STROBE and ROBINS-I tools. We identified 139 articles, with 87 (63%) and 82 (59%) studies having non-causal language in their title and main objective, respectively. Among the total, 49 (35%) articles directly addressed causality. The review found vague causal language in observational ICU research and highlighted the need for better adherence to reporting guidelines for improved causal analysis and inference.
A fundamental challenge in health research lies in establishing whether there is a genuine cause-and-effect relationship between exposure and outcome.1 Being stochastic in nature, health research is inexact. Causal inference is best addressed through randomized controlled trials (RCTs), which by virtue of random assignment enable comparisons of groups similar for prognosis at baseline. However, many circumstances render RCTs impractical or unethical.2,3 For example, in the context of intensive care unit (ICU) research, the design and conduct of RCTs become challenging due to ethical considerations such as the difficulty of withholding life-saving interventions,4 eligibility restrictions due to patient heterogeneity5,6 and challenges in obtaining informed consent from the critically ill.7,8
In ICU research, observational studies provide a feasible alternative to RCTs. In the past decade, there has been a notable shift in observational studies, with increased emphasis on innovative designs and statistical analyses.9–13 Key modern causal methods include directed acyclic graphs (DAGs),14,15 propensity score methods,16,17 inverse probability of treatment weighting,18,19 G-methods,20,21 interrupted time series (ITS),22 instrumental variables (IV),23 and marginal structural models (MSM),24 amongst others. When RCTs have been replicated using observational data with rigorous methodology, the results have been similar.2,25–27
Given the recent improvements in designs and statistics, one might expect greater precision in article language avoiding ambiguous and falsely positive causal inferences. It has been claimed in the past that much of the causal association literature has avoided direct language. It has oscillated between excess caution and exaggeration in suggesting a cause-and-effect relationship.28 The extent to which there has been ambiguity has not been quantified. The objective of this review was to quantify the utilization of correct causal language in recent observational studies in ICU settings. We also assessed the quality of reporting and the methods employed to address causal relationships.
Methods
This methodological study was prospectively registered at Open Science Framework (OSF) Registries (DOI: https://doi.org/10.17605/OSF.IO/NZRVT) and is reported following the PRISMA 2020 guidelines29 (checklist provided in e-Table 1).
Search and article selection
Our search focused on observational studies evaluating any procedure in an ICU setting, published in peer-reviewed journals indexed in the Ovid Medline database between 2019 and 2022. Our search included Medical Subject Heading (MeSH) terms and keywords for causality adapted for Ovid Medline (see e-Table 2). We restricted our search to critical care settings using the following terms: “critical care unit, intensive care unit, critical care facility, intensive treatment unit, emergency unit, critical room, and ICU or CCU”. Language was restricted to English. All reports of observational studies that included the term “causal” were eligible for inclusion. We excluded research involving cellular or animal models, as well as all types of non-observational studies. Search results were organised using the Rayyan web application for systematic review management. Two reviewers (LdC, AM) identified potentially eligible articles based on their title and abstract. After the reviewers piloted 200 articles, the observed agreement was greater than 90%, so the remaining selection was made by a single reviewer (LdC). The same reviewer selected articles based on the full text. We also conducted a manual search of the articles published in the same period in all 35 journals indexed in the category “Critical Care Medicine” of the Journal Citation Reports (JCR).
Data extraction
Data extraction was performed using a standardized pre-piloted form. We extracted verbatim the sentence containing the word “causal” to analyse the causal intention of the authors. The articles were divided into two groups based on the use of this term. One group consisted of articles where the authors used the word “causal” or synonyms to indicate an intention to address causality directly (see e-Box 1). The other group included articles acknowledging that the term “causal” could not be appropriately used due to issues related to study design, statistical analysis, etc.
For causal language analysis, we extracted the words linking the exposures to the outcomes from the titles and main objective of the study. We then categorized these linking words according to the definitions of causal language provided by Thapa et al.30 (see e-Box 2).
For the quality of causality assessment, we extracted data with respect to reporting and methodology. We extracted data on reporting of the subset of key methods, results and discussion items related to causality in the STROBE checklist.31 To assess the methodological quality of causal inference, we used the relevant items of the ROBINS-I tool,32 focusing on the dimensions confounding, selection bias, bias due to missing data and bias in the classification of interventions and in the measurement of outcomes. The data extraction form created by combining STROBE and ROBINS-I is presented in e-Table 3. We also extracted information regarding limitations provided in narrative form in the discussion sections of the manuscripts.
Data synthesis
We calculated the percentage for frequency data and mean, standard deviation and range for continuous data for each characteristic of interest. We present the results along with their corresponding 95% confidence intervals. Graphing and statistical analysis were conducted using R version 4.3.1 and Stata 18, respectively. We assessed the accuracy of language usage in title and abstract by comparing articles that directly assessed causality with those that did not, employing a chi-squared test for analysis. In order to assess the quality of reporting and methodology, we only analysed data from studies that directly assessed causality. We focused on these articles because the other group of articles did not explicitly address causal inferences.
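To make the interval calculation concrete, the following is a minimal Python sketch of a 95% confidence interval for a proportion. It assumes the Wilson score method, which is our choice for illustration only: the text does not state which interval method was used in Stata, so the output is not guaranteed to match every tabulated bound. The function name `wilson_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(events: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (illustrative choice)."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Example with counts of the kind reported in this review: 9 of 139 articles.
low, high = wilson_ci(9, 139)
print(f"{9 / 139:.0%} [{low:.0%}; {high:.0%}]")
```

Other interval methods (for example exact binomial intervals) give slightly different bounds, particularly for very small counts.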
Results
Search and article selection
Fig. 1 summarizes the search and the selection processes. Among the 6876 records retrieved, 319 were selected for full-text assessment and 123 articles were included. Based on the manual search, 16 articles were further included. Altogether, data extraction was conducted on 139 articles.
Most of the studies originated from the United States (55.0%) and Europe (25.7%) and were published in first quartile (Q1) category journals (48% of the total). The median impact factor of the journals was 6.9. In 46 articles (33%), the authors included a statistician/epidemiologist and in 27 articles (19%) they used a reporting guideline. More characteristics of the included articles can be found in Table 1.
Table 1. Characteristics of the articles included in the methodological review of causality assessment among observational studies in intensive care.
Characteristic | N = 139 (%) | [95% CI] |
---|---|---|
JIF quartile | ||
Q1 | 67 (48%) | [39%; 57%] |
Q2 | 34 (25%) | [17%; 32%] |
Q3 | 20 (15%) | [9%; 21%] |
Q4 | 9 (6%) | [3%; 12%] |
Not indexed | 9 (6%) | [3%; 12%] |
Median journal impact factor [min; max] | 6.9 [0.2; 39.2] | [5.8; 8.0] |
Journals with restrict language policya | 27 (19%) | [13%; 27%] |
World region of corresponding authors | ||
Northern America | 76 (55%) | [46%; 63%] |
Europe | 36 (26%) | [19%; 34%] |
Eastern Asia | 20 (14%) | [9%; 21%] |
Oceania | 5 (4%) | [1%; 8%] |
Western Asia | 2 (1%) | [0.1%; 5%] |
Statistician/epidemiologist in the author list | 46 (33%) | [25%; 42%] |
Using a reporting guideline | 27 (19%) | [13%; 27%] |
Study funding | ||
No external funding | 73 (53%) | [44%; 61%] |
Non-industry funded | 58 (42%) | [33%; 50%] |
Industry-funded | 6 (4%) | [2%; 9%] |
Not clearly stated | 2 (1%) | [0.1%; 5%] |
Conflict of interest | ||
No | 86 (62%) | [53%; 70%] |
Yes | 38 (27%) | [20%; 36%] |
Not reported in the article | 13 (9%) | [5%; 15%] |
Not clearly stated | 2 (1%) | [0.1%; 5%] |
Statistical software | ||
R | 36 (26%) | [19%; 34%] |
SPSS | 24 (17%) | [11%; 24%] |
Stata | 22 (16%) | [10%; 23%] |
More than one | 20 (14%) | [9%; 21%] |
SAS | 19 (14%) | [8%; 21%] |
Not reported | 10 (7%) | [4%; 13%] |
Other | 8 (6%) | [3%; 11%] |
We identified 49 (35%) articles as directly addressing causality and 90 (65%) non-causal articles. The words linking the exposure to the outcome in the title and the main objective were non-causal in 27 (55%) and 28 (57%) studies in the directly causal articles, respectively. In the case of non-causal articles, non-causal language was used in the titles and objectives of 60 articles (67%) and 54 articles (60%), respectively. The term “association” was used in the title in 11 (22%) studies in the group directly addressing causality and in 24 (27%) studies in the group not addressing it (p-value = 0.684). The term “effect” and its synonyms were used in 12 (24%) studies in the group directly addressing causality and in 11 (12%) in the group not addressing it (p-value = 0.093). We found a similar frequency of the authors’ use of the words linking exposure with outcome in the studies’ main objectives for both groups; the term “association” was used in 17 (35%) studies in the group directly addressing causality and in 34 (38%) studies in the group not addressing it (p-value = 0.854). The term “effect” and its synonyms were used in 10 (20%) studies in the group directly addressing causality and in 19 (21%) studies in the group not addressing it (p-value = 0.922) (Fig. 2).
Out of the 49 articles directly addressing causality, 31 articles (65%) had statistically significant results. Among these 31 articles, 17 (55%) used causal language in their titles, and 13 (42%) did so in their main objectives. Of the 17 articles with non-significant results, 12 articles (71%) used non-causal language in the title, and 10 (59%) did so in the main objectives. In one article of the group directly addressing causality, the authors did not present results for the main objective. We found a similar frequency among studies that did not focus on causality. Of the 90 articles, 60 (67%) reported statistically significant findings. In 7, researchers did not provide information on their main outcome. In studies with statistically significant results, authors used causal terminology in the titles of 18 articles (30%) and in the main objectives of 22 articles (37%). In the 23 articles with a non-significant result, causal terminology appeared in 11 (48%) of the titles and 12 (52%) of the main objectives. More information is provided in Table 2.
Table 2. Frequency of words used to link exposure with outcome according to statistical significance, in the title and objective sections of the articles included in the methodological review of causality assessment among observational studies in intensive care.
Group and linking word | Title: statistically significant | Title: not statistically significant | Title: p-value | Objective: statistically significant | Objective: not statistically significant | Objective: p-value |
---|---|---|---|---|---|---|
Directly addressing causality | N = 31 | N = 17 | | N = 31 | N = 17 | |
Causal word, n (%) [95% CI] | 17 (55%) [36%; 72%] | 5 (29%) [10%; 56%] | 0.091 | 13 (42%) [24%; 61%] | 7 (41%) [18%; 67%] | 0.959 |
No causal word, n (%) [95% CI] | 14 (45%) [27%; 64%] | 12 (71%) [44%; 90%] | | 18 (58%) [39%; 75%] | 10 (59%) [33%; 82%] | |
Non-causal | N = 60 | N = 23 | | N = 60 | N = 23 | |
Causal word, n (%) [95% CI] | 18 (30%) [19%; 43%] | 11 (48%) [27%; 69%] | 0.127 | 22 (37%) [25%; 50%] | 12 (52%) [31%; 73%] | 0.199 |
No causal word, n (%) [95% CI] | 42 (70%) [57%; 81%] | 12 (52%) [31%; 73%] | | 38 (63%) [50%; 75%] | 11 (48%) [27%; 69%] | |
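As an illustration of the chi-squared comparison behind the p-values in this table, the sketch below computes a Pearson chi-squared test for a 2×2 table in plain Python. It assumes no continuity correction, which is our assumption rather than a documented setting of the original analysis, and the function name `chi2_2x2` is hypothetical. The example uses the title counts for articles directly addressing causality (17 causal and 14 non-causal among significant results; 5 causal and 12 non-causal among non-significant results).

```python
from math import erfc, sqrt


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (1 degree of freedom, no continuity
    correction) for the 2x2 table with rows (a, b) and (c, d)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail equals P(|Z| > sqrt(statistic)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Rows: significant vs non-significant results; columns: causal vs no causal word.
statistic, p_value = chi2_2x2(17, 14, 5, 12)
print(f"chi2 = {statistic:.3f}, p = {p_value:.3f}")
```

With small cell counts, an exact test or a continuity correction may be preferable; either would yield a somewhat different p-value.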
The group directly addressing causality included retrospective cohorts (32 articles, 65%) as well as prospective cohorts (15 articles, 31%). Two articles (4%) were classified as mixed because they included both retrospective and prospective data collection. Among the 49 articles, five studies (10%) were designed as a “target trial emulation” by the manuscript authors. Twenty-five (51%) of the studies employed administrative data while 24 (49%) used data collected for research purpose. The terms “real-world data” or “real-world evidence” were used in 7 (14%) of the studies.
In total, 42 articles (86%) acknowledged confounding as a potential bias and addressed it in the statistical methods section (Table 3). In eight (16%) the researchers explored the issue of unmeasured confounding. Missing data were addressed by the authors in 27 studies (55%). Six articles (12%) made assumptions about the type of missing data before dealing with it. Multiple imputation was used to correct this bias in 9 articles (18%), complete cases were used in 7 articles (14%) and 5 articles (10%) used other types of approaches. In 6 articles (12%), the authors did not specify their approach. The selection of covariates was reported in 30 articles (61%); it was based on prior knowledge in 13 articles (26% of the 49; 43% of the 30 that reported it) and relied on p-value-based decisions in 6 articles (12%). A multivariable regression model was used to address confounding in 18 articles (37%). Modern causal methods such as propensity score-based methods or inverse probability of treatment weighting were employed in 26 articles (53%).
Table 3. Key causality items according to the instrument developed in the methodological review of causality assessment among observational studies in intensive care.
Item | True causal, N = 49 (%) | [95% CI] |
---|---|---|
Effort to address potential sources of bias described in methods | ||
Acknowledge confounding | 42 (86%) | [73%; 94%] |
Acknowledge unmeasured confounding | 8 (16%) | [7%; 30%] |
Missing data reported | 27 (55%) | [40%; 69%] |
Assumptions made | 6 (12%) | [5%; 25%] |
Bias in classification of interventions and outcomes | 27 (55%) | [40%; 69%] |
Evaluated reliability | 6 (12%) | [5%; 25%] |
Statistical methods description | ||
Selection of covariates | 30 (61%) | [46%; 75%] |
Based on prior knowledge | 13 (26%) | [15%; 41%] |
Included a DAG | 7 (14%) | [6%; 27%] |
p-Value-based | 6 (12%) | [5%; 25%] |
Alternative approaches | 4 (8%) | [2%; 20%] |
Adjustment methodology | 44 (90%) | [78%; 97%] |
Regression adjustment | 18 (37%) | [23%; 52%] |
Inverse probability of treatment weighting | 13 (26%) | [15%; 41%] |
Propensity score-based methods | 11 (22%) | [12%; 37%] |
Other | 2 (4%) | [1%; 14%] |
Reporting of the numbers of individuals at each stage of study | ||
Patient’s flow-chart | 22 (45%) | [31%; 60%] |
Give reasons for non-participation at each stage | 40 (82%) | [68%; 91%] |
Reporting of the characteristics of study participants | ||
Baseline characteristics table | 41 (84%) | [70%; 93%] |
Quantification of the sample comparability | 34 (69%) | [55%; 82%] |
Quantification of the sample comparability after adjustment to control confounding | 16 (33%) | [20%; 48%] |
Reporting of other analysis done: sensitivity analysis | ||
Robustness checks with sensitivity analyses | 17 (35%) | [22%; 50%] |
Reporting of the limitations of the studya | ||
Study design | 39 (80%) | [66%; 90%] |
Unmeasured confounder | 38 (77%) | [63%; 88%] |
Data quality | 36 (73%) | [59%; 85%] |
Short follow-up or limited data collected | 26 (53%) | [38%; 67%] |
No generalizability | 20 (41%) | [27%; 56%] |
Regarding the results section, 40 articles (82%) gave reasons for excluding patients and 22 articles (45%) provided a flow chart depicting the number of patients in their respective studies. The table of baseline characteristics was presented in 41 articles (84%), of which 34 studies (69%) reported a quantification measure of similarity between the groups compared. The p-value was used as a pre-adjustment measure of comparability in 23 articles (48%) and the standardized differences in 10 articles (29%). In one article (3%), the authors used a risk ratio as a comparability measure. In 16 articles (33%) the authors reported a post-fitting comparability measure.
Regarding reporting of study limitations, in 39 articles (80%) the authors acknowledged the limitations associated with the characteristics of an observational study. Thirty-eight articles (77%) acknowledged unmeasured confounding as a limitation in their study. The data quality issues, and the short follow-up or limited data collected were also notable concerns highlighted by the authors in 36 (73%) and 26 (53%) articles, respectively. Authors still viewed the lack of generalizability as a limitation in their observational studies in 20 articles (41%).
Discussion
Statement of principal findings
Our systematic evaluation of the use of causal language and its implications among observational studies in the ICU setting revealed the following findings: most of the articles included in our review did not report following any reporting guideline; non-causal terminology was widely used both in articles that directly addressed causality and in those that did not, regardless of whether the results were statistically significant; the key elements for appropriate causal inference in observational studies, such as dealing with missing data, exchangeability and generalizability, were poorly reported; and authors did not give sufficient consideration to methods for addressing the limitations of observational design.
Comparison with other studies
The STROBE statement was published in 2007, yet only 19% of the articles reviewed utilized this guideline for reporting observational studies. This issue is not unique to ICU studies. Generally, adherence to reporting guidelines in other health research areas is also low.33–35 This raises concerns about the potential under-reporting of the items we identified as crucial for causal analysis.
Aligned with the findings in our review, most articles in the existing literature avoid directly discussing causes and instead use unclear and vague language. Olarte Parra et al.36 conducted a review of 60 studies published in general medical journals with the aim of assessing the consistency of causal statements. In this review, many of the studies presented their conclusions in terms of associations while subtly incorporating causal messages within their findings. Similarly, Haber et al.37 conducted a study to quantify the degree of causality implicit in the words linking exposure to outcomes and its consistency with the conclusions about their findings. They found a disconnection between the causality expressed in technical linking language and the research implications. The use of technical language that is not aligned with research implications may distort the interpretation of findings, hinder decision-making, and diminish transparency, impacting the credibility of research.38,39 To maintain credibility, researchers must prioritize accurate language to convey the true implications of their findings and avoid the “Schrodinger’s causal paradox” where the authors are cautious with their causal language while continuing to offer causal interpretations.40
In the articles of the group directly addressing causality, confounding was considered in most, but less account was taken of unmeasured confounding. The validity of observational research relies heavily on the assumption that all potential confounding factors are adequately measured and accounted for. However, despite methodologies available to assess and quantify the influence of unmeasured confounding on the outcomes,41–43 only eight articles explicitly addressed the analysis of unmeasured confounding in their methods section.
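One widely used sensitivity measure of this kind is the E-value proposed by VanderWeele and Ding; whether it is among the methods covered by the references cited above is not stated here, so the sketch below is offered only as an illustration. For a risk ratio RR ≥ 1 the E-value is RR + √(RR × (RR − 1)), and a protective estimate is inverted first. The function name `e_value` is hypothetical.

```python
from math import sqrt


def e_value(risk_ratio: float) -> float:
    """E-value for a point estimate on the risk-ratio scale: the minimum
    strength of association (risk-ratio scale) that an unmeasured confounder
    would need with both exposure and outcome to fully explain the estimate."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective estimates (RR < 1) are inverted before applying the formula.
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + sqrt(rr * (rr - 1))


# Hypothetical estimates, not taken from any reviewed article.
for rr in (1.0, 1.5, 2.0, 0.5):
    print(f"RR = {rr:.2f} -> E-value = {e_value(rr):.2f}")
```

A larger E-value means that a stronger unmeasured confounder would be needed to explain away the observed association.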
Another crucial aspect of causal analysis is the handling of missing data. Among the group of studies directly addressing causality, about half acknowledged the presence of missing data. However, only in six articles (12%) did the authors make assumptions about the mechanisms behind the missing data. Nevertheless, authors employed various approaches to correct biases due to missing data, with multiple imputation being the most commonly used. This reporting gap is consistent with findings from other studies, underscoring the insufficient reporting and handling of missing data in longitudinal observational studies. In 2004, Burton et al.44 published a review of missing data in cancer prognostic studies and found a deficiency in the reporting of missing covariate data. After reviewing 100 articles, they found that only 40% of articles provided information about the method used to handle missing covariate data and only 12 articles would have satisfied their proposed guidelines for the reporting of missing data. In a study conducted in 2012, Karahalios et al.45 reviewed cohort study publications in PubMed. They found that a greater number of articles reported the method employed to address missing data in the analysis. However, many articles still did not report the amount of missing data and the reasons for missingness. Frameworks are available to assist researchers in systematically considering missing data and transparently reporting its potential impact on study outcomes.46 One crucial step in these frameworks is identifying plausible mechanisms behind missingness, which is one of the least reported aspects.
In 2020, Tennant et al.15 conducted a review examining the use of DAGs in applied health research and noted their increasing popularity for identifying confounding variables. However, among the articles in our review that directly addressed causality, only seven (14%) used DAGs to visually represent the relationship between exposure and outcome variables. Nevertheless, alongside the use of DAGs, there is a tendency to select confounding variables based on prior knowledge rather than relying solely on the results of univariate analysis, which aligns with various recommendations.47 This contrasts with the predominant approach, used in 18 articles (37%), which relies on regression model fitting as the primary means of controlling confounding, despite its suboptimal effectiveness for this purpose.17,39,48 The widespread use of multivariable regression for controlling confounding occurs against a background of limited evaluation of how well it reduces confounding. Only 16 articles assessed this aspect using metrics such as the standardized mean difference (SMD) before and after adjustment.49
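The balance check mentioned above can be sketched as follows. This minimal Python example computes the SMD for one binary covariate before and after inverse probability of treatment weighting. The data are a hypothetical toy sample, the helper names are ours, and the variance uses a simple weighted (population) form rather than the small-sample corrections that dedicated packages may apply.

```python
from math import sqrt


def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_variance(values, weights):
    mean = weighted_mean(values, weights)
    return sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / sum(weights)


def standardized_mean_difference(treated, control, w_treated=None, w_control=None):
    """Difference in (weighted) means divided by the pooled standard deviation."""
    if w_treated is None:
        w_treated = [1.0] * len(treated)
    if w_control is None:
        w_control = [1.0] * len(control)
    diff = weighted_mean(treated, w_treated) - weighted_mean(control, w_control)
    pooled_sd = sqrt(
        (weighted_variance(treated, w_treated) + weighted_variance(control, w_control)) / 2
    )
    return diff / pooled_sd


# Hypothetical toy data: a binary covariate (1 = present) in each group.
treated = [1, 1, 1, 0]
control = [1, 0, 0, 0]


def propensity(level):
    """Probability of treatment within a covariate level, from the toy counts."""
    n_treated = treated.count(level)
    n_control = control.count(level)
    return n_treated / (n_treated + n_control)


# Inverse probability of treatment weights: 1/e(x) for treated, 1/(1 - e(x)) for controls.
w_treated = [1 / propensity(x) for x in treated]
w_control = [1 / (1 - propensity(x)) for x in control]

smd_before = standardized_mean_difference(treated, control)
smd_after = standardized_mean_difference(treated, control, w_treated, w_control)
print(f"SMD before weighting: {smd_before:.3f}")
print(f"SMD after weighting:  {smd_after:.3f}")
```

In this toy sample the weights remove the imbalance entirely because the propensity score is estimated exactly within each covariate level; with real data, a residual SMD would normally remain and be reported.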
Although other articles may have concluded that the inability to attribute causality in observational research was rarely mentioned in journals,50 in our case, the authors were cautious, highlighting the main characteristics of observational studies as major limitations. This recognition is commendable: researchers need to acknowledge the substantial limitations of observational studies. However, in their discussions authors should also describe the efforts undertaken to ensure unbiased results, particularly if they employ robust methodologies to alleviate the effects of some of the limitations inherent in observational studies. Distinguishing between an insurmountable limitation inherent to observational studies and a limitation arising from uncertainty in results due to inadequate application of current methods is crucial. Researchers should receive formal training in statistics and research methodology, equipping them with the skills needed to conduct analyses that align with best practices.
Strengths and weaknesses of the study
Consistency and precision of language are crucial in observational research. Even in the absence of explicit statements, a causal conclusion is implicit when the language encourages interventions. This requires authors to ensure that every word in the title and main objective is carefully chosen. Avoiding causal language inconsistencies is important because readers ought to be able to trust the conclusion reached regarding causality. To this end, we encourage researchers to adhere to reporting guidelines such as STROBE.
Our article has some limitations. One of the primary limitations of conducting a narrative literature review is the inherent subjectivity in synthesizing findings. Additionally, the broad and general nature of our search criteria resulted in a large volume of articles, making it challenging to manage and thoroughly analyse each piece of literature. We have not been able to include in this review the observational studies where the term “causal” did not appear in the text. Additional research is required to investigate the extent to which causal claims stated in the text are substantiated by the design and methods applied. This necessitates a more in-depth evaluation of the methods used and whether the articles manage to eliminate potential biases present to meet all the necessary assumptions for drawing causal conclusions. Furthermore, in future studies, it would be interesting to include all those articles that conduct causal statistical analyses, regardless of whether they explicitly use the term “causal”. We anticipate that our review serves as a step towards a precise systematic evaluation. This will involve assessing the level of causality implied in the language used in observational ICU research and analysing its consistency with the methods employed in study designs and the results obtained in statistical analysis for causal inference.
Conclusion
Language consistency and precision are vital in observational research. Even without explicit causal statements, language suggesting interventions can imply causality, and misinterpretations can impact decision-making. Researchers should balance causal language with careful statistical analysis to enhance the clarity and robustness of their findings. Understanding statistical methods and following established guidelines like STROBE will improve the accuracy of future research and contribute to a clearer and more reliable body of knowledge.
CRediT authorship contribution statement
LDCA, JZ and AM conceived the study. LDCA and AM developed the research protocol and search strategy. LDCA and AM screened the titles, abstracts, and full texts. LDCA and AGS did the data abstraction. AM and JZ verified the study data. LDCA did the statistical analysis. LDCA, JZ and AM co-wrote the first draft of the manuscript, and JZ, KK, OP and JIPZ reviewed subsequent drafts. JZ and KK provided major revisions to the first draft and reviewed subsequent drafts. All authors had full access to all the study data throughout the review process. All authors contributed to the final edits and agreed to submit the final manuscript.
Ethics declaration
Not applicable.
Declaration of Generative AI and AI-assisted technologies in the writing process
The authors declare that they have not used any type of generative artificial intelligence to write this manuscript or to create images, graphics, tables, or their corresponding captions.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
The data used in this systematic review are derived from publicly available articles. Readers can access the original articles for detailed information on the data utilized. Access to specific datasets may vary depending on the policies of individual publishers. For further inquiries regarding the data used in this study, please contact the corresponding author.