Clinical investigators need to be wary of confounding and bias in real-world studies; here is what to look for and how to address it

Mar 17, 2023

Clinical investigators use real-world data (RWD) in randomized trials and observational studies to produce real-world evidence (RWE) used to assess the safety and efficacy of an intervention and to inform product development, regulatory approval, and post-market surveillance efforts.

RWD is pulled from a broad range of diverse sources in real-world settings, including electronic health records (EHRs), claims and billing activity, and product and disease registries. Such data has the potential to enable investigators to cost-effectively expedite and expand insights into the clinical performance of new treatment approaches, molecular diagnostics, and medical devices. Furthermore, regulators, including the U.S. Food and Drug Administration (FDA), are increasingly encouraging the use of such data to support regulatory decision-making for drugs and medical devices.

However, despite the information-rich potential of RWD, collecting it and producing RWE require scientific vigilance. Without proper study design, real-world studies are subject to errors that can favor one treatment or device over others, leading investigators to flawed inferences. To deliver on the promise of RWD, it is critical for researchers to understand the impact of statistical biases caused by design errors. This article provides an overview of the common types of statistical errors in real-world studies -- confounding, selection bias, and information bias -- and approaches for preventing and mitigating them.

Common types of confounding and bias

1. Confounding

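Confounding refers to a lack of exchangeability between two or more comparison groups; that is, the groups differ in ways other than the treatment that also affect the outcome, so the comparison between them is not a fair one.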

Confounding by indication is the type of confounding that arises in pharmacoepidemiologic analyses that compare treatments administered in clinical practice. It typically arises from the natural prescribing patterns, behavior, and preferences of patients and providers in real-world healthcare practice, rather than from avoidable errors in data collection or measurement.

As an example, in their everyday clinical practice, physicians are trained to recommend and prescribe the right treatment for the individual patient, minimizing the risk of adverse outcomes and maximizing potential benefit. As a result, clinicians will naturally prescribe certain treatments for certain types of patients -- such as prescribing a more intensive therapy to a patient who is at highest risk of a given adverse health outcome. This nonrandom prescribing pattern introduces unfairness or a lack of balance in baseline characteristics when comparing patient groups defined by a given treatment and comparator option.

Confounding may bias study results upward or downward -- masking or attenuating an actual association, exaggerating the magnitude of a real association, or demonstrating an apparent association between the treatment and outcome when none exists. For example, this can happen when one group of patients receives the standard of care and another group receives a novel treatment option as a last resort. In this case, when the results are compared, the standard of care will have some advantage in terms of efficacy, because it was given to a healthier patient group.
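
To make the last-resort scenario concrete, the following minimal sketch compares crude and severity-stratified response rates. All counts are hypothetical, invented purely for illustration, and the two severity strata are an assumed stand-in for whatever baseline risk factors drive prescribing in a real study.

```python
# Hypothetical counts, invented for illustration only (not from any study).
# Key: (severity stratum, treatment) -> (number of patients, number who improved)
counts = {
    ("mild", "standard"): (80, 64),
    ("mild", "novel"): (20, 17),
    ("severe", "standard"): (20, 6),
    ("severe", "novel"): (80, 32),
}


def response_rate(treatment, stratum=None):
    """Proportion of patients who improved, overall or within one stratum."""
    patients = improved = 0
    for (s, t), (n, events) in counts.items():
        if t == treatment and (stratum is None or s == stratum):
            patients += n
            improved += events
    return improved / patients


# Crude comparison: ignores that the novel treatment went mostly to severe cases.
crude_standard = response_rate("standard")  # 70 / 100
crude_novel = response_rate("novel")        # 49 / 100

# Stratified comparison: like is compared with like.
for stratum in ("mild", "severe"):
    print(
        stratum,
        "standard:", response_rate("standard", stratum),
        "novel:", response_rate("novel", stratum),
    )
print("crude standard:", crude_standard, "crude novel:", crude_novel)
```

In these made-up numbers the standard of care looks better in the crude comparison even though the novel treatment has the higher response rate within each severity stratum, which is the pattern described above.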

Another example is the method of determining patient disease status for infectious diseases. This typically begins with a molecular assay that detects the presence of the target organism to determine patient infection status and appropriate follow-up, which can include assessing the impact of antiretroviral treatment on viral load and subsequently adjusting antiretroviral therapy. These molecular assays may vary from lab to lab, including by geographic location, so a different set of patients may progress through follow-up therapies and procedures depending on the assay used, creating demographic and risk-factor imbalances between groups.

Addressing confounding requires a careful a priori defined strategy that begins in the study design phase. Subject matter experts should consider the study question and enumerate the possible confounders of the study association of interest; clinical expertise and literature reviews are instrumental for this phase.

Conceptual and operational definitions of priority confounders should be drafted, and efforts made to identify whether these data elements are available in the RWD sources. A data source without adequate capture of a priority confounder is likely unfit for the purpose of drawing valid inferences about the comparative effectiveness of therapies or devices. Additional design-phase tactics to reduce the potential for confounding include restricting the study population -- narrowing it through eligibility criteria to reduce the prevalence or range of a given confounder -- and selecting and matching the right treatment or device comparison groups on a key confounder.

In the analysis phase of an RWD study, statistical methods can be applied to adjust the effect estimates. Typically, when using rich RWD sources, multivariable analysis is applied to adjust for multiple confounders (and other variables) simultaneously. Multivariable analysis comprises a set of statistical methods based on linear or nonlinear regression models.
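
As a rough illustration of multivariable adjustment, the sketch below fits an ordinary least squares model with a treatment indicator and one confounder, using only the Python standard library. The data are hypothetical and constructed so that the true treatment effect is known; the helper names (`solve_linear_system`, `ols`) are illustrative, and a real analysis would normally rely on an established statistics package and a model suited to the outcome type.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(predictor_rows, y):
    """Least squares fit; returns [intercept, coefficient_1, coefficient_2, ...]."""
    X = [[1.0] + list(row) for row in predictor_rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical patients: (treated flag, baseline severity score).
# Treated patients tend to be sicker, mimicking confounding by indication.
patients = [(0, 1), (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (1, 4), (1, 4)]
# Symptom score built so that treatment truly lowers it by 2 points.
outcome = [10 - 2 * t + 3 * s for t, s in patients]

treated = [y for (t, _), y in zip(patients, outcome) if t == 1]
untreated = [y for (t, _), y in zip(patients, outcome) if t == 0]
crude_difference = sum(treated) / len(treated) - sum(untreated) / len(untreated)

coefficients = ols(patients, outcome)
adjusted_effect = coefficients[1]  # treatment coefficient, adjusted for severity

print("crude difference:", crude_difference)
print("adjusted treatment effect:", round(adjusted_effect, 6))
```

In this constructed example the crude difference in means suggests the treatment worsens symptoms, while the severity-adjusted coefficient recovers the built-in benefit of 2 points. Adjustment only helps for confounders that are measured and included in the model.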

The propensity score method is another option to address confounding. A propensity score is the estimated probability that a participant receives a given treatment, conditional on the participant's baseline characteristics; it is used to impose balance across comparison groups, such as treated and untreated participants. To illustrate with the infectious disease example above, a propensity score model can be built using multivariable analysis and its output used to match patients diagnosed with the target organism using one molecular assay to those diagnosed using another. After matching on the propensity score, investigators should find that the groups have similar demographic and risk-factor profiles, which improves the exchangeability of the comparison.
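
The matching step can be sketched as follows. This is one simple variant, greedy 1:1 nearest-neighbor matching without replacement and with a caliper, and it assumes the propensity scores have already been estimated elsewhere. The patient identifiers, scores, and the 0.05 caliper are hypothetical choices made for illustration.

```python
def match_on_propensity(group_a, group_b, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    group_a, group_b: dicts mapping patient id -> estimated propensity score.
    Returns (a_id, b_id) pairs; patients in group_a with no available
    group_b patient within the caliper are left unmatched.
    """
    available = dict(group_b)
    pairs = []
    # Process group_a in order of increasing score so results are deterministic.
    for a_id, a_score in sorted(group_a.items(), key=lambda item: item[1]):
        if not available:
            break
        b_id = min(available, key=lambda b: abs(available[b] - a_score))
        if abs(available[b_id] - a_score) <= caliper:
            pairs.append((a_id, b_id))
            del available[b_id]  # each comparison patient is used at most once
    return pairs


# Hypothetical estimated probabilities of being tested with assay A,
# given baseline characteristics.
assay_a = {"A1": 0.30, "A2": 0.55, "A3": 0.90}
assay_b = {"B1": 0.28, "B2": 0.33, "B3": 0.57, "B4": 0.10}

pairs = match_on_propensity(assay_a, assay_b)
print(pairs)  # [('A1', 'B1'), ('A2', 'B3')]
```

Patient A3 is dropped because no assay B patient has a similar score. This is the trade-off matching makes: better balance at the cost of a smaller analyzed sample.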

2. Selection bias

Selection bias occurs when the selected treatment group for comparison or a control group does not adequately represent the target population. Selection bias can result from a variety of factors, including misalignment in the initial selection of candidates, characteristic differences in the participants selected into the treatment and comparator groups, differential losses in participating candidates during the study or before the end of the designated follow-up period, and missing data.

For example, in longitudinal studies that follow patients over time, some patients may not have follow-up data available and thus are excluded from analyses of efficacy. Selection bias can arise if excluded patients have meaningful demographic, economic, or overall health status differences from included patients that could impact safety and efficacy profiles.
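
One way to check for such differences is to compare baseline characteristics of included and excluded patients using a standardized mean difference. The sketch below uses hypothetical ages; the pooled-standard-deviation form of the statistic is one common definition, and an absolute value above roughly 0.1 is an often-cited rule of thumb for meaningful imbalance rather than a fixed standard.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical baseline ages, invented for illustration only.
age_with_follow_up = [52, 55, 58, 60, 61, 63]
age_lost_to_follow_up = [66, 70, 71, 74, 75, 78]

smd = standardized_mean_difference(age_with_follow_up, age_lost_to_follow_up)
print("standardized mean difference in age:", round(smd, 2))
```

A large difference, as in this made-up example where patients lost to follow-up are considerably older, suggests that an analysis restricted to patients with follow-up data may not represent the target population.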

As with confounding, there are several methods to address selection bias in real-world studies, including proper study design, intentional control and treatment selection, and statistical techniques that help account for missing data and differential losses among other such challenges. Studies that carefully consider inclusion and exclusion criteria as comprehensively as possible with respect to available data can better predict the magnitude of selection bias.

3. Information bias

Information bias occurs when key study variables -- exposure, health outcome, or confounders -- are classified or measured inaccurately. Three types of information bias commonly occurring in RWD studies are immortal time bias, temporal bias, and verification bias.

Immortal time bias arises when there is a delay between a participant's entry into a study and the start of treatment, and that interval is misclassified as treated time. Because the participant has to remain event-free long enough to begin treatment, outcomes during this interval cannot be attributed to the treatment. For example, if a participant is accepted into the study, the outcomes clock should start when the individual begins receiving treatment, not when they were accepted into the study. Similarly, once an individual begins treatment, the effect of that treatment may be delayed, so any events that occur soon after treatment starts must be interpreted with this possibility in mind.
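
The effect of misclassifying the time before treatment starts can be shown with a small person-time calculation. The patient records below are hypothetical, and the sketch assumes that no events occur before treatment starts among patients who are eventually treated.

```python
# Hypothetical records, invented for illustration only:
# (days from study entry to treatment start, or None if never treated;
#  days from study entry to end of follow-up; 1 if the outcome event occurred)
patients = [
    (100, 400, 0),
    (200, 500, 1),
    (None, 150, 1),
    (None, 300, 1),
    (50, 450, 0),
]


def rates_per_1000_days(records, reassign_pretreatment_time):
    """Event rates per 1,000 person-days for treated and untreated time."""
    time = {"treated": 0, "untreated": 0}
    events = {"treated": 0, "untreated": 0}
    for start, end, event in records:
        if start is None:
            time["untreated"] += end
            events["untreated"] += event
        else:
            if reassign_pretreatment_time:
                # Time before treatment start counts as untreated person-time.
                time["untreated"] += start
                time["treated"] += end - start
            else:
                # Naive approach: all time since entry is counted as treated.
                time["treated"] += end
            events["treated"] += event
    return {group: 1000 * events[group] / time[group] for group in time}


naive = rates_per_1000_days(patients, reassign_pretreatment_time=False)
corrected = rates_per_1000_days(patients, reassign_pretreatment_time=True)
print("naive:", naive)
print("corrected:", corrected)
```

In the naive calculation, the event-free waiting period inflates the treated group's person-time, so the treated event rate looks lower and the untreated rate looks higher than they do once that time is reassigned.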

If the treatment start date is known (or can be estimated), a statistical method typically referred to as visit windowing can be used to address this. The actual treatment start date defines baseline, and the number of days between the treatment start date and each follow-up date determines when that follow-up visit occurred relative to baseline.
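
A minimal sketch of visit windowing is shown below. The window labels and day ranges are hypothetical; in practice they would be prespecified in the protocol or statistical analysis plan.

```python
from datetime import date

# Hypothetical visit windows: label -> (first day, last day), counted in days
# from the treatment start date (day 0 = baseline).
VISIT_WINDOWS = {
    "baseline": (0, 0),
    "month 1": (15, 45),
    "month 3": (60, 120),
}


def assign_visit(treatment_start, visit_date, windows=VISIT_WINDOWS):
    """Return the window label for a visit, or None if it falls in no window."""
    days_since_start = (visit_date - treatment_start).days
    for label, (first_day, last_day) in windows.items():
        if first_day <= days_since_start <= last_day:
            return label
    return None


treatment_start = date(2023, 1, 10)
print(assign_visit(treatment_start, date(2023, 2, 9)))   # 30 days after start
print(assign_visit(treatment_start, date(2023, 4, 10)))  # 90 days after start
print(assign_visit(treatment_start, date(2023, 1, 5)))   # before treatment start
```

Anchoring every visit to the treatment start date, rather than to the study entry date, keeps pre-treatment observations from being counted as on-treatment follow-up.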

Temporal bias arises when, for example, data for the control group come from a completely different time period than the data observed for the treatment group in the study. The older data could have been collected under different screening guidelines, resulting in a population with different characteristics that is not comparable to the treatment group.

Verification bias is seen with diagnostic test studies in which patients who tested positive at baseline are more likely to receive follow-up examinations to confirm the true or actual disease status. Patients with negative test results at baseline are assumed to not have the disease, and generally receive no follow-up. However, there is a small chance that the negative test could be a false negative result.

There are two basic methods to address verification bias that are scientifically sound and resource-efficient. One method involves drawing a random sample among those who tested negative and conducting the same follow-up testing as for those who tested positive to quantify the false-negative rate. The second method compares the diagnostic test performance primarily based on data from patients with positive baseline results and addresses the verification bias with advanced statistical models. For example, to evaluate the clinical performance of an assay to detect cancer, investigators for a medical technology company conducted a noninferiority trial using advanced epidemiological, statistical, and machine-learning methods to compare baseline results of the new test with real-world data from an FDA-approved test. Investigators used formal hypothesis tests and prespecified acceptance criteria to establish the ratio of true-positive rates at baseline, and the ratio of that to the ratio of false-positive rates at baseline.
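
The first method, verifying a random sample of test-negative patients, can be sketched with simple arithmetic. The counts are hypothetical, and the sketch assumes that every test-positive patient was verified and that the verified negatives are a simple random sample of all negatives.

```python
def corrected_sensitivity(true_positives, negatives_total,
                          negatives_sampled, sampled_with_disease):
    """Estimate sensitivity when only a random sample of negatives is verified."""
    # Scale the false negatives found in the sample up to all test-negatives.
    estimated_false_negatives = (
        sampled_with_disease * negatives_total / negatives_sampled
    )
    return true_positives / (true_positives + estimated_false_negatives)


# Hypothetical counts, invented for illustration only:
# 90 verified true positives; 1,000 test-negative patients, of whom 100 were
# randomly sampled for follow-up and 1 was found to have the disease.
sensitivity = corrected_sensitivity(90, 1000, 100, 1)
print("corrected sensitivity:", sensitivity)  # 90 / (90 + 10)
```

If all 1,000 negatives had simply been assumed disease-free, the same data would have implied a sensitivity of 100 percent, which illustrates why unverified negatives can overstate test performance.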

Fostering good procedural practices

Determining which method to employ to counter potential bias will depend on many factors, including the study objective, patient candidate pool, variables being collected, amount of missing data, and the relationships between variables. In addition, pragmatic and resource constraints are critical to incorporate into the design of workable strategies to assess and limit bias. For example, if a real-world study has a relatively small number of subjects, then matching may not be an optimal strategy because dropping unmatched patients may lead to inadequate statistical power.

However, in all cases, investigators need to carefully consider potential sources of bias when they plan their study, and choose appropriate analytic methods supported with sensitivity analyses to reduce and quantify the potential bias.

In today’s digital world, the ubiquity of real-world data collection has created a wealth of data points from which researchers can build RWE. There is increasing interest and potential for converting RWD into RWE that, through careful design, addressing bias, analysis, and interpretation, can be used to inform healthcare decision making. To help pave the way forward, several initiatives (e.g., HARPER, REPEAT, and the ISPOR/ISPE task force) are underway with the aim to improve the transparency, reproducibility, and validity of real-world studies. In addition, in 2021-2022, the FDA released a set of guidelines for improving the validity of RWD studies for regulatory decision-making, with advice on mitigating bias arising from RWD sources. Investigators are encouraged to look to these and to stay attuned to other forthcoming and rapidly evolving resources for guidance on good practices for real-world studies.

About the authors

Jess Paulus, ScD, is vice president of research at OM1, Inc., and leads a team of scientists with deep expertise in the designs and methods specific to leveraging real-world data (RWD) networks to meaningfully inform clinical and regulatory decision making. Dr. Paulus provides senior scientific oversight regarding study design, analytic approaches, and dissemination strategies for RWD-based studies.

Rachel Bogh is a senior biostatistician at Hologic, Inc. in the diagnostics division. She provides statistical expertise in study design; prepares sample size and power calculations, analysis plans, statistical reports, and exploratory statistical programming; and acts as a key contributor to interactions with regulatory bodies.

Melissa Dsouza is a biostatistician at Hologic, Inc. in the diagnostics division. She is responsible for assisting with the design of in vitro diagnostic (IVD) clinical trials, drafting statistical analysis plans for those trials, and authoring and reviewing the statistical sections of clinical study protocols and clinical study reports. She also assists other groups within Hologic with analysis for assay validation studies and a variety of ad hoc statistical analyses.