Brief peek into confounding variables

Prashant Kumar
2 min read · Feb 22, 2021

I have come across the term "confounding variable" several times when reading epidemiological studies but rarely when reading a ‘conventional’ machine learning paper. This is because the crux of most epidemiological studies is to identify the true association between risk(s) and outcome(s), so it is very important to understand whether there is an underlying variable (not originally considered as a risk in the study) that is associated with both the risk and the outcome. Note that I am not an epidemiologist by training, but I work in the healthcare domain.

Confounding variables are variables that are associated with both the risk and the outcome but are not intermediate steps between the two. For example, it is common knowledge that smoking (risk) can lead to lung cancer (outcome). But how carefully do we think about the questions below?

  1. Is there an association between smoking and drinking alcohol?
  2. Can drinking alcohol lead to lung cancer?
  3. Is drinking alcohol an intermediate step between smoking and lung cancer?

Assuming we have enough data to answer these questions, and the answers are yes, yes, and no respectively, then drinking alcohol is a confounding variable in this study.

There are alternative ways to identify a confounding variable.

(A) A variable X is a confounding variable if it satisfies these three conditions:

(i) There should be an association between X and the risk factor Y in the control group. This can be tested by performing a chi-squared test.

(ii) There should be an association between X and the outcome Z in the absence of Y. This can be assessed by calculating the odds ratio (OR).

(iii) X should not be an intermediate step between the risk and the outcome.
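To make checks (i) and (ii) concrete, here is a minimal sketch in plain Python. The counts are made up purely for illustration, the helper names `odds_ratio` and `chi_squared_2x2` are my own, and the chi-squared test is the uncorrected Pearson version for a 2x2 table (1 degree of freedom). Check (iii) cannot be computed from the data; it needs domain knowledge.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    Returns the test statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Check (i): X vs Y among controls (hypothetical counts).
# Rows: X present / X absent. Columns: Y present / Y absent.
stat, p_value = chi_squared_2x2(60, 40, 30, 70)
print(f"(i) chi-squared = {stat:.2f}, p = {p_value:.5f}")

# Check (ii): X vs Z among people without Y (hypothetical counts).
# Rows: X present / X absent. Columns: Z present / Z absent.
or_xz = odds_ratio(20, 80, 10, 90)
print(f"(ii) OR between X and Z = {or_xz:.2f}")
```

A small p-value in (i) and an OR away from 1 in (ii) would both point towards X being a candidate confounder, provided (iii) also holds.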

(B) Test of association by stratifying on X: stratify the data on X and calculate the odds ratio between Y and Z within each stratum. If the stratum-specific ORs are similar to each other but differ substantially from the crude (unstratified) OR, then X is a confounding variable in this study.
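The stratification approach can be sketched in a few lines of Python. Again, the counts are invented for illustration and the names (`strata`, `odds_ratio`) are my own; the point is only to show the stratum-specific ORs agreeing with each other while the crude OR is noticeably different.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Hypothetical counts per stratum of X, in the order:
# (Y present & Z present, Y present & Z absent,
#  Y absent & Z present,  Y absent & Z absent)
strata = {
    "X present": (40, 60, 10, 30),
    "X absent": (5, 45, 10, 180),
}

# OR between Y and Z within each stratum of X.
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

# Crude OR: add up the matching cells across strata, ignoring X.
crude_cells = tuple(sum(cell) for cell in zip(*strata.values()))
crude_or = odds_ratio(*crude_cells)

for name, value in stratum_ors.items():
    print(f"OR in stratum '{name}': {value:.2f}")
print(f"Crude OR: {crude_or:.2f}")
```

With these made-up counts, the OR is 2.0 in both strata but 4.5 in the pooled data, which is the pattern described in (B).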

It is important to review the literature to identify possible confounding variables. Controlling for confounding variables in any epidemiological study is key to reaching the right conclusions.

Please leave your feedback in the comment section below. Also, the views expressed in this post are solely mine and are not influenced by the organization or institution I am affiliated with.

Reference: Validity and Bias in Epidemiology, Coursera
