6 Comments

Nice post. When reading the last paragraph, I recalled Miguel Hernán's phrase from the edX Causal Diagrams course: "Draw Your Assumptions Before Your Conclusions".

Adam, I'm sorry I'm "late". Thank you for another thought-provoking piece. Given that information on modelling assumptions may be missing, and that models investigating the same topic may rest on numerous different assumptions, how would one reconcile them, and how would one aggregate them? Would I² be enough?
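For reference, a minimal sketch of how I² is usually computed (the Higgins–Thompson definition, from Cochran's Q under inverse-variance weighting). The function name and the example numbers are hypothetical. Note that I² only measures how much of the spread between estimates exceeds what sampling error would explain; it says nothing directly about which modelling assumptions produced that spread.

```python
def i_squared(estimates, variances):
    """I^2 = max(0, (Q - df) / Q), with Q computed from inverse-variance weights."""
    weights = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0  # no observed heterogeneity at all
    return max(0.0, (q - df) / q)


# Hypothetical example: three studies with equal variances
print(i_squared([0.1, 0.3, 0.5], [0.01, 0.01, 0.01]))  # about 0.75
```

Here Q = 8 on 2 degrees of freedom, so roughly 75% of the observed variation is attributed to between-study heterogeneity rather than chance.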

Excellent post. Regarding just one part of it, I find time after time that models make the best descriptive statistics. For example, to understand missing data I fit a logistic regression model to predict the probability that a variable is missing for a subject, or a model to predict the number of missing variables a subject has.
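A minimal sketch of the first idea (modelling the probability that a value is missing), assuming simulated data in which older subjects are more likely to have a missing value. The data, the age-dependent mechanism and the hand-rolled Newton-Raphson fit `fit_logistic` are all hypothetical illustrations, not the commenter's actual analysis; in practice one would use a standard logistic regression routine.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson (IRLS); X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))  # fitted P(value is missing)
        w = p * (1.0 - p)                     # IRLS weights
        grad = X.T @ (y - p)                  # score vector
        hess = X.T @ (X * w[:, None])         # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


rng = np.random.default_rng(0)
n = 2000
age = rng.uniform(20, 80, n)
# Hypothetical missingness mechanism: log-odds of missingness rise with age
true_logit = -4.0 + 0.06 * age
missing = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

X = np.column_stack([np.ones(n), age])
beta = fit_logistic(X, missing)
print(beta)  # intercept and age coefficient, close to (-4.0, 0.06)
```

The fitted age coefficient is itself the descriptive summary: it says how strongly missingness is associated with age. The second idea (the number of missing variables per subject) would swap the binary outcome for a count model.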

Nice post, Adam. I’m really interested in the ‘Random but unbalanced’ section, which has a lot of parallels to my post last week on ‘random confounding’ (https://tpmorris.substack.com/). I just can’t get a handle on the effect of adjustment in your context. My post noted Aickin’s result that balancing any observed variable in the design of an RCT immediately improves balance in unobserved variables (and I’m told there is a parallel result for adjustment at the analysis stage, via weighting or other methods, in Rosenbaum & Rubin’s original propensity score paper). Why would things be different when estimating a simple proportion? I can’t see that adjusting for age would unbalance the other characteristic enough to introduce bias. (I haven’t yet read Andrew Mercer’s ‘Parallels between causal inference and survey inference’, which may contain the answer.)

PS – Just delighted to see that you en-dashed ‘Clopper–Pearson’ instead of hyphenating it like most people do!

Good question! I was trying to think of a pathological example where we have a sample that happens to be biased on one irrelevant variable (i.e. one uncorrelated with the characteristic) but is balanced on what matters (i.e. a variable correlated with the characteristic). My understanding from your nice piece last week was that correlation between the observed and unobserved variables is key ("if an unmeasured covariate is correlated – negatively or positively – with the stratifying covariate, balancing the measured covariate also better balances the unmeasured covariate")? But I was thinking about a situation where they are uncorrelated.

Off the top of my head, perhaps a clearer toy example would be one where the characteristic being adjusted for is more clearly irrelevant, and age matters but is ignored. E.g. a population of 10 people, 6 with short names and 4 with long names, where 50% of each group are young (i.e. 3 of the 6 short-named people and 2 of the 4 long-named people). Say we sample 4 people from this population and get 1 young short-named person, 1 young long-named person and 2 old long-named people. Our sample is balanced by age but imbalanced by name length. So if we weight the sample to balance the name lengths, we'd create an imbalance by age.
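The arithmetic of this toy example can be checked directly. This sketch assumes the weighting is simple post-stratification on name length (each person's weight is their group's population share divided by its sample share); the function names are hypothetical.

```python
from fractions import Fraction

# Population from the toy example: 6 short names, 4 long names (50% young in each)
pop_share = {"short": Fraction(6, 10), "long": Fraction(4, 10)}

# The sample of 4 people: (name length, age group)
sample = [("short", "young"), ("long", "young"), ("long", "old"), ("long", "old")]


def poststrat_weights(sample, pop_share):
    """Weight = population share / sample share of the person's name-length group."""
    n = len(sample)
    counts = {g: sum(1 for name, _ in sample if name == g) for g in pop_share}
    return [pop_share[name] / Fraction(counts[name], n) for name, _ in sample]


def weighted_share(sample, weights, age):
    """Weighted proportion of the sample in the given age group."""
    return sum(w for (_, a), w in zip(sample, weights) if a == age) / sum(weights)


weights = poststrat_weights(sample, pop_share)
unweighted_young = Fraction(sum(1 for _, a in sample if a == "young"), len(sample))
weighted_young = weighted_share(sample, weights, "young")
print(unweighted_young, weighted_young)  # 1/2 11/15
```

Unweighted, the sample matches the population's 50% young exactly; after weighting to fix name length, the single short-named (young) person carries weight 12/5, and the weighted share of young people rises to 11/15, about 73%.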

But as you suggest in your post, the potential for bias may be limited in practice (so the above example is more a case of 'could happen if we make ridiculous assumptions' than 'likely to happen'...)

Yep, your example makes sense! It confuses me, because in the case I was talking about, balancing a covariate can do no worse than not balancing it. Aickin's result was important because people always worried that they might be hurting the trial. In your case of a simple random sample, it's clearly possible to do worse… I wonder what the difference is.