What is a spurious correlation
I'll just add some additional comments about causality as viewed from an epidemiological perspective. Most of these arguments are taken from Practical Psychiatric Epidemiology. by Prince et al. (2003).
Causation, or causality interpretation. are by far the most difficult aspects of epidemiological research. Cohort and cross-sectional studies might both lead to confoundig effects for example. Quoting S. Menard (Longitudinal Research. Sage University Paper 76, 1991), H.B. Asher in Causal Modeling (Sage, 1976) initially proposed the following set of criteria to be fulfilled:
- The phenomena or variables in question must covary, as indicated for example by differences between experimental and control groups or by nonzero correlation between the two variables.
- The relationship must not be attributable to any other variable or set of variables, i.e. it must not be spurious, but must persist even when other variables are controlled, as indicated for example by successful randomization in an experimental design (no difference between experimental and control groups prior to treatment) or by a nonzero partial correlation between two variables with other variable held constant.
- The supposed cause must precede or be simultnaeous with the supposed effect in time, as indicated by the change in the
cause occuring no later than the associated change in the effect.
While the first two criteria can easily be checked using a cross-sectional or time-ordered cross-sectional study, the latter can only be assessed with longitudinal data, except for biological or genetic characteristics for which temporal order can be assume without longitudinal data. Of course, the situation becomes more complex in case of a non-recursive causal relationship.
I also like the following illustration (Chapter 13, in the aforementioned reference) which summarizes the approach promulgated by Hill (1965) which includes 9 different criteria related to causation effect, as also cited by @James. The original article was indeed entitled "The environment and disease: association or causation?" (PDF version ).
Finally, Chapter 2 of Rothman's most famous book, Modern Epidemiology (1998, Lippincott Williams & Wilkins, 2nd Edition), offers a very complete discussion around causation and causal inference, both from a statistical and philosophical perspective.
I'd like to add the following references (roughly taken from an online course in epidemiology) are also very interesting:
Finally, this review offers a larger perspective on causal modeling, Causal inference in statistics: An overview (J Pearl, SS 2009 (3)).Source: stats.stackexchange.com