Friday, October 17, 2014

Waldman on "Econometrics, open science, and cryptocurrency"

Econometrics, open science, and cryptocurrency by Steve Randy Waldman

Mark Thoma wrote the wisest two paragraphs you will read about econometrics and empirical statistical research in general:
You are testing a theory you came up with, but the data are uncooperative and say you are wrong. But instead of accepting that, you tell yourself "My theory is right, I just haven't found the right econometric specification yet. I need to add variables, remove variables, take a log, add an interaction, square a term, do a different correction for misspecification, try a different sample period, etc., etc., etc." Then, after finally digging out that one specification of the econometric model that confirms your hypothesis, you declare victory, write it up, and send it off (somehow never mentioning the intense specification mining that produced the result). 
Too much econometric work proceeds along these lines. Not quite this blatantly, but that is, in effect, what happens in too many cases. I think it is often best to think of econometric results as the best case the researcher could make for a particular theory rather than a true test of the model.
What Thoma is describing here cannot be fixed. Naive theories of statistical analysis presume a known, true model of the world whose parameters a researcher need simply to estimate. But there is in fact no "true" model of the world, and a moralistic prohibition of the process Thoma describes would freeze almost all empirical work in its tracks. It is the practice of good researchers, not just of charlatans, to explore their data. If you want to make sense of the world, you have to look at it first, and try out various approaches to understanding what the data means. In practice, that means that long before any empirical research is published, its producers have played with lots and lots of potential models. They've examined bivariate correlations, added variables, omitted variables, considered various interactions and functional forms, tried alternative approaches to dealing with missing data and outliers, etc. It takes iterative work, usually, to find even the form of a model that will reasonably describe the space you are investigating. Only if your work is very close to past literature can you expect to be able to stick with a prespecified statistical model, and then you are simply relying upon other researchers' iterative groping.
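A small simulation can put a number on the cost of that iterative groping. The sketch below is an illustration only and does not come from either Thoma's or Waldman's post. It assumes an outcome that is pure noise, 20 candidate regressors that are also pure noise, 50 observations per study, and a cutoff of |t| > 2.01 as an approximation to a two-sided 5% test at 48 degrees of freedom. The function names (`pearson_r`, `t_stat`, `simulate`) are hypothetical. It compares a researcher who commits to one specification in advance with one who tries all 20 and reports the best.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_stat(r, n):
    """t statistic for testing zero correlation in a bivariate regression."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def simulate(n_studies=1000, n_obs=50, n_specs=20, t_crit=2.01, seed=0):
    """Return (prespecified_rate, mined_rate) of 'significant' findings.

    Every variable is independent noise, so any significant result is a
    false positive. t_crit=2.01 is an assumed approximation to the
    two-sided 5% critical value with n_obs - 2 = 48 degrees of freedom.
    """
    rng = random.Random(seed)
    prespecified_hits = 0
    mined_hits = 0
    for _ in range(n_studies):
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        abs_ts = []
        for _ in range(n_specs):
            x = [rng.gauss(0, 1) for _ in range(n_obs)]
            abs_ts.append(abs(t_stat(pearson_r(x, y), n_obs)))
        # Prespecified researcher: only the first specification counts.
        if abs_ts[0] > t_crit:
            prespecified_hits += 1
        # Specification miner: report the best of all specifications tried.
        if max(abs_ts) > t_crit:
            mined_hits += 1
    return prespecified_hits / n_studies, mined_hits / n_studies


if __name__ == "__main__":
    prespecified, mined = simulate()
    print(f"prespecified false-positive rate: {prespecified:.3f}")
    print(f"mined false-positive rate:        {mined:.3f}")
```

Under these assumptions the prespecified rate should land near the nominal 5%, while the mined rate should land near 1 - 0.95^20, or roughly 64%, because the 20 tests are close to independent. Real specification searches reuse the same data across correlated models, so the inflation will differ in practice. The sketch only shows the direction of the effect.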
...
