Cluster analysis for demand model selection
Wednesday, July 9th, 2008
[First, a note. The title kind of oversells the post. This isn't publishable material, and it isn't about econometric theory at all; it's just about something I do in practice, based on statistical intuition and some graduate-level knowledge of economic theory. I did spend my entire two years in graduate school (a real MSc program, not an MBA) doing econometrics, which is why I'm comfortable appealing to statistical and economic intuition, not to mention aware of some huge gaping caveats. On the other hand, I don't know anything about the asymptotics of clustering algorithms (though I do know how the algorithms work, and that some asymptotic theory for k-means is well established). The thing is, I don't have time to do proper theoretical research right now. So this is all seat-of-the-pants advice.]
So, you’re stuck with the universal problem of econometrics: you don’t know how to specify your model — how to choose the equation you’ll estimate.
Maybe they’ve told you they want demand estimates, so accurate prediction takes priority over accurate elasticity estimation (though they do want elasticity estimates so they can do back-of-the-envelope calculations). You can’t just invoke the Frisch-Waugh-Lovell (FWL) theorem and let unknown factors slip into the error term: you want a good R^2, and not an artificially inflated one.
So you go — demand is a function of price, income and other stuff.
Now, we all know from our microeconomics courses (the real ones, the kind that work you through Mas-Colell) that you can’t just go summing individual demands into a consistent aggregate demand curve (functional, where for every x there’s one and only one y) unless you have all these fancy constraints on utility functions, most notably preferences that admit a Gorman polar form.
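To make the aggregation point concrete, here is a toy sketch. It is not from any textbook derivation: the functional forms, parameter values, and function names are all made up for illustration. The only property it relies on is that Gorman-form demands are linear in income with a slope b(p) shared by every consumer, x_i = a_i(p) + b(p) * m_i, so aggregate demand for a good depends on total income only. When the income slopes differ across consumers, shuffling income between them moves aggregate demand even though prices and total income stay put, so there is no single curve to estimate.

```python
# Toy illustration of why Gorman-form demands aggregate and others need not.
# All functional forms and numbers below are hypothetical.

def gorman_demand(price, income, a_i, b=0.5):
    """Demand for one good: x_i = a_i(p) + b(p) * m_i, with a common slope b(p)."""
    return a_i / price + (b / price) * income

def non_gorman_demand(price, income, share_i):
    """Consumer-specific budget share, so the income slope differs by consumer."""
    return share_i * income / price

def aggregate(demand, price, incomes, params):
    """Sum individual demands at a given price."""
    return sum(demand(price, m, k) for m, k in zip(incomes, params))

price = 2.0
incomes_a = [100.0, 300.0]   # total income 400
incomes_b = [300.0, 100.0]   # same total, redistributed

# Common income slope: aggregate demand ignores the redistribution.
gorman_a = aggregate(gorman_demand, price, incomes_a, [1.0, 4.0])
gorman_b = aggregate(gorman_demand, price, incomes_b, [1.0, 4.0])

# Different income slopes: same price, same total income, different demand.
non_a = aggregate(non_gorman_demand, price, incomes_a, [0.2, 0.6])
non_b = aggregate(non_gorman_demand, price, incomes_b, [0.2, 0.6])

print("Gorman:    ", gorman_a, gorman_b)
print("Non-Gorman:", non_a, non_b)
```

In the second case, a regression of aggregate quantity on price and total income would be chasing a relationship that shifts whenever the income distribution does.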
