Title: Splitting a predictor at the upper quarter or third and
the lower quarter or third
Authors: Andrew Gelman, David Park
Entrydate: 2007-07-06 17:47:48
Keywords: discretization, linear regression, statistical
communication, trichotomizing
Abstract: A linear regression of $y$ on $x$ can be approximated by
a simple difference: the average values of $y$ corresponding to the
highest quarter or third of $x$, minus the average values of $y$
corresponding to the lowest quarter or third of $x$. A simple
theoretical analysis shows this comparison performs reasonably well,
with 80\%--90\% efficiency compared to the linear regression if the
predictor is uniformly or normally distributed. Discretizing $x$
into three categories claws back about half the efficiency lost by
the commonly used strategy of dichotomizing the predictor.
We illustrate with the example that motivated this research: an
analysis of income and voting that we had originally performed for a
scholarly journal but then wanted to communicate to a general
audience.
http://polmeth.wustl.edu/retrieve.php?id=697
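
The efficiency figures quoted in the abstract can be checked with a short calculation. The sketch below is not taken from the paper; it assumes a homoskedastic linear model $y = a + bx + \epsilon$, conditions on $x$, and takes the slope estimate to be the difference in group means of $y$ divided by the difference in group means of $x$. Under those assumptions the efficiency relative to least squares is $f\,(\bar{x}_{upper} - \bar{x}_{lower})^2 / (2\,\mathrm{var}(x))$, where $f$ is the fraction of the data in each extreme group. The function names are hypothetical.

```python
from statistics import NormalDist


def efficiency_normal(f):
    """Relative efficiency for a standard normal predictor.

    f is the fraction of observations in each extreme group
    (assumption: equal-sized upper and lower groups).
    """
    nd = NormalDist()              # mean 0, variance 1
    z = nd.inv_cdf(1.0 - f)        # cut point for the upper group
    upper_mean = nd.pdf(z) / f     # E[x | x > z] for a standard normal
    gap = 2.0 * upper_mean         # lower-group mean is -upper_mean by symmetry
    return f * gap ** 2 / 2.0      # var(x) = 1


def efficiency_uniform(f):
    """Relative efficiency for a predictor uniform on [0, 1]."""
    gap = 1.0 - f                  # group means are 1 - f/2 and f/2
    return f * gap ** 2 / 2.0 / (1.0 / 12.0)   # var(x) = 1/12


if __name__ == "__main__":
    # f = 1/2 is a median split (dichotomizing); 1/3 and 1/4 are the
    # upper/lower third and quarter comparisons named in the title.
    for f in (1 / 2, 1 / 3, 1 / 4):
        print(f"f={f:.3f}  normal={efficiency_normal(f):.3f}  "
              f"uniform={efficiency_uniform(f):.3f}")
```

Under these assumptions the thirds and quarters comparisons come out at roughly 0.8 to 0.9 for both distributions, while the median split gives about 0.64 (normal) and 0.75 (uniform), which is consistent with the abstract's statement that three categories recover about half of the efficiency lost by dichotomizing.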