>Date: Tue, 17 Apr 2007 10:27:50 -0400
>From: "Franzese, Robert" <[log in to unmask]>
>Subject: Significance of parameter vs. of effects in non-linear-additive models
>like logit/probit
>
>Fellow PolMethers,
>
>I write with (yet another) simple-but-very-good-and(-perhaps)-deep
>question that the students in my methods class this term raised. They
>noticed, in the course of a problem set, that a coefficient in a logit
>estimation could be statistically distinguishable from zero while
>marginal or first-difference effects of the associated variable are not,
>or vice versa the former may be insignificant but the latter significant
>(in some or all ranges of other variable values). As we all know, in a
>logit model,
>
>d(p-hat)/dX_j = beta_j*(phat)*(1-phat)
>
>My first thought was: Are these close calls? The tests that beta_j=0 and
>that beta_j*phat*(1-phat)=0 might be asymptotically equivalent.
To calculate the distribution of a function of an "asymptotically
stable" random variable Z_n (that is, where a_n (Z_n - b) approaches X
in law, a_n being a sequence of constants tending to infinity), note
that a_n (g(Z_n) - g(b)) approaches g'(b)X in law if g is
differentiable at b (this is a corollary of Slutsky's theorem given in
standard texts such as Bickel & Doksum (A.14.17, page 461 in the first
edition)). This can be used to test the asymptotic equivalence of
functions of the maximum-likelihood estimates--see below.
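A quick Monte Carlo sketch of this corollary (in Python; the choice of
distribution and of g below is mine, purely for illustration):

import numpy as np

# Delta-method illustration: Z_n = mean of n Bernoulli(b) draws,
# a_n = sqrt(n), g(z) = z(1 - z), so g'(b) = 1 - 2b.
rng = np.random.default_rng(0)
b, n, reps = 0.3, 2000, 20000

zn = rng.binomial(n, b, size=reps) / n            # Z_n, one per replication
lhs = np.sqrt(n) * (zn * (1 - zn) - b * (1 - b))  # a_n (g(Z_n) - g(b))

# Here X ~ N(0, b(1-b)), so lhs should be close to N(0, (1-2b)^2 b(1-b)).
print(lhs.std(), abs(1 - 2 * b) * np.sqrt(b * (1 - b)))

The two printed standard deviations should agree to roughly two decimals.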
>Date: Tue, 17 Apr 2007 09:11:50 -0700
>From: Douglas Rivers <[log in to unmask]>
>Subject: Re: Significance of parameter vs. of effects in non-linear-additive models
>like logit/probit
>
>I don't think this is correct.
>
>First, as far as the hypotheses are concerned, they are equivalent since
>b = 0 iff bp(1-p) = 0 (because 0 < p < 1). Thus, the only thing that can
>differ between the tests is that one has more power than the other.
>
>What you have are two (asymptotic) t-tests, where t_1 = \hat b / SD(\hat b)
>and t_2 = \hat b \hat p (1 - \hat p) / SD(\hat b \hat p (1 - \hat p)). I
>think, in practice, that the first will tend to be larger than the second,
>though I couldn't determine whether the ratio t_1/t_2 is always greater than
>one. However, I think a Rao-Blackwell argument should show that the first is
>always better (since you're effectively just adding some positively
>correlated noise to the estimate by multiplying by \hat p (1 - \hat p)). At
>least, that's my intuition.
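As a rough numerical check of the t_1 versus t_2 comparison, here is a
Python sketch on simulated data (the data-generating values, sample size,
and evaluation point are my own choices, purely for illustration; the
standard error of the effect uses the delta-method construction described
next):

import numpy as np
import statsmodels.api as sm

# Simulate a simple logit, then compare t_1 (for the coefficient) with
# t_2 (for the marginal effect at a fixed point x0).
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = sm.add_constant(x)
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.4 * x))))

fit = sm.Logit(y, X).fit(disp=0)
b, V = fit.params, fit.cov_params()

x0 = np.array([1.0, 0.0])                # evaluate the effect at x = 0
phat = 1 / (1 + np.exp(-x0 @ b))
g = b[1] * phat * (1 - phat)             # estimated dp/dx at x0

# Gradient of g(b) = b_1 p(1-p), p = logistic(x0'b):
# dg/db_k = b_1 (1-2p) p(1-p) x0_k, plus an extra p(1-p) when k = 1.
dg = b[1] * (1 - 2 * phat) * phat * (1 - phat) * x0
dg[1] += phat * (1 - phat)
se_g = np.sqrt(dg @ V @ dg)

print("t_1 =", b[1] / fit.bse[1], " t_2 =", g / se_g)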
Letting f(b) = [b_j, b_j p(1-p)] and noting that the p are functions
of the b, f(\hat{b}) is asymptotically normal with mean
[b_j, b_j p(1-p)] and covariance matrix (df/db) A (df/db)', where A is
the asymptotic covariance matrix of \hat{b} and (df/db) is the matrix
of partial derivatives of f with respect to b. This gives the joint
distribution of \hat b_j and \hat b_j \hat p (1 - \hat p) and allows
statistical inferences to be drawn about these random functions (or
about any other functions one might wish to construct by altering f)
in the usual manner.
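A minimal sketch of this construction (in Python; the estimates, the
covariance matrix A, and the evaluation point are made-up inputs standing
in for output from a fitted logit, and the Jacobian is taken by forward
differences rather than analytically):

import numpy as np

# Joint asymptotic covariance of f(b) = [b_j, b_j p(1-p)] by the delta
# method: Cov(f(\hat{b})) ~= (df/db) A (df/db)'.

def f(b, x0, j=1):
    p = 1 / (1 + np.exp(-x0 @ b))        # p is a function of all the b
    return np.array([b[j], b[j] * p * (1 - p)])

def jacobian(fun, b, eps=1e-6):
    # forward-difference (df/db), one column per coefficient
    f0 = fun(b)
    J = np.empty((f0.size, b.size))
    for k in range(b.size):
        bk = b.copy()
        bk[k] += eps
        J[:, k] = (fun(bk) - f0) / eps
    return J

b_hat = np.array([-0.48, 0.41])          # illustrative \hat{b}
A = np.array([[0.010, -0.002],           # illustrative Cov(\hat{b})
              [-0.002, 0.012]])
x0 = np.array([1.0, 0.0])                # point at which p is evaluated

J = jacobian(lambda b: f(b, x0), b_hat)
cov_f = J @ A @ J.T                      # joint covariance of f(\hat{b})
print(f(b_hat, x0) / np.sqrt(np.diag(cov_f)))   # the two t statistics

The off-diagonal element of cov_f is the covariance between \hat b_j and
the effect estimate, which looking at the two t statistics separately
ignores.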
The reference to Rao-Blackwell appears not to be germane, as
that theorem provides results on conditioning an estimator on a
sufficient statistic, and no conditioning is taking place here (the
Wikipedia entry for Rao-Blackwell is competently done and is at
http://en.wikipedia.org/wiki/Rao-Blackwell_theorem).
Ken McCue