POLMETH Archives

Political Methodology Society

POLMETH@LISTSERV.WUSTL.EDU

Subject:
From: Gary King <[log in to unmask]>
Reply To: Gary King <[log in to unmask]>
Date: Sat, 2 Dec 2006 09:14:07 -0500

Good questions, and a good discussion that highlighted a lot of other
issues that might be relevant.  Here's an alternative way to think about
some of the more basic issues you've raised.

First, take cross-observation dependence.  A likelihood-based model is
L(theta | y) = product_{i=1}^n P(y_i | theta).  You get to take that
product over the densities for the individual observations because you
are assuming they are independent after taking account of the covariates.
That's true whether P() is negative binomial, Poisson, normal, or
anything else.  If some of the assistant professors are competing with
each other, making this assumption without covariates would be
implausible, but with the right covariates it might be ok.
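
To make the role of that assumption concrete, here is a minimal sketch in
Python (numpy/scipy; the function and variable names are just
illustrative) of a count-model log-likelihood written as a sum of one
term per observation -- the step that is only justified by conditional
independence:

import numpy as np
from scipy.stats import poisson

def poisson_loglik(beta, y, X):
    # One additive term per observation: valid only because the y_i are
    # assumed independent after conditioning on the covariates in X.
    mu = np.exp(X @ beta)                 # E[y_i | x_i] = exp(x_i'beta)
    return np.sum(poisson.logpmf(y, mu))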

Second, think about within-observation dependence.  The Poisson
distribution is analogous to a Normal(y | mu, sigma) in which sigma is
set to some fixed number and not estimated.  Since your standard errors
depend on sigma being estimated correctly, you'll get badly biased SEs
unless the actual variation happens to equal that fixed number.  If the
variation is larger than the Poisson allows, it's called overdispersion,
and the negbin model would be a plausible model of it (although there are
lots of others).  If it's underdispersed, there's a version of the
binomial, and the generalized event count model (which also allows
overdispersion).  But the main message is that you should estimate sigma
and not assume its value.  (Curt Signorino and I make this point in a PA
article you can find at
http://gking.harvard.edu/files/abs/generaliz-abs.shtml.)
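
In practice this just means fitting a model that includes the dispersion
parameter.  As a rough illustration (a sketch assuming the Python
statsmodels package, with simulated data standing in for real counts),
the negbin fit reports an estimated dispersion parameter rather than
fixing it the way the Poisson does:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
# Overdispersed counts: a gamma-distributed rate multiplier on top of exp(xb)
lam = np.exp(0.5 + 0.8 * x) * rng.gamma(shape=2.0, scale=0.5, size=n)
y = rng.poisson(lam)
X = sm.add_constant(x)

pois = sm.Poisson(y, X).fit(disp=False)          # dispersion fixed by assumption
nb = sm.NegativeBinomial(y, X).fit(disp=False)   # dispersion estimated

print(pois.bse)       # too-small standard errors under overdispersion
print(nb.bse)         # larger, because the extra variation is estimated
print(nb.params[-1])  # last element: the estimated overdispersion (alpha)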

Third, there is the underlying process that can generate over- or
underdispersion within an observation.  As it happens, there are two
observationally equivalent processes that can both lead to
overdispersion (and the same is true for underdispersion).  One is
heterogeneity -- within a particular stratum of your covariates there may
be different expected counts for different observations -- and the other
is contagion -- finishing one article causes the asst prof to work hard
to finish the next one.  Contagion in this sense has nothing to do with
one observation (asst prof) affecting another.
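
A small simulation makes the observational equivalence easy to see.
This sketch (plain numpy; all the rates and the feedback mechanism are
invented for illustration) generates counts once from unit-level
heterogeneity and once from within-unit contagion, and both show the
same signature -- variance larger than the mean -- so the marginal
counts alone cannot tell the two stories apart:

import numpy as np

rng = np.random.default_rng(1)
n, T = 100_000, 50

# Heterogeneity: each unit gets its own rate from a gamma distribution
lam = rng.gamma(shape=2.0, scale=1.0, size=n)
y_het = rng.poisson(lam)

# Contagion: within each unit, each event raises the rate of the next one
# (T short sub-periods; the per-period rate grows with the running count)
y_con = np.zeros(n, dtype=int)
base, boost = 2.0 / T, 0.5 / T
for t in range(T):
    y_con += rng.poisson(base + boost * y_con)

print(y_het.mean(), y_het.var())   # variance exceeds the mean
print(y_con.mean(), y_con.var())   # same overdispersion signature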

And finally, what's the practical message?  A good modeling rule is to
always have at least one more parameter than you are interested in, so
make sure sigma is in there.  If the nature of the variation is
overdispersion, then switching to the negative binomial is quite
reasonable.  (Other types of overdispersion models will give you
different estimates, but most give very similar results.)  The two levels
do interact, since the way we estimate within-observation processes is to
use the variation between observations and to condition on the
covariates.  You will see this because the APPARENT (or estimated) level
of overdispersion (heterogeneity or contagion) will drop as you add
covariates to a negative binomial model.  Of course the real level of
heterogeneity or contagion is not affected; the model simply requires the
covariates to be specified correctly.
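
To see that last point in miniature, here is another hedged sketch (again
assuming statsmodels and invented data): counts that are nearly Poisson
once a covariate is known look heavily overdispersed when that covariate
is omitted, and the estimated alpha drops sharply once it is added:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)
# Mild residual heterogeneity plus a strong covariate effect
lam = np.exp(0.2 + 1.0 * x) * rng.gamma(shape=10.0, scale=0.1, size=n)
y = rng.poisson(lam)

nb_no_x = sm.NegativeBinomial(y, np.ones((n, 1))).fit(disp=False)
nb_with_x = sm.NegativeBinomial(y, sm.add_constant(x)).fit(disp=False)

print(nb_no_x.params[-1])    # large alpha: omitted x shows up as overdispersion
print(nb_with_x.params[-1])  # much smaller once the covariate is conditioned on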

Gary

---
Gary King,
Institute for Quantitative Social Science
Harvard University, 1737 Cambridge St, Cambridge, MA 02138
http://GKing.Harvard.Edu, [log in to unmask]
Direct 617-495-2027, Assistant 495-9271, eFax 812-8581

On Wed, 29 Nov 2006, Sarah Croco wrote:

> Dear Colleagues,
>
> A coauthor and I recently encountered a bit of uncertainty regarding an
> underlying assumption of the negative binomial regression (NBREG) and
> were wondering if anyone had any advice on how to proceed. Our question
> centers on whether the NBREG model is capable of handling
> interdependence between counts, and, if so, what kind of interdependence
> it is designed to capture.
>
> In several texts authors suggest using an NBREG model instead of a
> Poisson model when overdispersion is present. In examples overdispersion
> is often attributed to one of two causal mechanisms. The first, which we
> call an "omitted variable effect", occurs when there is some unobserved
> variable present in the data that makes some units/subjects have higher
> counts than others. A common example is the number of published papers
> an assistant professor produces in a year. We cannot assume the rate of
> publication is constant because professors will vary in their
> productivity for a number of reasons that are specific to each
> individual. A similar example has to do with how well sports teams
> perform across a season. Some teams will score at a higher rate than
> others because of a variable we cannot observe. In these examples, there
> is an interdependence within individual professors and within individual
> teams.
>
> The second causal mechanism could be called "success breeds success". In
> this case, the individual counts are not independent of one another
> because success in one period might encourage the subject to make
> another attempt. For example, a successful sales pitch on Wednesday for
> a door-to-door salesman may encourage him to try again on Thursday.
> Another example might be the number of violent episodes mentally ill
> patients undergo in a given year. One hypothesis might be that a violent
> episode in time t leads to an increased probability of a violent episode
> in time t+1 (a cathartic effect is also possible, where a violent
> episode in time t reduces the probability that the patient will undergo
> a violent episode in time t+1). Under this causal mechanism the
> contagion effect or interdependence is across time.
>
> After searching the literature, we are left with two questions.
>
> 1. Are NBREG models meant to handle interdependence? (While there seems
> to be a consensus of "yes" on this answer, several publications suggest
> the exact opposite. One paper, in fact, went to great lengths to
> demonstrate why and how current NBREG models need to be modified to be
> capable of handling non-independence).
>
> 2. If NBREG models can handle non-independence, which kind of
> non-independence are they meant to handle? Interdependence within
> subjects, where there is some omitted variable that would account for
> why some subjects have higher counts than others, or interdependence
> across time, where a success in time t leads to a second attempt in time
> t+1? Or both?
>
> Any thoughts or citation suggestions on this matter are greatly appreciated.
>
> Sarah Croco
>

**********************************************************
             Political Methodology E-Mail List
        Editor: Karen Long Jusko <[log in to unmask]>
**********************************************************
        Send messages to [log in to unmask]
  To join the list, cancel your subscription, or modify
           your subscription settings visit:

          http://polmeth.wustl.edu/polmeth.php

********************************************************** 
