Interesting problem. Here's a first take:
I guess a basic question is the nature of the missingness. If the
village location for an event is missing completely at random, then you
could simply leave those events out of the analysis, or impute them
using a model, fit to the events of known location, of the probability
that a village experiences a conflict event. If the locations are only
missing at random, then that model might be rather more complex, but the
modeling principle remains the same. In either case, it might help to
think of a partially observed indicator variable attached to each
event, recording the village it occurred in, when formulating the
imputation problem and models for its solution. These thoughts treat
the problem district by district. Taking districts seriously adds
another predictor to the missingness model and lets you partially pool
parameters in the part of the model that relates covariates to
experiencing a conflict event.
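To make the district-by-district idea a bit more concrete, here is a
minimal sketch for a single district. It assumes the locations are
missing completely at random and uses only the located events (no
covariates) to estimate village probabilities. The function name, the
Dirichlet pseudo-count `prior`, and the five-village split of the 60
located events are all made up for illustration; a covariate-based
model would replace the step that draws `probs`.

```python
import numpy as np

def impute_district(located_counts, n_unlocated, n_draws=5, prior=0.5, seed=0):
    """Draw completed village event counts for one district.

    located_counts: events already assigned to each village
    n_unlocated:    events known only at the district level
    prior:          hypothetical Dirichlet pseudo-count per village, so that
                    villages with no located events keep a nonzero probability
    """
    rng = np.random.default_rng(seed)
    located = np.asarray(located_counts, dtype=np.int64)
    draws = []
    for _ in range(n_draws):
        # Draw village probabilities from the Dirichlet posterior implied
        # by the located events (this propagates estimation uncertainty).
        probs = rng.dirichlet(located + prior)
        # Allocate the unlocated events across villages; the partially
        # observed village indicator is filled in here.
        extra = rng.multinomial(n_unlocated, probs)
        draws.append(located + extra)
    return np.array(draws)

# Illustrative district: 100 events, 60 located, 40 unlocated.
located = [25, 20, 10, 5, 0]
completed = impute_district(located, n_unlocated=40)
print(completed)
```

Each row is one imputed data set. Because the unlocated events are
allocated by a multinomial draw, every row sums to the known district
total by construction, which is one way to get the adding-up
constraint without a constrained regression.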
In short, your 'multilevel setup' and 'imputation' intuitions seem
right to me.
Will
On 9 Mar 2009, at 19:52, Cyrus Samii wrote:
> Hi Polmethers,
>
> I have an imputation problem, and I'm wondering if people have ideas
> on what might be a good solution.
>
> We have a list of about 13,000 conflict events and data on about 4,000
> villages. We want to assign conflict events to villages to measure
> each village's conflict exposure level. The villages are grouped into
> 75 districts (so an average of about 50 villages per district).
>
> All of the conflict events can be assigned to a district. Thus, we
> know the district totals. When we go district-by-district, we see
> that in some districts, as many as 85% of conflict events have
> sufficient information to be assigned to villages with certainty;
> whereas in other districts, only 20% of the events can be assigned to
> a village with certainty. Overall, we have certain information that
> we can use to assign 60% of the conflict events to villages. Thus,
> the imputation problem is to figure out how to assign the other 40%
> (i.e. about 5200) of the events, using the district totals and the
> village-level data that we have. (We have data for many village-level
> covariates.)
>
> To give an example, we might know that a district experienced 100
> conflict events. We know in which villages 60 of those events took
> place. But we don't know anything more about the other 40 events.
> How should we allocate them?
>
> I was thinking about using a weighted regression, in which
> village-level event counts are modeled as being a function of village
> covariates. The weights would vary over district and would be
> proportional to the fraction of events that could be assigned with
> certainty in that district. (E.g., villages in districts in which 80%
> of events were assigned get twice the weight of villages in districts in which 40%
> of events were assigned.) But this is based on some fuzzy intuitions
> about accounting for the differences in the information content, and
> isn't really well justified. Also, I don't see how to constrain the
> model to ensure that predicted district-level totals add up to the
> known totals. I am thinking there is some way to use a multilevel
> setup or a constrained regression, but this would be venturing into
> uncharted territory for me, so any hints would be welcome.
>
> Thanks!
>
> Cyrus
>
>
> --
> Cyrus Samii
> Political Science
> Columbia University
> [log in to unmask]
>
> Burundi Survey: www.columbia.edu/~cds81/burundisurvey/
> ISERP Statistical Consulting:
> www.iserp.columbia.edu/services/statistical_consulting.html
> Comparative Political Economy Blog: cpecolumbia.blogspot.com
>
> **********************************************************
> Political Methodology E-Mail List
> Editors: Melanie Goodrich, <[log in to unmask]>
> Xun Pang, <[log in to unmask]>
> **********************************************************
> Send messages to [log in to unmask]
> To join the list, cancel your subscription, or modify
> your subscription settings visit:
>
> http://polmeth.wustl.edu/polmeth.php
>
> **********************************************************