POLMETH Archives

Political Methodology Society

POLMETH@LISTSERV.WUSTL.EDU

Subject:
From: Jay Ulfelder <[log in to unmask]>
Reply-To: Political Methodology Society <[log in to unmask]>
Date: Fri, 24 May 2013 15:46:53 -0400

Dear colleagues,

Philosophically, I'd call myself a Bayesian, but my practical skills at
Bayesian computation kind of suck, so I'm hoping some of you can help me
think about how to tackle the following research problem.

As part of a public early-warning system I'm helping build for the U.S.
Holocaust Memorial Museum's Center for the Prevention of Genocide
<http://www.ushmm.org/genocide/about/>, I want to use daily event data
from GDELT <http://eventdata.psu.edu/data.dir/GDELT.html> to help assess
whether an episode of "mass killing" is underway. For this task, a mass
killing is defined as any episode in which the actions of state agents
result in the intentional death of at least 1,000 noncombatants from a
discrete group in a period of sustained violence.

Thanks to Ben Valentino, I have historical data identifying where and when
episodes of mass killing have occurred. Thanks to Kalev Leetaru, I have
three decades' worth of GDELT data and can expect daily updates to start
streaming in soon.

What I want to do now is build a process that uses the daily event data
from GDELT to update a probabilistic estimate that a mass killing is
already happening. With the statistical techniques I usually use, I would:

1) partition my historical data set of country-days into training and
   test sets with stratification on the dependent variable;
2) apply a machine-learning process like Random Forests to the training
   set to get an algorithm that distinguishes between periods with and
   without mass killing;
3) check the reliability of that algorithm on the test set;
4) iterate until I thought I had something that worked well; and, finally,
5) start using it in real time, making further refinements as feedback
   accumulates.
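
In code, steps 1 through 3 would look roughly like the sketch below
(Python with scikit-learn; the file name, the event_* feature columns,
and the mass_killing label are placeholders I've made up, not real GDELT
or Valentino field names):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical country-day data set: one row per country-day, a binary
# indicator for ongoing mass killing, and GDELT-derived event counts.
df = pd.read_csv("country_days.csv")
features = [c for c in df.columns if c.startswith("event_")]  # assumed naming
X, y = df[features], df["mass_killing"]

# Step 1: stratified split on the dependent variable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Step 2: fit a Random Forest on the training set.
rf = RandomForestClassifier(n_estimators=500, random_state=42)
rf.fit(X_train, y_train)

# Step 3: check out-of-sample performance on the test set.
print(classification_report(y_test, rf.predict(X_test)))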

What that approach would miss, I think, is the Bayesian notion of updating
from a prior instead of treating each new day as if it were independent. I
suppose I could partially solve that problem with a parametric model that
included the previous day's predicted probability as one of the covariates,
but that strikes me as a clumsy way to do it.
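
To make the alternative concrete, here is a toy sketch of the day-by-day
updating I have in mind: a two-state hidden Markov model whose latent
state is "mass killing underway" or not, filtered forward one day at a
time. Every number below is a made-up placeholder, and in practice the
likelihoods would have to come from a real model of daily GDELT event
counts fit to the historical episodes:

import numpy as np

# Assumed daily transition matrix P(state_t | state_{t-1}):
# rows = yesterday's state (0 = no episode, 1 = episode), columns = today's.
transition = np.array([[0.999, 0.001],
                       [0.010, 0.990]])

def update(prior, likelihood):
    """One day of forward filtering.

    prior: P(state_{t-1} = [0, 1] | data through yesterday)
    likelihood: P(today's GDELT events | state = [0, 1])
    returns: P(state_t = [0, 1] | data through today)
    """
    predicted = prior @ transition          # propagate through the dynamics
    posterior = predicted * likelihood      # weight by today's evidence
    return posterior / posterior.sum()      # renormalize

# Usage: start from a base-rate prior and feed in each day's likelihoods.
belief = np.array([0.99, 0.01])
for lik in [np.array([0.8, 0.2]), np.array([0.3, 0.7])]:  # two fake days
    belief = update(belief, lik)
    print("P(mass killing underway) =", round(belief[1], 3))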

I'm guessing there's a standard Bayesian solution to this problem that I
just don't happen to know. If that's right, then I'd greatly appreciate any
pointers you can provide to motivating texts and worked examples. If that's
wrong---or, really, in any case---I'd welcome any other suggestions you
might have about the best way to do this.

Thanks in advance for any help,
Jay

-- 
Jay Ulfelder, Ph.D.
[log in to unmask]
(301) 580-8736 [mobile]
Twitter: @jay_ulfelder <http://twitter.com/#!/jay_ulfelder>
Blog: Dart-Throwing Chimp <http://dartthrowingchimp.wordpress.com/>
SSRN Author Page <http://ssrn.com/author=539102>
