POLMETH Archives

Political Methodology Society

POLMETH@LISTSERV.WUSTL.EDU

Subject:
From: "Ryan D. Enos" <[log in to unmask]>
Reply-To: Political Methodology Society <[log in to unmask]>
Date: Mon, 28 Sep 2009 18:09:21 -0400
Content-Type: multipart/mixed
Parts/Attachments: text/plain (2528 bytes), census_block_extract.py (3130 bytes)
Dear Paul,
If your student can do a little programming, I have used scripts like 
the one attached to do this in the past.  There are more efficient 
ways to do it with proprietary geocoders.  After the addresses are 
matched to census blocks, they can be merged in a straightforward 
manner with data downloaded from the Census.
Please let me know if I can clarify anything.
Best,
Ryan
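
To make that final step concrete, here is a minimal sketch of the
merge in Python, joining the attached script's output to a block-group
extract downloaded from the Census.  The file name
census_sf1_blockgroups.csv and the medianIncome column are hypothetical
placeholders for whatever tables the student actually pulls.

import csv

## Index the Census extract by (county, tract, block group).
census = {}
with open("census_sf1_blockgroups.csv", "rb") as f:  ## hypothetical extract
    for row in csv.DictReader(f):
        census[(row["county"], row["tract"], row["blockGroup"])] = row

## Walk the geocoded addresses and append the matching demographics.
with open("unique_addresses_blocks.csv", "rb") as fin, \
     open("addresses_with_demographics.csv", "wb") as fout:
    reader = csv.DictReader(fin)
    fields = reader.fieldnames + ["medianIncome"]  ## add columns as needed
    writer = csv.DictWriter(fout, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        match = census.get((row["county"], row["tract"], row["blockGroup"]))
        row["medianIncome"] = match["medianIncome"] if match else "NA"
        writer.writerow(row)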

Paul Gronke wrote:
> Colleagues
>
> I have a senior writing a thesis who would like to track changing 
> patterns of campaign donations.  He's interested in replicating and 
> extending Michael Malbin's work on how patterns of donations have 
> changed over the past few cycles.
>
> I would like him to attach Census demographic information to the FEC 
> campaign donations dataset (actually, he's downloaded these from 
> Open Secrets).
>
> I am not familiar with how you attach a Census tract or block 
> identifier to an address, or whether this can be done in a relatively 
> straightforward (and not overly expensive) way.  The student is fairly 
> adept with Stata, but not tremendously so.  This is something I'm not 
> averse to learning myself, but I am hoping to find an off-the-shelf 
> solution, or perhaps someone who has already done this with the 2008, 
> 2004, and 2000 files and might be willing to share a dataset.
>
> Thank you
> Paul G.
>
> ---
> Paul Gronke               Ph: 503-517-7393
> Professor
> Reed College
> 3203 SE Woodstock Blvd.
> Portland OR 97202
> http://www.reed.edu/~gronkep


-- 
Ryan D. Enos

Department of Political Science
UCLA

http://ryandenos.com


**********************************************************
             Political Methodology E-Mail List
   Editors: Xun Pang        <[log in to unmask]>
            Jon C. Rogowski <[log in to unmask]>
**********************************************************
        Send messages to [log in to unmask]
  To join the list, cancel your subscription, or modify
           your subscription settings visit:

          http://polmeth.wustl.edu/polmeth.php

**********************************************************



## census_block_extract.py
## Extracts addresses from a CSV and calls census.gov (American FactFinder)
## to find each address's census tract, block group, and block.
## KEY METHODS: writing to a CSV, sending complex URLs to a server, placeholders.
## RdE, October 2007

import csv
import re
import cookielib
import urllib2
from time import localtime

## The base URL for the FactFinder address-search servlet.
baseurl = "http://factfinder.census.gov/servlet/DTGeoAddressServlet?"

## Build a cookie-aware opener (adapted from an example on the web).
cj = cookielib.LWPCookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
urlopen = urllib2.urlopen

## Visit the preliminary search page first so the session cookies are set.
page = urlopen("http://factfinder.census.gov/servlet/DTGeoSearchByListServlet?ds_name=DEC_2000_SF1_U&_lang=en&")

f = open("unique_addresses.csv", "rU")
fout = open("unique_addresses_blocks.csv", "w")
outcsv = csv.writer(fout, lineterminator="\n")
outcsv.writerow(['county', 'tract', 'blockGroup', 'block',
                 'residenceaddressline1', 'residencecityusps',
                 'residencestate', 'residencezipcode'])

## Patterns that scrape the geography out of the returned HTML.
county_re = re.compile(r"(?<=County:\s)(.*)(?=</option>)")
tract_re = re.compile(r"(?<=Census Tract\s)(\d+\.*\d*)")
blockgroup_re = re.compile(r"(?<=Block Group\s)(\d+)")
block_re = re.compile(r"(?<=Block\s)(\d+)")

counter = -1
print 'Initializing loop...'
for line in f:  ## loop over every row of the input CSV
    row = line.strip().split(",")  ## expected order: street, city, state, zip
    counter = counter + 1
    print counter

    address = row[0].strip().replace(" ", "+")
    city = row[1]
    zipcode = row[3]  ## renamed from `zip`, which shadows the builtin

    ## Some addresses carry a separate direction field (N, S, E, W); an
    ## if/else could be added here to splice it in when present:
    # if row[2] != None:
    #     address = row[0] + "+" + row[2] + "+" + street + "+" + row[4]

    print address
    print city
    print zipcode
    print ['This query and URL call was executed at PDT:', localtime()[3:6]]

    ## POST body for the servlet; note the state is hard-coded to Florida.
    post = ("IS_ADDRESS_VALID=N&IS_GEO_FOUND=N&street=%s&city=%s"
            "&states=Florida&zip=%s&_programYear=50&_treeId=4001&_lang=en"
            "&_stateSelectedFromDropDown=California&all_geo_types=N"
            % (address, city, zipcode))

    ## Open the URL, read the response, and skip the row on any failure.
    try:
        page = urlopen(baseurl, post)
        content = page.read()
        print "open"
    except Exception:
        print "\tCan't Open URL!"
        print "\n"
        continue

    county = county_re.search(content)
    tract = tract_re.search(content)
    blockGroup = blockgroup_re.search(content)
    block = block_re.search(content)

    if county and tract and blockGroup and block:
        aggregation = [county.group(), tract.group(),
                       blockGroup.group(), block.group()]
        print aggregation
        print "\n"
        output = aggregation + [row[0], row[1], row[2], row[3]]
    else:
        print "no match"
        print "\n"
        output = ['NA', 'NA', 'NA', 'NA', row[0], row[1], row[2], row[3]]
    outcsv.writerow(output)

f.close()
fout.close()
print 'Closing the connection. Be sure to validate results.'
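
A note on the input, inferred from the column indices in the loop
above: unique_addresses.csv should hold one comma-separated record per
line in the order street, city, state, zip.  The state field is only
carried through to the output; the query itself hard-codes Florida.
A hypothetical input line:

123 MAIN ST,MIAMI,FL,33130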
