Dear Paul,
If your student can do a little programming, I have used scripts like
the one attached in the past to look up the census block for each
address. There are actually more efficient ways to do it with
proprietary geocoders. After matching the addresses with the census
blocks, they can then be matched in a straightforward manner with
data downloaded from the census.
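
As a rough sketch of that last matching step (not something I have run on the real files), the geocoded addresses can be joined to a Census extract on the geography columns. This is written for Python 3 and uses only the standard library. The geocoded columns follow the names the attached script writes; the Census column `median_income`, the function name `attach_census`, and every value in the two sample tables are made up for illustration.

```python
import csv
import io

# Hypothetical geocoder output, using the column names the attached script writes.
GEOCODED = """county,tract,blockGroup,block,residenceaddressline1
Alachua,2.00,1,1004,100 MAIN ST
Alachua,9.01,3,3010,22 OAK AVE
NA,NA,NA,NA,PO BOX 7
"""

# Hypothetical block-group level Census extract; real column names depend
# on the table that was downloaded.
CENSUS = """county,tract,blockGroup,median_income
Alachua,2.00,1,31000
Alachua,9.01,3,52000
"""

KEY = ("county", "tract", "blockGroup")


def attach_census(geocoded_rows, census_rows, key=KEY):
    """Left-join Census attributes onto geocoded rows using the shared key."""
    lookup = {tuple(r[k] for k in key): r for r in census_rows}
    merged = []
    for row in geocoded_rows:
        match = lookup.get(tuple(row[k] for k in key), {})
        # Copy over only the non-key Census columns.
        extra = {k: v for k, v in match.items() if k not in key}
        merged.append({**row, **extra})
    return merged


geocoded = list(csv.DictReader(io.StringIO(GEOCODED)))
census = list(csv.DictReader(io.StringIO(CENSUS)))
merged = attach_census(geocoded, census)
for row in merged:
    # Addresses that could not be geocoded simply carry no Census columns.
    print(row["residenceaddressline1"], row.get("median_income", "NA"))
```

The same join is a one-line merge in Stata once both files share the tract and block group columns.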
Please let me know if I can clarify anything.
Best,
Ryan
Paul Gronke wrote:
> Colleagues
>
> I have a senior writing a thesis who would like to track changing
> patterns of campaign donations. He's interested in replicating and
> extending Michael Malbin's work on how patterns of donations have
> changed over the past few cycles.
>
> I would like him to attach Census demographic information to the FEC
> campaign donations dataset (actually, he's downloaded these from open
> secrets).
>
> I am not familiar with how you attach a Census tract or block
> identifier to an address, and if this can be done in a relatively
> straightforward (and not overly expensive) way. The student is fairly
> adept with Stata but not tremendously so. This is something I'm not
> averse to learning myself, but I am hoping to find an off the shelf
> solution, or perhaps someone who has already done this with the 2008,
> 2004, and 2000 files and might be willing to share a dataset.
>
> Thank you
> Paul G.
>
> ---
> Paul Gronke Ph: 503-517-7393
> Professor Fax:
> Reed College
> 3203 SE Woodstock Blvd.
> Portland OR 97202
> http://www.reed.edu/~gronkep
>
> **********************************************************
> Political Methodology E-Mail List
> Editors: Xun Pang <[log in to unmask]>
> Jon C. Rogowski <[log in to unmask]>
> **********************************************************
> Send messages to [log in to unmask]
> To join the list, cancel your subscription, or modify
> your subscription settings visit:
>
> http://polmeth.wustl.edu/polmeth.php
>
> **********************************************************
--
Ryan D. Enos
Department of Political Science
UCLA
http://ryandenos.com
## census_block_extract.py
### Extracts addresses from a CSV file and calls census.gov to find the block group.
#### KEY METHODS: writing to database, sending complex URLs to the server, placeholders
##### RdE October 2007
##import the necessary modules.
from urllib import *
import time
import urllib2
from time import localtime
import csv
import re
import cookielib
#####################################################################
##baseurl will be passed to http://
baseurl = """http://factfinder.census.gov/servlet/DTGeoAddressServlet?"""
urlopen = urllib2.urlopen ## these 6 lines are copied from the web, an e.g. of how to get around cookies
cj = cookielib.LWPCookieJar()
Request = urllib2.Request
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
page = urlopen("http://factfinder.census.gov/servlet/DTGeoSearchByListServlet?ds_name=DEC_2000_SF1_U&_lang=en&") ## preliminary site that search goes through
f = open("unique_addresses.csv","rU")
fout = file("unique_addresses_blocks.csv","w")
outcsv = csv.writer(fout,lineterminator="\n")
outcsv.writerow(['county','tract','blockGroup','block','residenceaddressline1','residencecityusps', 'residencestate', 'residencezipcode'])
counter = -1
print 'Initializing loop...'
for row in csv.reader(f): ## loop over every row of the input CSV (handles quoted commas)
    counter = counter + 1
    print counter
    ## replace spaces with "+" so the values can go into the form-encoded POST body
    address = row[0].strip().replace(" ", "+")
    city = row[1].strip().replace(" ", "+")
    zipcode = row[3].strip() ## renamed from zip, which shadows the built-in
    ### some streets do not contain a direction (N,S,E,W), so an if/else statement
    ### would be needed to skip that part of the input when no direction is present
    #if row[2] != None:
    #    address = row[0]+"+"+row[2]+"+"+street+"+"+row[4] ##this will be added to the url to call the site
    #else:
    print address
    print city
    print zipcode
    print ['This query and URL call was executed at PDT:', localtime()[3:6]]
    ## NOTE: the state fields below are hard-coded (states=Florida); edit them,
    ## or substitute row[2], when geocoding addresses in other states
    post = """IS_ADDRESS_VALID=N&IS_GEO_FOUND=N&street=%s&city=%s&states=Florida&zip=%s&_programYear=50&_treeId=4001&_lang=en&_stateSelectedFromDropDown=California&all_geo_types=N"""
    post = post % (address, city, zipcode)
    url = baseurl ## the form fields are sent as POST data to the base url
    try:
        page = urlopen(url, post)
        content = page.read()
        print "open"
    except Exception:
        print "\tCan't Open URL!"
        print "\n"
        continue ## skip this address; it will not appear in the output file
    ## pull each piece of geography out of the returned HTML
    county = re.search(r"(?<=County:\s)(.*)(?=</option>)", content)
    tract = re.search(r"(?<=Census Tract\s)(\d+\.*\d*)", content)
    blockGroup = re.search(r"(?<=Block Group\s)(\d+)", content)
    block = re.search(r"(?<=Block\s)(\d+)", content)
    if tract is not None:
        ## a field that was not found is written as 'NA' instead of raising on None
        aggregation = [m.group() if m is not None else 'NA'
                       for m in (county, tract, blockGroup, block)]
        print aggregation
        print "\n"
        output = aggregation + [row[0], row[1], row[2], row[3]]
    else:
        print "no match"
        print "\n"
        output = ['NA', 'NA', 'NA', 'NA', row[0], row[1], row[2], row[3]]
    outcsv.writerow(output)
f.close()
fout.close()
print 'Closing the connection. Be sure to validate results.'
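
For anyone adapting the script, here is a minimal Python 3 sketch of its two core steps: encoding the address into a form body and pulling the geography out of the returned page. It makes no network call. The form field names (`street`, `city`, `states`, `zip`) and the four regular expressions are taken from the script above; the function names `build_post` and `parse_geography` and the `SAMPLE` response fragment are hypothetical, and the sketch leaves out the other form fields the script sends.

```python
import re
from urllib.parse import urlencode

# Same patterns as the script above, compiled once and written as raw strings.
PATTERNS = {
    "county": re.compile(r"(?<=County:\s)(.*)(?=</option>)"),
    "tract": re.compile(r"(?<=Census Tract\s)(\d+\.*\d*)"),
    "blockGroup": re.compile(r"(?<=Block Group\s)(\d+)"),
    "block": re.compile(r"(?<=Block\s)(\d+)"),
}


def build_post(street, city, state, zipcode):
    """URL-encode the address fields (spaces become '+') for a form POST."""
    return urlencode(
        {"street": street, "city": city, "states": state, "zip": zipcode}
    )


def parse_geography(content):
    """Return one value per geography field, using 'NA' when a field is absent."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(content)
        result[name] = match.group() if match else "NA"
    return result


# Hypothetical response fragment, shaped only so that the patterns apply.
SAMPLE = (
    "<option>County: Alachua</option>\n"
    "<option>Census Tract 2.01</option>\n"
    "<option>Block Group 1</option>\n"
    "<option>Block 1004</option>\n"
)

print(build_post("100 N Main St", "Fort Lauderdale", "Florida", "33301"))
print(parse_geography(SAMPLE))
```

Passing the state as an argument avoids the hard-coded state in the original form string, and `urlencode` escapes spaces and punctuation in every field rather than only in the street address.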