POLMETH Archives

Political Methodology Society

POLMETH@LISTSERV.WUSTL.EDU

Subject:
From: "Ryan D. Enos" <[log in to unmask]>
Reply-To: Political Methodology Society <[log in to unmask]>
Date: Mon, 28 Sep 2009 18:09:21 -0400
Content-Type: multipart/mixed
Parts/Attachments: text/plain (2528 bytes), census_block_extract.py (3130 bytes)
Dear Paul,
If your student can do a little programming, I have used scripts like 
the one attached to do this in the past.  There are more efficient 
ways to do it with proprietary geocoders.  After the addresses are 
matched to census blocks, they can be merged in a straightforward 
manner with data downloaded from the Census.
Please let me know if I can clarify anything.
Best,
Ryan
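
To make that final step concrete, here is a minimal sketch of the
merge in Python, joining the attached script's output to a block-group
extract downloaded from the Census.  The file name
census_sf1_blockgroups.csv and the medianIncome column are hypothetical
placeholders for whatever tables the student actually pulls.

import csv

## Index the Census extract by (county, tract, block group).
census = {}
with open("census_sf1_blockgroups.csv", "rb") as f:  ## hypothetical extract
    for row in csv.DictReader(f):
        census[(row["county"], row["tract"], row["blockGroup"])] = row

## Walk the geocoded addresses and append the matching demographics.
with open("unique_addresses_blocks.csv", "rb") as fin, \
     open("addresses_with_demographics.csv", "wb") as fout:
    reader = csv.DictReader(fin)
    fields = reader.fieldnames + ["medianIncome"]  ## add columns as needed
    writer = csv.DictWriter(fout, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        match = census.get((row["county"], row["tract"], row["blockGroup"]))
        row["medianIncome"] = match["medianIncome"] if match else "NA"
        writer.writerow(row)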

Paul Gronke wrote:
> Colleagues
>
> I have a senior writing a thesis who would like to track changing 
> patterns of campaign donations.  He's interested in replicating and 
> extending Michael Malbin's work on how patterns of donations have 
> changed over the past few cycles.
>
> I would like him to attach Census demographic information to the FEC 
> campaign donations dataset (actually, he's downloaded these from 
> Open Secrets).
>
> I am not familiar with how you attach a Census tract or block 
> identifier to an address, or whether this can be done in a relatively 
> straightforward (and not overly expensive) way.  The student is fairly 
> adept with Stata, but not tremendously so.  This is something I'm not 
> averse to learning myself, but I am hoping to find an off-the-shelf 
> solution, or perhaps someone who has already done this with the 2008, 
> 2004, and 2000 files and might be willing to share a dataset.
>
> Thank you
> Paul G.
>
> ---
> Paul Gronke               Ph: 503-517-7393
> Professor
> Reed College
> 3203 SE Woodstock Blvd.
> Portland OR 97202
> http://www.reed.edu/~gronkep


-- 
Ryan D. Enos

Department of Political Science
UCLA

http://ryandenos.com


**********************************************************
             Political Methodology E-Mail List
   Editors: Xun Pang        <[log in to unmask]>
            Jon C. Rogowski <[log in to unmask]>
**********************************************************
        Send messages to [log in to unmask]
  To join the list, cancel your subscription, or modify
           your subscription settings visit:

          http://polmeth.wustl.edu/polmeth.php

**********************************************************



## census_block_extract.py
## Extracts addresses from a CSV and calls census.gov (American FactFinder)
## to find each address's census tract, block group, and block.
## KEY METHODS: writing to a CSV, sending complex URLs to a server, placeholders.
## RdE, October 2007

import csv
import re
import cookielib
import urllib2
from time import localtime

## The base URL for the FactFinder address-search servlet.
baseurl = "http://factfinder.census.gov/servlet/DTGeoAddressServlet?"

## Build a cookie-aware opener (adapted from an example on the web).
cj = cookielib.LWPCookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
urlopen = urllib2.urlopen

## Visit the preliminary search page first so the session cookies are set.
page = urlopen("http://factfinder.census.gov/servlet/DTGeoSearchByListServlet?ds_name=DEC_2000_SF1_U&_lang=en&")

f = open("unique_addresses.csv", "rU")
fout = open("unique_addresses_blocks.csv", "w")
outcsv = csv.writer(fout, lineterminator="\n")
outcsv.writerow(['county', 'tract', 'blockGroup', 'block',
                 'residenceaddressline1', 'residencecityusps',
                 'residencestate', 'residencezipcode'])

## Patterns that scrape the geography out of the returned HTML.
county_re = re.compile(r"(?<=County:\s)(.*)(?=</option>)")
tract_re = re.compile(r"(?<=Census Tract\s)(\d+\.*\d*)")
blockgroup_re = re.compile(r"(?<=Block Group\s)(\d+)")
block_re = re.compile(r"(?<=Block\s)(\d+)")

counter = -1
print 'Initializing loop...'
for line in f:  ## loop over every row of the input CSV
    row = line.strip().split(",")  ## expected order: street, city, state, zip
    counter = counter + 1
    print counter

    address = row[0].strip().replace(" ", "+")
    city = row[1]
    zipcode = row[3]  ## renamed from `zip`, which shadows the builtin

    ## Some addresses carry a separate direction field (N, S, E, W); an
    ## if/else could be added here to splice it in when present:
    # if row[2] != None:
    #     address = row[0] + "+" + row[2] + "+" + street + "+" + row[4]

    print address
    print city
    print zipcode
    print ['This query and URL call was executed at PDT:', localtime()[3:6]]

    ## POST body for the servlet; note the state is hard-coded to Florida.
    post = ("IS_ADDRESS_VALID=N&IS_GEO_FOUND=N&street=%s&city=%s"
            "&states=Florida&zip=%s&_programYear=50&_treeId=4001&_lang=en"
            "&_stateSelectedFromDropDown=California&all_geo_types=N"
            % (address, city, zipcode))

    ## Open the URL, read the response, and skip the row on any failure.
    try:
        page = urlopen(baseurl, post)
        content = page.read()
        print "open"
    except Exception:
        print "\tCan't Open URL!"
        print "\n"
        continue

    county = county_re.search(content)
    tract = tract_re.search(content)
    blockGroup = blockgroup_re.search(content)
    block = block_re.search(content)

    if county and tract and blockGroup and block:
        aggregation = [county.group(), tract.group(),
                       blockGroup.group(), block.group()]
        print aggregation
        print "\n"
        output = aggregation + [row[0], row[1], row[2], row[3]]
    else:
        print "no match"
        print "\n"
        output = ['NA', 'NA', 'NA', 'NA', row[0], row[1], row[2], row[3]]
    outcsv.writerow(output)

f.close()
fout.close()
print 'Closing the connection. Be sure to validate results.'
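
A note on the input, inferred from the column indices in the loop
above: unique_addresses.csv should hold one comma-separated record per
line in the order street, city, state, zip.  The state field is only
carried through to the output; the query itself hard-codes Florida.
A hypothetical input line:

123 MAIN ST,MIAMI,FL,33130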
