POLMETH Archives

Political Methodology Society

POLMETH@LISTSERV.WUSTL.EDU

From: Richard Sherman <[log in to unmask]>
Reply To: Political Methodology Society <[log in to unmask]>
Date: Fri, 19 Apr 2013 06:27:14 +0900

Hello Patrick,

The likelihood that my weeks-long R episode is due to human (Sherman) error cannot be overlooked. 

I suppose this is partly a question of cultural/linguistic preferences over software. Still, with big data in Stata you can write

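* load bigdata.dta
use bigdata
* stack y1, y2, ... into one variable y; x identifies records, z stores the suffix
reshape long y, i(x) j(z)
* regress y on all variables whose names begin with v
reg y v*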

and expect results before the day ends.

To do the same thing in R, you need to -melt- and -cast-, which can take days, then Google all over to find the right "big" package, and wait until next Thursday to get what you need.
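
For readers less familiar with the operation, here is a minimal plain-Python sketch (not Stata or R code) of the wide-to-long step that `reshape long y, i(x) j(z)` performs and that R users usually do with -melt-. It assumes each wide record is a dict holding an identifier `x`, stub columns `y1`, `y2`, ..., and other columns (such as a regressor `v1`) that are copied onto every long row; the function name `reshape_long` and the sample data are hypothetical.

```python
# Hypothetical helper: a plain-Python illustration of a wide-to-long reshape,
# not the implementation used by Stata's reshape or R's melt.
import re
from typing import Any


def reshape_long(rows: list[dict[str, Any]], stub: str, i: str, j: str) -> list[dict[str, Any]]:
    """Turn wide rows with columns <stub>1, <stub>2, ... into long rows.

    Each wide row yields one long row per stub column: the numeric suffix goes
    into column `j`, the value into column `stub`, and every other column
    (including the identifier `i`) is copied unchanged.
    """
    pattern = re.compile(rf"^{re.escape(stub)}(\d+)$")
    seen = set()
    long_rows = []
    for row in rows:
        # Like Stata's i(), the identifier must be unique in the wide data.
        if row[i] in seen:
            raise ValueError(f"{i!r} does not uniquely identify the rows")
        seen.add(row[i])
        # Non-stub columns are carried along to every long row.
        kept = {k: v for k, v in row.items() if not pattern.match(k)}
        # Collect (suffix, column name) pairs and visit them in suffix order.
        stub_cols = []
        for k in row:
            m = pattern.match(k)
            if m:
                stub_cols.append((int(m.group(1)), k))
        for suffix, col in sorted(stub_cols):
            long_rows.append({**kept, j: suffix, stub: row[col]})
    # Order the result by identifier, then by suffix.
    long_rows.sort(key=lambda r: (r[i], r[j]))
    return long_rows


if __name__ == "__main__":
    wide = [
        {"x": 1, "v1": 0.5, "y1": 10, "y2": 11},
        {"x": 2, "v1": 1.5, "y1": 20, "y2": 21},
    ]
    for r in reshape_long(wide, stub="y", i="x", j="z"):
        print(r)
```

Each long row keeps its record's `v1` value, so a regression of `y` on the `v*` columns can run directly on the stacked rows. Holding every row in an in-memory list, as this sketch does, is exactly what stops scaling once the data no longer fit in RAM.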

I like R for many reasons, but the analysis of big data is not one of them.
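
Packages for big data typically avoid loading every row at once. As a rough illustration of that idea, and not the method used by bigmemory or any other specific package, ordinary least squares can be computed by streaming the data in chunks while keeping only the k-by-k matrix X'X and the length-k vector X'y, then solving X'X b = X'y at the end. The sketch below is plain Python; the chunk format (lists of `(x_row, y)` pairs) and the helper names `chunked_ols`, `solve`, and `chunks_of` are assumptions made for this example.

```python
# Hypothetical helpers for illustration; not the API of any R or Stata package.
from typing import Iterable, Iterator, Sequence

Row = tuple[Sequence[float], float]  # (regressor values, response)


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (are some regressors collinear?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def chunked_ols(chunks: Iterable[Sequence[Row]]) -> list[float]:
    """OLS coefficients from data that arrive one chunk at a time.

    Only X'X (k x k) and X'y (length k) are kept between chunks, so memory use
    depends on the number of regressors k, not on the number of rows.
    """
    xtx: list[list[float]] = []
    xty: list[float] = []
    for chunk in chunks:
        for x, y in chunk:
            if not xtx:  # size the accumulators from the first row
                xtx = [[0.0] * len(x) for _ in x]
                xty = [0.0] * len(x)
            for a, xa in enumerate(x):
                xty[a] += xa * y
                for b, xb in enumerate(x):
                    xtx[a][b] += xa * xb
    if not xtx:
        raise ValueError("no rows were supplied")
    return solve(xtx, xty)


def chunks_of(rows: Iterable[Row], size: int) -> Iterator[list[Row]]:
    """Group a stream of rows into lists of at most `size` rows."""
    batch: list[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    # Synthetic data with y = 2 + 3t exactly; the leading 1.0 is the intercept column.
    rows = (((1.0, float(t)), 2.0 + 3.0 * t) for t in range(1000))
    print(chunked_ols(chunks_of(rows, 100)))  # approximately [2.0, 3.0]
```

Memory use here grows with the number of regressors, not the number of rows. Forming X'X directly can lose precision when regressors are nearly collinear, so more careful implementations update a QR decomposition incrementally instead; the chunking logic stays the same.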

-Richard

---
Prof. Richard Sherman
Division of International Studies
Korea University

On Apr 18, 2013, at 5:25 AM, Patrick Lam <[log in to unmask]> wrote:

> Hi Richard,
> 
> That is interesting.  My experience is that on the surface, Stata handles
> bigger datasets more smoothly due to the way R handles and processes its
> data.  But there are almost always packages that allow R to process big
> data in a way that is as efficient as Stata, although one has to look for
> these packages. See, for example, a recent piece in TPM (The Political
> Methodologist) about the bigmemory package:
> 
> http://polmeth.wustl.edu/methodologist/tpm_v20_n1.pdf
> 
> The gap between weeks and half an hour seems so drastic to me that it
> must be a matter of coding.
> 
> 
> 
> 
> On Wed, Apr 17, 2013 at 3:52 PM, Richard Sherman <[log in to unmask]> wrote:
> 
>> OK, interesting, but:
>> 
>> I've waited weeks for R to do what Stata can do in half an hour. R is not
>> suited to big data.
>> 
>> -Richard
>> 
>> ---
>> Prof. Richard Sherman
>> Division of International Studies
>> Korea University
>> 
>> On Apr 18, 2013, at 3:18 AM, "Mihas, Paul" <[log in to unmask]> wrote:
>> 
>>> Practical "Big Data": Separating the Hope from the Hype
>>> <https://apps.research.unc.edu/events/index.cfm?event=events.eventDetails&event_key=51FE2C8EA7C615597B4111E3B07B274D8C2578E5>
>>> 
>>> 
>>> Two-Day Short Course
>>> 
>>> May 20-21: 10 a.m.-4 p.m.
>>> Odum Institute for Research in Social Science <http://www.odum.unc.edu>
>>> 14 Manning Hall
>>> University of North Carolina, Chapel Hill
>>> 
>>> Philip A. Schrodt, Pennsylvania State University
>>> 
>>> Overview: The phrase "Big Data" has come to designate a network of
>>> relatively new computationally intensive methods that merge machine
>>> learning and statistical methods for the analysis of very large data sets
>>> derived from secondary sources, usually the Web. This two-day short course
>>> will provide an overview of the most commonly used approaches, and how
>>> these do (and sometimes do not) differ from conventional social science
>>> statistical approaches. The lectures emphasize approaches and resources for
>>> gaining further knowledge and technical proficiency, rather than going into
>>> depth on any single method; with very few exceptions, all of the software
>>> illustrated will be open source.
>>> 
>>> *   Module 1: Big Data: sources and practical implementation.
>>>     Web-scraping. Hadoop and other distributed databases, "cloud"
>>>     computing, and the "map-reduce" approach. Resources in R and Python.
>>>     Ethical considerations: privacy, intellectual property.
>>> *   Module 2: Working with unstructured text: regular expressions,
>>>     natural language processing suites for pre-processing text; named
>>>     entity and feature extraction.
>>> *   Module 3: Working with unstructured text: supervised text
>>>     classification and unsupervised topic models.
>>> *   Module 4: Working with large-scale semi-structured data:
>>>     clustering, decision trees, ensemble methods, and visualization.
>>> 
>>> Prerequisites: The course assumes a general familiarity with social
>>> science data analysis and its mathematical conventions (for example, the
>>> equations for regression analysis). Knowledge of some computer programming
>>> and the R statistical system will be very helpful but is not required.
>>> 
>>> Fee: $420
>>> 
>>> To register, visit:
>>> <https://apps.research.unc.edu/events/index.cfm?event=events.eventDetails&event_key=51FE2C8EA7C615597B4111E3B07B274D8C2578E5>
>>> 
>>> 
>>> Odum Institute
>>> The University of North Carolina at Chapel Hill
>>> Manning Hall CB# 3355
>>> Chapel Hill, NC 27599-3355
>>> http://www.odum.unc.edu
>>> Telephone: 919.962.3061
>>> Fax: 919.962.4777
>>> 
>>> **********************************************************
>>>            Political Methodology E-Mail List
>>>  Editors: Ethan Porter        <[log in to unmask]>
>>>           Gregory Whitfield   <[log in to unmask]>
>>> **********************************************************
>>>       Send messages to [log in to unmask]
>>> To join the list, cancel your subscription, or modify
>>>          your subscription settings visit:
>>> 
>>>         http://polmeth.wustl.edu/polmeth.php
>>> 
>>> **********************************************************
>> 
> 
> 
> 
> -- 
> Patrick Lam
> Department of Government and Institute for Quantitative Social Science,
> Harvard University
> http://www.patricklam.org
> 
