POLMETH Archives

Political Methodology Society

POLMETH@LISTSERV.WUSTL.EDU

Subject:
From:
Richard Sherman <[log in to unmask]>
Reply To:
Political Methodology Society <[log in to unmask]>
Date:
Fri, 19 Apr 2013 19:02:51 +0900
Manoel,

I agree with you on most points.

I think of "big" as more than a million observations. And yes, usually I have far, far, far fewer than that. 

It's just that big data is the subject here, and as you (and I) say, R is not the best tool for that.

Anyway, I doubt that the list has much interest in pursuing this discussion. Those who wish, please just write me.

-Richard

---
Prof. Richard Sherman
Division of International Studies
Korea University

On Apr 19, 2013, at 8:29 AM, Manoel Galdino <[log in to unmask]> wrote:

> I'm curious. How big is your big? Does your data fit in memory?
> 
> In any case, you're right that R has a steeper learning curve. I guess it's
> a trade-off between low productivity in the beginning and the freedom to
> implement things in your own way. But I doubt that Stata is better than R
> if you know the R way to do things. For instance, why use melt and cast
> when you have the data.table package? data.table is faster than SQL if the
> data fits in memory and you use it the right way! Also, it saves a lot of
> memory by avoiding the unnecessary copies of objects that base R makes by
> default.
> 
> I'm not saying R is the best tool for analysis of big data. It's not
> (yet?). But my feeling is that in academia, big data isn't really big data
> and R is better than most options.
> 
> Best,
> Manoel
> 
> 
> 
> 
> 
> On Thu, Apr 18, 2013 at 6:27 PM, Richard Sherman <[log in to unmask]> wrote:
> 
>> Hello Patrick,
>> 
>> The likelihood that my weeks-long R episode is due to human (Sherman)
>> error cannot be overlooked.
>> 
>> I suppose this is partly a question of cultural/linguistic preferences
>> over software. Still, with big data in Stata you can write
>> 
>> use bigdata
>> reshape long y, i(x) j(z)
>> reg y v*
>> 
>> and expect results before the day ends.
>> 
>> To do the same thing in R, you need to -melt- and -cast-, which can take
>> days, then Google all over to find the right "big" package, and wait until
>> next Thursday to get what you need.
>> 
>> I like R for many reasons, but the analysis of big data is not one of them.
>> 
>> -Richard
>> 
>> ---
>> Prof. Richard Sherman
>> Division of International Studies
>> Korea University
>> 
>> On Apr 18, 2013, at 5:25 AM, Patrick Lam <[log in to unmask]> wrote:
>> 
>>> Hi Richard,
>>> 
>>> That is interesting.  My experience is that on the surface, Stata handles
>>> bigger datasets more smoothly due to the way R handles and processes its
>>> data.  But there are almost always packages that allow R to process big
>>> data in a way that is as efficient as Stata, although one has to look for
>>> these packages.  See, for example, a recent piece in TPM about the
>>> bigmemory package:
>>> 
>>> http://polmeth.wustl.edu/methodologist/tpm_v20_n1.pdf
>>> 
>>> The difference of weeks versus half an hour seems so drastic to me
>>> that it must be a matter of coding.
>>> 
>>> 
>>> 
>>> 
>>> On Wed, Apr 17, 2013 at 3:52 PM, Richard Sherman <[log in to unmask]> wrote:
>>> 
>>>> OK, interesting, but:
>>>> 
>>>> I've waited weeks for R to do what Stata can do in half an hour. R is not
>>>> suited to big data.
>>>> 
>>>> -Richard
>>>> 
>>>> ---
>>>> Prof. Richard Sherman
>>>> Division of International Studies
>>>> Korea University
>>>> 
>>>> On Apr 18, 2013, at 3:18 AM, "Mihas, Paul" <[log in to unmask]> wrote:
>>>> 
>>>>> Practical "Big Data": Separating the Hope from the Hype
>>>>> <https://apps.research.unc.edu/events/index.cfm?event=events.eventDetails&event_key=51FE2C8EA7C615597B4111E3B07B274D8C2578E5>
>>>>> 
>>>>> 
>>>>> Two-Day Short Course
>>>>> 
>>>>> May 20-21: 10 a.m.-4 p.m.
>>>>> Odum Institute for Research in Social Science <http://www.odum.unc.edu>
>>>>> 14 Manning Hall
>>>>> University of North Carolina, Chapel Hill
>>>>> 
>>>>> Philip A. Schrodt, Pennsylvania State University
>>>>> 
>>>>> Overview: The phrase "Big Data" has come to designate a network of
>>>>> relatively new computationally intensive methods that merge machine
>>>>> learning and statistical methods for the analysis of very large data sets
>>>>> derived from secondary sources, usually the Web. This two-day short course
>>>>> will provide an overview of the most commonly used approaches, and how
>>>>> these do -- and sometimes do not -- differ from conventional social science
>>>>> statistical approaches. The lectures emphasize approaches and resources for
>>>>> gaining further knowledge and technical proficiency, rather than going into
>>>>> depth on any single method; with very few exceptions, all of the software
>>>>> illustrated will be open source.
>>>>> 
>>>>> *   Module 1: Big Data: sources and practical implementation.
>>>>>     Web-scraping. Hadoop and other distributed databases, "cloud" computing,
>>>>>     and the "map-reduce" approach. Resources in R and Python. Ethical
>>>>>     considerations: privacy, intellectual property
>>>>> *   Module 2: Working with unstructured text: regular expressions,
>>>>>     natural language processing suites for pre-processing text; named entity
>>>>>     and feature extraction
>>>>> *   Module 3: Working with unstructured text: supervised text
>>>>>     classification and unsupervised topic models.
>>>>> *   Module 4: Working with large-scale semi-structured data:
>>>>>     clustering, decision-trees, ensemble methods, and visualization
>>>>> 
>>>>> Pre-requisites: The course assumes a general familiarity with social
>>>>> science data analysis and its mathematical conventions (for example the
>>>>> equations for regression analysis). Knowledge of some computer programming
>>>>> and the R statistical system will be very helpful but not required.
>>>>> 
>>>>> Fee: $420
>>>>> 
>>>>> To register, click here:
>>>>> <https://apps.research.unc.edu/events/index.cfm?event=events.eventDetails&event_key=51FE2C8EA7C615597B4111E3B07B274D8C2578E5>
>>>>> 
>>>>> 
>>>>> Odum Institute
>>>>> The University of North Carolina at Chapel Hill
>>>>> Manning Hall CB# 3355
>>>>> Chapel Hill, NC 27599-3355
>>>>> www.odum.unc.edu
>>>>> Telephone: 919.962.3061
>>>>> Fax: 919.962.4777
>>>>>
>>>>> **********************************************************
>>>>>           Political Methodology E-Mail List
>>>>> Editors: Ethan Porter        <[log in to unmask]>
>>>>>          Gregory Whitfield   <[log in to unmask]>
>>>>> **********************************************************
>>>>>      Send messages to [log in to unmask]
>>>>> To join the list, cancel your subscription, or modify
>>>>>         your subscription settings visit:
>>>>> 
>>>>>        http://polmeth.wustl.edu/polmeth.php
>>>>> 
>>>>> **********************************************************
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Patrick Lam
>>> Department of Government and Institute for Quantitative Social Science,
>>> Harvard University
>>> http://www.patricklam.org
>>> 
>> 
> 
> 
> 
> -- 
> Manoel Galdino
> https://sites.google.com/site/galdinomcz/
> 