Manoel,
I agree with you on most points.
I think of "big" as more than a million observations. And yes, usually I have far, far, far fewer than that.
It's just that big data is the subject here, and as you (and I) say, R is not the best tool for that.
Anyway, I doubt that the list has much interest in pursuing this discussion. Anyone who wishes to continue, please just write to me directly.
-Richard
---
Prof. Richard Sherman
Division of International Studies
Korea University
On Apr 19, 2013, at 8:29 AM, Manoel Galdino <[log in to unmask]> wrote:
> I'm curious. How big is your big? Does your data fit in memory?
>
> In any case, you're right that R has a steeper learning curve. I guess it's
> a trade-off between low productivity at the beginning and the freedom to
> implement things your own way. But I doubt that Stata is better than R
> if you know the R way to do things. For instance, why use melt and cast
> when you have the data.table package? data.table is faster than SQL if the
> data fits in memory and you use it the right way! It also saves a lot of
> memory by avoiding the unnecessary copies of objects that base R makes by
> default.
>
> I'm not saying R is the best tool for analysis of big data. It's not
> (yet?). But my feeling is that in academia, big data isn't really big data,
> and R is better than most options.
>
> Best,
> Manoel
>
> On Thu, Apr 18, 2013 at 6:27 PM, Richard Sherman <[log in to unmask]> wrote:
>
>> Hello Patrick,
>>
>> The likelihood that my weeks-long R episode is due to human (Sherman)
>> error cannot be overlooked.
>>
>> I suppose this is partly a question of cultural/linguistic preferences
>> over software. Still, with big data in Stata you can write
>>
>> use bigdata
>> reshape long y, i(x) j(z)
>> reg y v*
>>
>> and expect results before the day ends.
>>
>> To do the same thing in R, you need to -melt- and -cast-, which can take
>> days, then Google all over to find the right "big" package, and wait until
>> next Thursday to get what you need.
>>
>> I like R for many reasons, but the analysis of big data is not one of them.
>>
>> -Richard
>>
>> On Apr 18, 2013, at 5:25 AM, Patrick Lam <[log in to unmask]> wrote:
>>
>>> Hi Richard,
>>>
>>> That is interesting. My experience is that on the surface, Stata handles
>>> bigger datasets more smoothly due to the way R handles and processes its
>>> data. But there are almost always packages that allow R to process big
>>> data as efficiently as Stata, although one has to look for these
>>> packages. See, for example, a recent piece in TPM (The Political
>>> Methodologist) about the bigmemory package:
>>>
>>> http://polmeth.wustl.edu/methodologist/tpm_v20_n1.pdf
>>>
>>> The difference between weeks and half an hour seems so drastic to me
>>> that it is likely a matter of coding.
>>>
>>>
>>>
>>>
>>> On Wed, Apr 17, 2013 at 3:52 PM, Richard Sherman <[log in to unmask]> wrote:
>>>
>>>> OK, interesting, but:
>>>>
>>>> I've waited weeks for R to do what Stata can do in half an hour. R is
>>>> not suited to big data.
>>>>
>>>> -Richard
>>>>
>>>> On Apr 18, 2013, at 3:18 AM, "Mihas, Paul" <[log in to unmask]> wrote:
>>>>
>>>>> Practical "Big Data": Separating the Hope from the Hype
>>>>> <https://apps.research.unc.edu/events/index.cfm?event=events.eventDetails&event_key=51FE2C8EA7C615597B4111E3B07B274D8C2578E5>
>>>>>
>>>>> Two-Day Short Course
>>>>>
>>>>> May 20-21: 10 a.m.-4 p.m.
>>>>> Odum Institute for Research in Social Science <http://www.odum.unc.edu>
>>>>> 14 Manning Hall
>>>>> University of North Carolina, Chapel Hill
>>>>>
>>>>> Philip A. Schrodt, Pennsylvania State University
>>>>>
>>>>> Overview: The phrase "Big Data" has come to designate a network of
>>>>> relatively new computationally intensive methods that merge machine
>>>>> learning and statistical methods for the analysis of very large data sets
>>>>> derived from secondary sources, usually the Web. This two-day short course
>>>>> will provide an overview of the most commonly used approaches, and how
>>>>> these do -- and sometimes do not -- differ from conventional social science
>>>>> statistical approaches. The lectures emphasize approaches and resources for
>>>>> gaining further knowledge and technical proficiency, rather than going into
>>>>> depth on any single method; with very few exceptions, all of the software
>>>>> illustrated will be open source.
>>>>>
>>>>> * Module 1: Big Data: sources and practical implementation.
>>>>>   Web-scraping. Hadoop and other distributed databases, "cloud" computing,
>>>>>   and the "map-reduce" approach. Resources in R and Python. Ethical
>>>>>   considerations: privacy, intellectual property.
>>>>> * Module 2: Working with unstructured text: regular expressions,
>>>>>   natural language processing suites for pre-processing text; named entity
>>>>>   and feature extraction.
>>>>> * Module 3: Working with unstructured text: supervised text
>>>>>   classification and unsupervised topic models.
>>>>> * Module 4: Working with large-scale semi-structured data:
>>>>>   clustering, decision-trees, ensemble methods, and visualization.
>>>>>
>>>>> Pre-requisites: The course assumes a general familiarity with social
>>>>> science data analysis and its mathematical conventions (for example, the
>>>>> equations for regression analysis). Knowledge of some computer programming
>>>>> and the R statistical system will be very helpful but not required.
>>>>>
>>>>> Fee: $420
>>>>>
>>>>> To register, visit:
>>>>> <https://apps.research.unc.edu/events/index.cfm?event=events.eventDetails&event_key=51FE2C8EA7C615597B4111E3B07B274D8C2578E5>
>>>>>
>>>>> Odum Institute
>>>>> The University of North Carolina at Chapel Hill
>>>>> Manning Hall CB# 3355
>>>>> Chapel Hill, NC 27599-3355
>>>>> www.odum.unc.edu
>>>>> Telephone: 919.962.3061
>>>>> Fax: 919.962.4777
>>>>>
>>>>> **********************************************************
>>>>> Political Methodology E-Mail List
>>>>> Editors: Ethan Porter <[log in to unmask]>
>>>>> Gregory Whitfield <[log in to unmask]>
>>>>> **********************************************************
>>>>> Send messages to [log in to unmask]
>>>>> To join the list, cancel your subscription, or modify
>>>>> your subscription settings visit:
>>>>>
>>>>> http://polmeth.wustl.edu/polmeth.php
>>>>>
>>>>> **********************************************************
>>>>
>>>
>>> --
>>> Patrick Lam
>>> Department of Government and Institute for Quantitative Social Science,
>>> Harvard University
>>> http://www.patricklam.org
>>>
>>
> --
> Manoel Galdino
> https://sites.google.com/site/galdinomcz/