POLMETH Archives

Political Methodology Society

POLMETH@LISTSERV.WUSTL.EDU

Subject:
From: Manoel Galdino <[log in to unmask]>
Reply To: Political Methodology Society <[log in to unmask]>
Date: Thu, 18 Apr 2013 20:29:31 -0300
I'm curious. How big is your big? Does your data fit in memory?

In any case, you're right that R has a steeper learning curve. I guess it's
a trade-off between low productivity in the beginning and the freedom to
implement things your own way. But I doubt that Stata is better than R
if you know the R way of doing things. For instance, why use melt and cast
when you have the data.table package? data.table is faster than SQL if the data
fits in memory and you use it the right way! It also saves a lot of memory
by avoiding the unnecessary copies of objects that base R makes by default.
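
As a minimal sketch of the data.table route (an illustration only, not code
from this thread): it assumes a current version of data.table, which provides
its own melt() method, and it uses made-up data. The names wide, long, x, v1
and y1..y3 are hypothetical and loosely mirror the variables in the Stata
example quoted below.

```r
library(data.table)

# Hypothetical wide data: one row per unit x, one y column per period.
wide <- data.table(
  x  = 1:3,
  v1 = c(0.5, 1.5, 2.5),
  y1 = c(10, 20, 30),
  y2 = c(11, 21, 31),
  y3 = c(12, 22, 32)
)

# Wide to long, roughly what Stata's "reshape long y, i(x) j(z)" does.
long <- melt(
  wide,
  id.vars       = c("x", "v1"),
  measure.vars  = c("y1", "y2", "y3"),
  variable.name = "z",
  value.name    = "y"
)

# := replaces the column in place (by reference) instead of copying the table.
long[, z := as.integer(sub("^y", "", z))]

# Grouped summary done directly in data.table, with no cast step.
by_x <- long[, .(mean_y = mean(y)), by = x]

# Roughly "reg y v*" with a single v variable.
fit <- lm(y ~ v1, data = long)
```

Whether this is fast enough still depends on the data fitting in memory, but
the reshape and the grouped summary both stay inside data.table.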

I'm not saying R is the best tool for analyzing big data. It's not
(yet?). But my feeling is that in academia, big data isn't really big data,
and R is better than most options.

Best,
Manoel





On Thu, Apr 18, 2013 at 6:27 PM, Richard Sherman <[log in to unmask]> wrote:

> Hello Patrick,
>
> The likelihood that my weeks-long R episode is due to human (Sherman)
> error cannot be overlooked.
>
> I suppose this is partly a question of cultural/linguistic preferences
> over software. Still, with big data in Stata you can write
>
> use bigdata
> reshape long y, i(x) j(z)
> reg y v*
>
> and expect results before the day ends.
>
> To do the same thing in R, you need to -melt- and -cast-, which can take
> days, then Google all over to find the right "big" package, and wait until
> next Thursday to get what you need.
>
> I like R for many reasons, but the analysis of big data is not one of them.
>
> -Richard
>
> ---
> Prof. Richard Sherman
> Division of International Studies
> Korea University
>
> On Apr 18, 2013, at 5:25 AM, Patrick Lam <[log in to unmask]> wrote:
>
> > Hi Richard,
> >
> > That is interesting.  My experience is that on the surface, Stata handles
> > bigger datasets more smoothly due to the way R handles and processes its
> > data.  But there are almost always packages that allow R to process big
> > data in a way that is as efficient as Stata, although one has to look for
> > these packages.  See, for example, a recent piece in TPM about the
> > bigmemory package:
> >
> > http://polmeth.wustl.edu/methodologist/tpm_v20_n1.pdf
> >
> > The difference between weeks and half an hour seems so drastic to me
> > that it may be a matter of coding.
> >
> >
> >
> >
> > On Wed, Apr 17, 2013 at 3:52 PM, Richard Sherman <[log in to unmask]> wrote:
> >
> >> OK, interesting, but:
> >>
> >> I've waited weeks for R to do what Stata can do in half an hour. R is not
> >> suited to big data.
> >>
> >> -Richard
> >>
> >> ---
> >> Prof. Richard Sherman
> >> Division of International Studies
> >> Korea University
> >>
> >> On Apr 18, 2013, at 3:18 AM, "Mihas, Paul" <[log in to unmask]> wrote:
> >>
> >>> Practical "Big Data": Separating the Hope from the Hype
> >>> <https://apps.research.unc.edu/events/index.cfm?event=events.eventDetails&event_key=51FE2C8EA7C615597B4111E3B07B274D8C2578E5>
> >>>
> >>> Two-Day Short Course
> >>>
> >>> May 20-21: 10 a.m.-4 p.m.
> >>> Odum Institute for Research in Social Science<http://www.odum.unc.edu>
> >>> 14 Manning Hall
> >>> University of North Carolina, Chapel Hill
> >>>
> >>> Philip A. Schrodt, Pennsylvania State University
> >>>
> >>> Overview: The phrase "Big Data" has come to designate a network of
> >>> relatively new computationally intensive methods that merge machine
> >>> learning and statistical methods for the analysis of very large data sets
> >>> derived from secondary sources, usually the Web. This two-day short course
> >>> will provide an overview of the most commonly used approaches, and how
> >>> these do -- and sometimes do not -- differ from conventional social science
> >>> statistical approaches. The lectures emphasize approaches and resources for
> >>> gaining further knowledge and technical proficiency, rather than going into
> >>> depth on any single method; with very few exceptions, all of the software
> >>> illustrated will be open source.
> >>>
> >>> *   Module 1: Big Data: sources and practical implementation.
> >>>     Web-scraping. Hadoop and other distributed databases, "cloud" computing,
> >>>     and the "map-reduce" approach. Resources in R and Python. Ethical
> >>>     considerations: privacy, intellectual property
> >>> *   Module 2: Working with unstructured text: regular expressions,
> >>>     natural language processing suites for pre-processing text; named entity
> >>>     and feature extraction
> >>> *   Module 3: Working with unstructured text: supervised text
> >>>     classification and unsupervised topic models.
> >>> *   Module 4: Working with large-scale semi-structured data:
> >>>     clustering, decision-trees, ensemble methods, and visualization
> >>>
> >>> Pre-requisites: The course assumes a general familiarity with social
> >>> science data analysis and its mathematical conventions (for example the
> >>> equations for regression analysis). Knowledge of some computer programming
> >>> and the R statistical system will be very helpful but not required.
> >>>
> >>> Fee: $420
> >>>
> >>> To register:
> >>> <https://apps.research.unc.edu/events/index.cfm?event=events.eventDetails&event_key=51FE2C8EA7C615597B4111E3B07B274D8C2578E5>
> >>>
> >>> Odum Institute
> >>> The University of North Carolina at Chapel Hill
> >>> Manning Hall CB# 3355
> >>> Chapel Hill, NC 27599-3355
> >>> www.odum.unc.edu<http://www.odum.unc.edu>
> >>> Telephone: 919.962.3061
> >>> Fax: 919.962.4777
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> **********************************************************
> >>>            Political Methodology E-Mail List
> >>>  Editors: Ethan Porter        <[log in to unmask]>
> >>>           Gregory Whitfield   <[log in to unmask]>
> >>> **********************************************************
> >>>       Send messages to [log in to unmask]
> >>> To join the list, cancel your subscription, or modify
> >>>          your subscription settings visit:
> >>>
> >>>         http://polmeth.wustl.edu/polmeth.php
> >>>
> >>> **********************************************************
> >>
> >>
> >
> >
> >
> > --
> > Patrick Lam
> > Department of Government and Institute for Quantitative Social Science,
> > Harvard University
> > http://www.patricklam.org
> >
>
>



-- 
Manoel Galdino
https://sites.google.com/site/galdinomcz/

