POLMETH Archives

Political Methodology Society

POLMETH@LISTSERV.WUSTL.EDU

Subject:
From:
Anna Maria Ortiz <[log in to unmask]>
Reply To:
Political Methodology Society <[log in to unmask]>
Date:
Fri, 9 Jan 2009 12:05:00 -0500
Hi all, 

We're investigating whether we need to start training our data analysts in R, in addition to SAS, SUDAAN, Stata, SPSS, and the other programs we use around here. Given how happy the R users I've spoken with are, I was surprised when a colleague identified the following potential problems with R. Does anyone have thoughts about these criticisms, or know of simple workarounds that an introductory glimpse wouldn't reveal? Some of them just don't sit right with me, but I admit I'm pretty ignorant on the topic.

We do have a couple of essential standards that motivate my colleague's concerns. All of our software 1) must be auditable, that is, there must be a trail or log of all programming and data manipulations, preferably with line numbers, that can be reviewed/edited/rerun at will; and 2) must be able to use very large datasets (usually tens to hundreds of thousands of cases, sometimes in the millions).

Thanks for your thoughts,

Anna Maria



Anna Maria Ortiz, Ph.D.
Senior Statistician
Applied Research and Methods
U.S. Government Accountability Office
(202) 512-2788
[log in to unmask]



>>> 1/9/2009 8:24 AM >>>
After my initial investigation, I discovered several issues.   In no particular order:

--R works in memory.  All data and functions must reside in memory simultaneously.  Consequently, the size of analysis data sets must be small (less than one to two hundred megabytes).

--If R runs out of memory, it halts abruptly and catastrophically, leaving no explanation of the interruption and saving none of the completed work.

--The R work space must be saved at the end of every session.  (This is similar to the requirement to save files after completion of work in almost every other software--MS Access being somewhat of an exception.  However, R does not produce the warning about an unsaved file that most other software does.)

--R has a history file listing each step in the analysis, similar to a SAS/SPSS program listing, but does not have a program log, as is standard in SAS and SPSS.  If the work space is not saved following the analysis, the history file cannot be recovered.  (This is similar to writing a SAS program but failing to save the code.)

--R's interface with other software is somewhat problematic.  R prefers either text or delimited files and has difficulties with binary or proprietary file structures.  R modules are available that interactively direct SAS and SPSS to produce transport files that can be read directly into R, as long as SAS and SPSS are available on the local workstation.  However, directly reading MS Excel and MS Access files is not possible.  The recommended solution is to export csv files from those Microsoft products and read them into R.

--Version control is problematic.  A system administrator would be required to maintain the most current version of R.  Staying on top of the issue requires frequent checks for upgrades by the administrator, who otherwise has no knowledge of when the program is updated.
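
As a rough check on the memory point in the list above, the following back-of-the-envelope sketch estimates how much memory a rectangular, all-numeric dataset would occupy if held entirely in memory. It is written in Python purely for illustration. It assumes 8 bytes per value (the size of a double-precision number, which is how R stores its default numeric type) and a hypothetical width of 50 variables; neither the function name nor the 50-variable figure comes from this thread, and real datasets with strings, factors, or working copies will differ.

```python
# Rough in-memory size of a rectangular, all-numeric dataset.
# Assumption: every value is stored as an 8-byte double-precision number.
# The 50-variable width used below is hypothetical, chosen only for illustration.
BYTES_PER_DOUBLE = 8


def approx_megabytes(n_cases, n_variables, bytes_per_value=BYTES_PER_DOUBLE):
    """Return the approximate size in megabytes (1 MB = 1024 * 1024 bytes)."""
    return n_cases * n_variables * bytes_per_value / (1024 * 1024)


# Case counts in the range mentioned in the message: tens of thousands,
# hundreds of thousands, and millions.
for n_cases in (10_000, 100_000, 1_000_000):
    size_mb = approx_megabytes(n_cases, 50)
    print(f"{n_cases:>9,} cases x 50 variables: about {size_mb:,.1f} MB")
```

Under these assumptions, one hundred thousand cases come to roughly 38 MB and one million cases to roughly 380 MB, so whether a given file fits within a limit of one to two hundred megabytes depends heavily on how many variables it carries.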

**********************************************************
             Political Methodology E-Mail List
   Editors: Melanie Goodrich, <[log in to unmask]>
            Delia Bailey, <[log in to unmask]>
**********************************************************
        Send messages to [log in to unmask]
  To join the list, cancel your subscription, or modify
           your subscription settings visit:

          http://polmeth.wustl.edu/polmeth.php

**********************************************************
