Hi all,
We're investigating whether we need to start training our data analysts on R, in addition to SAS, SUDAAN, Stata, SPSS and the other programs we use around here. Given how happy the R users I've spoken with are, I was surprised when a colleague identified the following potential problems with R. Does anyone have thoughts about these criticisms, or know of simple workarounds that an introductory glimpse wouldn't reveal? Some of them just don't sit right with me, but I admit I'm pretty ignorant on the topic.
We do have a couple of essential standards that motivate my colleague's concerns. All of our software: 1) must be auditable, that is, there must be a trail or log of all programming and data manipulations, preferably with line numbers, that can be reviewed/edited/rerun at will; and 2) must be able to use very large datasets (usually tens to hundreds of thousands of cases, sometimes in the millions).
Thanks for your thoughts,
Anna Maria
Anna Maria Ortiz, Ph.D.
Senior Statistician
Applied Research and Methods
U.S. Government Accountability Office
(202) 512-2788
[log in to unmask]
>>> 1/9/2009 8:24 AM >>>
After my initial investigation, I discovered several issues. In no particular order:
--R works in memory. All data and functions must reside in memory simultaneously. Consequently, analysis data sets must be small (less than one to two hundred megabytes).
--If R runs out of memory, it halts abruptly and catastrophically, neither explaining the interruption nor saving any completed work.
--The R work space must be saved at the end of every session. (This is similar to the requirement to save files after completion of work in almost every other software--MS Access being somewhat of an exception. However, R does not produce the warning about an unsaved file that most other software does.)
--R has a history file listing each step in the analysis, similar to a SAS/SPSS program listing, but does not have a program log, as is standard in SAS and SPSS. If the work space is not saved following the analysis, the history file cannot be recovered. (This is similar to writing a SAS program but failing to save the code.)
--R's interface with other software is somewhat problematic. R prefers either text or delimited files and has difficulties with binary or proprietary file structures. R modules are available to interactively cause SAS and SPSS to produce transport files that are directly readable into R as long as SAS and SPSS are available on the local workstation. However, directly reading MS Excel and MS Access files is not possible. The recommended solution is to cause those Microsoft products to produce csv files that can be read into R.
--Version control is problematic. A system administrator would be required to maintain the most current version of R. Staying on top of the issue requires frequent checks for upgrades by the administrator, who has no knowledge of when the program is updated.
**********************************************************
Political Methodology E-Mail List
Editors: Melanie Goodrich, <[log in to unmask]>
Delia Bailey, <[log in to unmask]>
**********************************************************
Send messages to [log in to unmask]
To join the list, cancel your subscription, or modify
your subscription settings visit:
http://polmeth.wustl.edu/polmeth.php
**********************************************************