Statisticians and programming; languages and GUIs
There are two main kinds of statistician in the world (perhaps this generalises to computer users): those who learned to program from the word go, and those who prefer not to program but to point and click their way to a solution.
I read this blog entry with interest, and it seems to have caused quite a stir (The Next Big Thing by AnnMaria). Her argument is that she favours SAS since it is robust, provides end-to-end solutions for data management, analysis and reporting, and provides a GUI that allows users to find their way to a solution without learning to write lots of tricky code. BTW, SAS also gives the user the code generated from this point-and-click environment, thus getting around any 21CFR Part 11 trickiness*. In the blog AnnMaria dismisses R as a candidate for "The Next Big Thing" in stats software since it doesn't have a point-and-click GUI and forces users to write code from the outset. R geeks have leapt on this in the way that only rabid advocates do, getting all hot under the collar and ripping into the blog author (like that will help anything).
Let me give a little background to my thinking on this. Within the Pharma industry SAS is BIG. No, REALLY BIG. Why? Because it provides end-to-end data management, analysis and reporting for an environment where we're heavily regulated and need to be able to show how we took dataset A, cleaned it up (merging treatment data to outcome data, creating the analysis variable from the longitudinal measures of response, dealing with incomplete data etc.), analysed it and then produced many, many tables and graphs of that data. The weight of the SAS company behind the product means that it has clout in support, validation etc., which are all critical for our process. When I joined the industry in 1993, statisticians did the programming - we prepared the data, analysed it, and programmed the tables and figures ourselves. When I started we were working on VMS/VAX workstations that had rudimentary GUIs (it was 2 years before we started using MS Windows). There was little in the way of a GUI for SAS at that time and I spent the first 3 or so years of my career learning to program it. And I became reasonably good at programming it. To the point of writing a nice little function using SAS to call WinBUGS.
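To make the "dataset A to analysis dataset" steps a bit more concrete, here is a minimal sketch in Python (chosen purely for illustration; it is neither SAS nor R, and not anybody's production code). The subject IDs, visit numbers, response values, the function name and the rule used for incomplete data (take the last observed post-baseline value) are all made-up assumptions.

```python
# Hypothetical toy data: every name and value here is invented for illustration.
treatments = {"S01": "active", "S02": "placebo", "S03": "active"}

# Longitudinal responses as (subject, visit, response); None marks a missed visit.
responses = [
    ("S01", 0, 10.0), ("S01", 1, 8.5), ("S01", 2, 7.0),
    ("S02", 0, 11.0), ("S02", 1, 10.5), ("S02", 2, None),
    ("S03", 0, 9.0), ("S03", 1, None), ("S03", 2, None),
]


def build_analysis_dataset(treatments, responses):
    """Merge treatment onto outcomes and derive change from baseline."""
    # Group the longitudinal records by subject: {subject: {visit: value}}.
    by_subject = {}
    for subject, visit, value in responses:
        by_subject.setdefault(subject, {})[visit] = value

    rows = []
    for subject in sorted(by_subject):
        visits = by_subject[subject]
        baseline = visits.get(0)
        # Assumed rule for incomplete data: use the last observed
        # post-baseline value, skipping missed visits.
        observed = [visits[v] for v in sorted(visits)
                    if v > 0 and visits[v] is not None]
        if baseline is None or not observed:
            change = None  # cannot be derived; keep the subject, flag as missing
        else:
            change = observed[-1] - baseline
        rows.append({
            "subject": subject,
            "treatment": treatments.get(subject),  # the merge step
            "change": change,                      # the analysis variable
        })
    return rows


analysis = build_analysis_dataset(treatments, responses)
for row in analysis:
    print(row)
```

The point is not the particular rule but that every step from raw records to analysis variable is written down, so it can be rerun and inspected by somebody else.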
I have realised over my career though that there are those who like to get "down and dirty" with programming and look at the guts of code, play with it, tweak it, enhance it and take great joy in sharing their tweaks with others. There is a second group who just like using it. "Don't show me the code. Show me the results." This second crowd rely on other people getting the code right in the first place. They are the end-users.
Over the last 8 years though I've started to use R more and more. Now, I'm nowhere near an expert level at R and I'm not looking to push the envelope in programming R to do "the next big thing" but I'm a happy user and can get R to do most things. I like using R. It's free and open (as in beer and speech) and so you get to see HOW programmers have implemented stuff (if you need to) and it's free to use on whatever platform you like. It does have a STEEP learning curve, especially if you're coming from SAS - it's object-oriented rather than data-record-oriented - and it'll take a while to get your head around it if you're not used to it, but it WORKS. I've also done some cool things using R, like a toolkit for specifying and running simulations for clinical trial design and evaluation. However, R struggles to find a foothold for routine statistical analysis and reporting because EACH INDIVIDUAL USER needs to be responsible for good practice around downloading R, packages etc. Just because you CAN download and install R on any system without special admin privileges, and just because you CAN go to CRAN and download any old stat analysis package you like, doesn't mean you SHOULD.
Industry likes SAS because it's a controlled environment. Unlike R. Which is the very reason why I think R users like R...
* The fact that you CAN'T access the code behind Excel spreadsheet calculations is exactly WHY Excel is useless for industry under 21CFR Part 11. You can't recreate somebody else's results in Excel with a fresh set of data unless you know EXACTLY what they did using point and click. Aye, there's the rub.