Challenges Scaling Open Source R – Part 2

I recently did some research on the requirements for enterprise-scale analytics and the challenges of using open source R in this context. In my first post (Requirements for Enterprise Scale Analytics with R – Part 1) I outlined some of the requirements I see for enterprise-scale analytics. In this second post I will discuss the challenges of R in this context.

Examining survey data on R usage, such as the Rexer Analytics Survey, and talking to those working with open source R, I found that a number of challenges become apparent at enterprise scale:

  • Complex data integration.
    The explosion in new data sources makes scaling R a challenge, as each source must be coded for integration into the analytic environment. As a result, data access is one of the top challenges reported by R users in the Rexer Analytics survey.
  • Scaling data understanding.
    The core open source R algorithms often lack the parallelism, scalability and performance needed to explore and understand very large datasets. For many teams this means working with samples, a potentially problematic workaround, and it reduces the number of iterations they can manage (reducing accuracy). Scalability is another regular complaint of open source R users.
  • Time to analyze.
    R users generally report spending longer analyzing data than users of other tools. Enterprise-scale adoption means bigger datasets, exacerbating this problem.
  • Time to deploy.
    Similarly, many organizations fail to deploy their R-based models quickly, or even at all. Challenges with the core packages and their dependencies, along with the need to recode models for production, become an increasing problem as more models are developed and must be deployed across the enterprise.
  • Industrialization.
    Open source R is fundamentally a script-centric environment oriented toward individual contributors. As such it does not lend itself to industrialization.

R has a lot to offer, and organizations should be considering how to make R part of their predictive analytics strategy. Organizations should plan, however, on working with a commercial vendor that has a solid plan for R in terms of providing scalable implementations of the algorithms they care about. If you want more detail on these challenges, as well as some discussion of how adopting Teradata Aster R might help you meet them, check out this white paper sponsored by Teradata – Enterprise Scale Analytics with R: Scaling for R with Teradata Aster. I also recorded a webinar (Up Your R Game: Break Through R Limitations) with Bill Franks of Teradata. You can also check out the recent Teradata announcement of Teradata Aster R, read Scott Gnau’s blog about Teradata’s embrace of R, and see my First Look on Teradata Aster R.

Cross-posted from www.jtonedm.com.