Karl Rexer just published his summary of this year’s Rexer Data Mining survey. As always there’s lots of good information but here are my favorite takeaways:
- As in our own work on Predictive Analytics in the Cloud, the survey found that a focus on customers and on customer experience/engagement was top of mind. CRM/Marketing is still the number one area in which data mining is applied.
- The survey has seen a steady increase in the use of text mining over recent years and the use of text mining in customer service saw big jumps this year. This too matches our experience when surveying about what data feeds predictive analytic models.
- Not a lot of use of Big Data with few data miners reporting a Big Data program and not much increase in dataset size. We did notice that experienced teams, those with positive results from predictive analytics already, were more likely to be using Big Data but I think it is telling that so few data miners are aware of Big Data programs in their organizations – probably because those Big Data programs are in IT and are essentially Hadoop programming projects.
- The adoption of R (the open source statistical language) as a model development language has really taken off. Not only has it risen steadily every year since the survey first started asking about it, but 70% of respondents now report using it while 24% say it is their primary tool. It is not quite as well established in corporate settings as it is in government and academia, but it is clearly a strong and growing force in predictive analytics.
- Corporations are getting better at using analytics when the opportunity arises, but there is still plenty of work to do, with most respondents giving their organization middling or poor marks for analytic sophistication.
- Too many models are not deployed or utilized, with well under a quarter saying that their models are “always” deployed or utilized. Plus it still takes WAY too long to deploy models, with weeks or months being a much more common response than hours or days.
- Only about half think that model performance tracking is handled reliably, and most models are not being updated more often than quarterly. Tracking performance and updating models regularly is a best practice that still seems to be widely lacking.
- The top algorithms continue to be regression, decision trees and cluster analysis. Time series, text mining and ensembles make up the second triad.
- Server processing is now almost as common as local processing, with cloud computing making significant inroads (if predictive analytics in the cloud interests you, don’t forget our research).
You can get more information on the survey and request your own summary at http://www.rexeranalytics.com/Data-Miner-Survey-Results-2013.html