The Sudden Interest in Data Science Platforms
I’ve been at this startup thing for a few years now and I’ve seen a thing or two. If you read KDNuggets, you’ll stumble across the Gartner Hype Cycle. Right now Big Data is entering the trough of disillusionment. While that sounds sad, it kinda makes sense.
For years we’ve been hearing how Big Data will unlock all kinds of insights in a corporation’s data. Everyone raced to stand up clusters, jammed all kinds of data into them, and then stumbled when it came to extracting insight. The cluster became hard to tame, hard to use, and seemed like a big waste of money.
Of course, RapidMiner Radoop came along and actually delivered on this promise, but many companies decided to use a single tool to extract their insight. Maybe it was PySpark or Pig Script? Maybe something else completely. They married themselves to one or two ways of getting insight.
Now many companies are realizing they’re not just an R shop, they’re an R, Python, and Spark shop. They need all three, or even more tools from the Data Science toolkit, to get anything done. So they’re looking around for a platform to bring all these tools together.
Imagine their surprise when they find RapidMiner. We’ve been a Data Science platform from day 1. Ninety percent of the time you can do all your data science and model building right in the Studio platform. The rest of the time you might need some esoteric algorithm to finish your work. So, if you married yourself to one tool and that esoteric algorithm wasn’t available, you were SOL.
With RapidMiner it’s always been different. Need that Tweedie algorithm in R? Use the R Scripting extension and pull it in. Need to do some PySpark on your cluster? Put that script right inside Radoop’s Spark Script operator.
It’s that easy. After all, isn’t that what a real Data Science platform is supposed to do?
Originally published at community.rapidminer.com on April 20, 2017.