Spark vs Hadoop

There’s a lot of hullabaloo about Spark vs Hadoop for Big Data these days. If you’re rushing to stand up a Big Data cluster, you’ve probably heard about this new Spark technology. The simplest way to think about the difference is that Hadoop’s MapReduce is built for batch jobs, while Spark can do both batch and stream processing. However, the biggest promise of Spark is the ability to code in Scala, Python (PySpark), and soon R (SparkR).

Dynamic programming languages like Python have opened up new ways to program, letting you develop algorithms interactively non-stop instead of the write/compile/test/debug cycle of C, not to mention chasing the inevitable memory management bugs. (Smart Data Collective)
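To show what coding Spark in Python looks like in practice, here is a minimal sketch (not taken from this post) of the word-count batch job that is the usual Hadoop MapReduce teaching example, written against PySpark’s RDD API. It assumes pyspark is installed; the function names `count_words` and `run`, the app name, and any path you pass in are hypothetical placeholders.

```python
# A minimal PySpark sketch of the classic word-count batch job.
# Assumptions: pyspark is installed; the app name and input path are placeholders.
from operator import add


def count_words(lines):
    """Count words in an RDD of text lines.

    Only RDD methods (flatMap, map, reduceByKey) are used, so any object that
    provides them can be passed in.
    """
    return (
        lines.flatMap(lambda line: line.lower().split())  # one record per word
             .map(lambda word: (word, 1))                 # pair each word with a 1
             .reduceByKey(add)                            # sum the 1s per word
    )


def run(path):
    """Run the job against a text file, e.g. one stored on HDFS (needs pyspark)."""
    from pyspark.sql import SparkSession  # imported here so the module loads without Spark

    spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
    try:
        return count_words(spark.sparkContext.textFile(path)).collect()
    finally:
        spark.stop()
```

For example, `run("hdfs:///user/you/input.txt")` (a made-up path) returns a list of `(word, count)` pairs. Keeping `count_words` separate from the session setup lets you try the logic interactively in the `pyspark` shell, which is the kind of interactive development the quote above describes.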

While I don’t see Spark supplanting Hadoop (Spark can run on top of the same HDFS data storage system), I do see Spark being leveraged to make that Hadoop elephant dance on the head of a pin.

As Mr. Schmitz so eloquently pointed out in the comments, neither Hadoop nor Spark can supplant the other; they coexist. What I meant to say in my last paragraph is that Spark will really let you leverage your Hadoop environment!

A Botched “R” Plugin Installation in RapidMiner – Solution

I decided to install the “R” plugin in RapidMiner recently and seriously botched the process. I botched it so badly that RapidMiner got stuck in an installation loop that kept asking me to “Exit – Restart Rapidminer.” I couldn’t get RapidMiner to load, and I was stuck. So what’s the course of action if something like this happens to you?

First and foremost, go to the experts. I went to the Rapid-I forums, searched for “R plugin,” and in about 10 seconds found what I was looking for. Following Sebastian’s answer to another poster’s similar problem, I found my extensions.xml file and edited it. Then I restarted RapidMiner, and all was well again in the land of data analytics.

So if this happens to you, just search for the extensions.xml file in your .RapidMiner5 directory and delete the offending plugin’s entry.
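The fix boils down to removing one entry from an XML file, so it can be sketched as a small Python script. This is not from the post or from RapidMiner’s documentation: it assumes extensions.xml sits under ~/.RapidMiner5 and that each installed extension is a direct child of the root element. Because the file’s exact schema isn’t documented here, it matches entries on a text fragment you supply (for example the plugin name you spot when you open the file). The function names are hypothetical.

```python
# Hypothetical helpers for recovering a RapidMiner 5 install stuck in a
# plugin-install loop. Assumptions (not confirmed by the post): extensions.xml
# lives somewhere under ~/.RapidMiner5, and each installed extension is a
# direct child element of the file's root element.
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def find_extensions_files(base=None):
    """Return every extensions.xml found under the RapidMiner 5 user directory."""
    base = Path.home() / ".RapidMiner5" if base is None else Path(base)
    return sorted(base.rglob("extensions.xml"))


def remove_extension_entries(xml_path, needle, dry_run=True):
    """Remove direct children of the root whose serialized XML contains `needle`.

    With dry_run=True (the default) the file is left untouched and the function
    only reports the tags of the entries that would be removed.
    """
    path = Path(xml_path)
    tree = ET.parse(str(path))
    root = tree.getroot()
    matches = [child for child in root
               if needle in ET.tostring(child, encoding="unicode")]
    if not dry_run and matches:
        # Keep a copy of the original next to it before editing anything.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        for child in matches:
            root.remove(child)
        tree.write(str(path), encoding="utf-8", xml_declaration=True)
    return [child.tag for child in matches]
```

Close RapidMiner first, run `remove_extension_entries` with the default `dry_run=True` to see which entries match, and only rerun it with `dry_run=False` once the match looks right; a `.bak` copy of the original file is written before anything changes.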

R and RapidMiner Together = Disruptive Technology!

I’ve been teaching myself R now that I finally got RapidMiner’s R plugin to work. It’s a pretty slick program and easy to learn; I’ve picked up a lot quickly. I use the PerformanceAnalytics, quantmod, and tseries packages for R extensively, and on top of that, I started to recreate A Physicist on Wall Street’s awesome Rapidminer + R Example for Trading tutorial. So far, so good.

It’s fantastic that I can now use the R plugin to download stock quotes right into RapidMiner and then model those time series. Yes, R on its own has a few learning algorithms, but they in no way match RapidMiner’s breadth and depth. Combine that with RapidMiner’s ability to handle large datasets efficiently and R’s statistical analysis and graphing power, and the RapidMiner and R combination is a disruptive technology in my book.

Download it today and play with it; it will make your data shine in ways you can only dream of.