Oct 28, 2015 1 min read Machine Learning

Coding RapidMiner in Python

Back in middle school, we learned about log tables. We learned how to look up a logarithm in the table, interpolate between entries, and then use the result in our equations. Later on, we were allowed to use calculators, which made the work easier and faster.

Fast forward many years to a Sunday morning this October. I was at my dining table with my laptop open, fooling around with Pandas and IPython Notebooks (aka Jupyter). I wanted to see how long it would take me to transform a customer data set (i.e., ETL and generate new attributes) and then run a simple k-NN cross-validation. This is a routine and fast task in RapidMiner, but I wanted to code it the hard way and see how long it would take me.
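
As a rough illustration of that kind of exercise, here's a minimal sketch. It is not the original notebook: the customer table is synthetic, and the column names (`age`, `total_spend`, `n_orders`, `region`, `churned`), the helper functions, `k=5`, and the 10 folds are all assumptions made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_customers(n=200, seed=0):
    """Build a synthetic customer table (all columns are hypothetical)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(18, 80, size=n),
        "total_spend": rng.gamma(2.0, 150.0, size=n).round(2),
        "n_orders": rng.integers(1, 30, size=n),
        "region": rng.choice(["north", "south", "west"], size=n),
    })
    # Blank out a few spend values so the ETL step has something to clean.
    missing = df.sample(frac=0.05, random_state=seed).index
    df.loc[missing, "total_spend"] = np.nan
    # Made-up label: customers with few orders are marked as churned.
    df["churned"] = (df["n_orders"] < 10).astype(int)
    return df


def transform(df):
    """ETL step: fill gaps, derive a new attribute, encode categories."""
    out = df.copy()
    out["total_spend"] = out["total_spend"].fillna(out["total_spend"].median())
    out["avg_order_value"] = out["total_spend"] / out["n_orders"]
    return pd.get_dummies(out, columns=["region"], dtype=float)


def knn_cv_scores(df, k=5, folds=10):
    """Cross-validate a k-NN classifier on scaled features."""
    X = df.drop(columns="churned")
    y = df["churned"]
    # Scaling matters for k-NN because it compares raw distances.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    return cross_val_score(model, X, y, cv=folds)


customers = transform(make_customers())
scores = knn_cv_scores(customers)
print(f"mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```

Putting the scaler and the classifier in one pipeline means the scaling is refit inside every fold, so the held-out rows never influence the preprocessing.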

Mind you, I'm not a Python coder. I learned to cobble together scripts when I needed them, and I'm a novice at best. But with a bit of coaching from my friend, I got this routine process working in about 3 hours. I did have a few hiccups, though: I had to alter my thought process when using Pandas and scikit-learn, but I persevered.

Granted, a seasoned Python coder could do this in about 30 minutes, but it was a big accomplishment for me.

This little exercise did teach me a few things. It taught me that Pandas and scikit-learn aren't hard and that I could do it. It taught me that this old dog can learn new tricks, a theory I like to confirm from time to time. It taught me that RapidMiner saves you a ridiculous amount of time in model building.

Finally, it taught me that a data scientist with coding skills can easily make the transition to RapidMiner. I think there is a bigger benefit to going from a coding environment to a code-free environment than the other way around, much like learning log tables first and then using a calculator.
