Python, RapidMiner, and Carriage Returns

rapidminer, d3js

I’ve been working on some Python code for a RapidMiner process. What I want to do is simplify my Instagram Hashtag Tool and make it go faster.

Part of that work is extracting the Instagram comments for text processing. I ran into utter hell trying to export those comments to a CSV file that RapidMiner could read. The export itself worked fine, but the comments came out with carriage returns inside them. For some strange reason, RapidMiner cannot read a cell that contains carriage returns; it only reads the first line. Luckily, with the help of some users, I found a workaround: do all the carriage return stripping on my end before export.

The trick is to match all carriage returns, spaces, tabs, etc. using the regular expression ‘\s’, then replace each match with a single space ‘ ’. While this isn’t elegant, it had to be done because Instagram comments are so messy to begin with.

Code
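
Here is a minimal sketch of the approach described above, not the exact script from the tool. It assumes the comments are already sitting in a Python list of strings. The function names `clean_comment` and `write_comments_csv`, the `comment` column header, and the sample comments are all made up for illustration. It also uses `\s+` rather than a bare `\s`, so a run of whitespace collapses into one space instead of several.

```python
import csv
import io
import re

# Matches any run of whitespace: carriage returns, newlines, tabs, spaces.
WHITESPACE = re.compile(r"\s+")


def clean_comment(text):
    """Replace every run of whitespace with one space and trim the ends."""
    return WHITESPACE.sub(" ", text).strip()


def write_comments_csv(comments, handle):
    """Write one cleaned comment per row under a hypothetical 'comment' header."""
    writer = csv.writer(handle)
    writer.writerow(["comment"])
    for comment in comments:
        writer.writerow([clean_comment(comment)])


# Made-up sample data standing in for real Instagram comments.
sample = ["Love this!\r\nSo cool\t#sunset", "  nice shot \n\n #travel  "]

# An in-memory buffer keeps the example self-contained; a real export
# would pass a file opened with open(path, "w", newline="") instead.
buffer = io.StringIO()
write_comments_csv(sample, buffer)
print(buffer.getvalue())
```

Because each comment ends up on a single line, every CSV cell holds exactly one line of text, which is what RapidMiner needed in order to read the whole comment.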