Neural Market Trends |||

Python, RapidMiner, and Carriage Returns

I’ve been working on some Python code for a RapidMiner process. What I want to do is simplify my Instagram Hashtag Tool and make it go faster.

Part of that work is extracting the Instagram comments for text processing. I ran into utter hell trying to export those comments into a CSV file that RapidMiner could read. It was exporting the data just fine but wrapping the comment into carriage returns. For some strange reason, RapidMiner can not read carriage returned data in a cell. It can only read the first line. Luckily with the help of some users I managed to work around and find a solution on my end. DO all the carriage return striping on my end before export.

The trick is to strip all carriage returns, spaces, tabs, etc using the regular expression s’, then replace the stripped items with a space like this in place. While this isn’t elegant, it had to be done because Instagram comment are so messy to begin with.


#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pandas as pd
import json
from pprint import pprint
import csv
import re

df = pd.read_json('')

name = df['graphql']['hashtag']['name']

hashtag_count = df['graphql']['hashtag']['edge_hashtag_to_media']['count']

count_list = []
likes = df['graphql']['hashtag']['edge_hashtag_to_media']['edges']
for i in likes:
    add = (df['graphql']['hashtag']['name'], df['graphql']['hashtag']['edge_hashtag_to_media']['count'], i['node']['id'], i['node']['edge_liked_by']['count'], i['node']['display_url'])

count_out = pd.DataFrame(count_list)
count_out.columns = ['hashtag', 'count', 'user_id', 'likes', 'image_url']

# This just exports out a CSV with the above data. Plan is to use this in a RM process

# Now comes the hard part, preparing the comments for a RM Process. 
# This is where the carriage returns killed me for hours

text_export = []
rawtext = df['graphql']['hashtag']['edge_hashtag_to_media']['edges']
for i in rawtext:
    rawtext = i['node']['edge_media_to_caption']['edges']
    for j in rawtext:
        final_text = j['node']['text']
        df['final_text'] = j['node']['text']

text_out = pd.DataFrame(text_export)

#This is the key, I had to strip everything using s and replacing it with a space via ' ' 

text_out.replace(r's',' ', regex=True, inplace=True)

text_out.columns = ['comment']

text_out.to_csv('out2.csv', line_terminator='rn')
Up next Deploying Julia with Domo This is a short by great video of Julia in production with Domo.   JuliaCon 2018 Julia with Domo I really like Julia. A RapidMiner Text Mining Resources Just some Text Mining resources in RapidMiner that I found cool, helpful, and interesting. This list will be updated as I find more links. Using
Latest posts Democratising Machine learning with H2O — Towards Data Science Getting started with Python datatable | Kaggle Phone Addiction Version 12 Launches Today! Machine Learning Making Pesto Tastier 5 Dangerous Things You Should Let Your Kids Do The Pyschology of Writing TensorFlow and High Level APIs Driving Marketing Performance with H2O Driverless AI Machine Learning and Data Munging in H2O Driverless AI with datatable Making AI Happen Without Getting Fired Latest Musings from a Traveling Sales Engineer The Night before H2O World 2019 Why Forex Trading is Frustrating Functional Programming in Python Automatic Feature Engineering with Driverless AI Ray Dalio's Pure Alpha Fund What's new in Driverless AI? Latest Writings Elsewhere - December 2018 House Buying Guide for Millennials Changing Pinboard Tags with Python Automate Feed Extraction and Posting it to Twitter Flux: A Machine Learning Framework for Julia Getting Started in Data Science Part 2 Makers vs Takers How Passive Investing Saved My Life Startups and Open Source The Process of Writing H2O AI World 2018 in London Ray Dalio's Pure Alpha Fund Isolation Forests in